  1. Jul 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      (1) In Figure 1, the authors show that TF3C binds to the amino terminus of MYCN (Myc box I region), as shown previously. The data in Figure 1 B-D support, but do not rigorously confirm a 'direct' interaction because it has not been ruled out that accessory proteins mediating the association may be present in the mixture.

      In Figure 1B-D, we purified MYCN and the TFIIIC/TauA complex separately and then mixed the purified preparations, demonstrating that the purified proteins interact. We additionally performed mass spectrometry, which shows that the TauA/MYCN complex forms without further accessory proteins; if such proteins were present, the measured molecular weight would be higher. Based on the Coomassie-stained SDS-PAGE gels, there is no plausible contaminating band that could mediate the interaction between MYCN and TauA, either in the purified complex (Figure 1C) or in the purified proteins used to reconstitute the complex (Figure S1A & S1B).

      (2) The authors indicate in Figure 2 that TF3C has essentially no effect on MYCN-dependent gene expression and/or transcription elongation. Yet a previous study (PMID: 29262328) associated with several of the same authors concluded that TF3C positively affects transcription elongation. The authors make no attempt to reconcile these disparate results and need to clarify this point.

      We agree that the data in this manuscript do not support a role for TFIIIC in transcription elongation. This point was also raised by Reviewer 3. Comparing our new results with the previously published data, the data sets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results compared with RNAPII (A10). Our new results with the RNAPII (8WG16) antibody show that the traveling ratio reflects not only transcription elongation but also the eviction of RNAPII from chromatin at the start site.
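To illustrate why a change in traveling ratio alone cannot distinguish eviction at the start site from increased elongation, here is a minimal Python sketch. It assumes one common convention (promoter-proximal read density divided by gene-body density; some studies use the inverse), and the function names, window sizes, and read counts are hypothetical, not the authors' pipeline.

```python
def density(reads: float, length_bp: int) -> float:
    """Reads per kilobase of window length."""
    return reads * 1000 / length_bp


def traveling_ratio(promoter_reads: float, promoter_len: int,
                    body_reads: float, body_len: int) -> float:
    """Promoter-proximal density divided by gene-body density.

    Assumed convention for illustration; some studies use the inverse ratio.
    """
    return density(promoter_reads, promoter_len) / density(body_reads, body_len)


# Hypothetical counts for one gene: 300 bp promoter window, 20 kb gene body.
baseline = traveling_ratio(600, 300, 2000, 20_000)
# Scenario A: half of the RNAPII at the TSS is evicted; gene body unchanged.
evicted = traveling_ratio(300, 300, 2000, 20_000)
# Scenario B: promoter unchanged; twice as much RNAPII in the gene body.
elongating = traveling_ratio(600, 300, 4000, 20_000)
print(baseline, evicted, elongating)  # 20.0 10.0 10.0
```

Both hypothetical scenarios halve the ratio, which is why signal at the TSS and in the gene body has to be examined separately.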

      (3) Figures 2B and C show that unphosphorylated pol2 is TSS-centered, and Ser2-P pol2 occupancy is centered beyond the TES. From these data, however, the reader cannot tell how much of the phospho-Ser2 pol2 is centered on the TSS. The authors should include overall plots over the TSS, gene body, and TES for both antibodies across the collected gene sets to allow a better comparison.

      We focused on the TSS for unphosphorylated RNAPII and the TES for pSer2-RNAPII, as these are the regions with specific enrichment of the respective antibodies. As requested for comparison, we now include metagenes showing TSS, gene-body, and TES for both antibodies as new Figure S2A and B. Additionally, we included density plots for unphosphorylated RNAPII at the TES as well as for pSer2-RNAPII at the TSS as a Figure for the Reviewers (Figure 1).

      (4) The authors see more TF3C at promoters in cells with MYCN (Figure 2F). What are the levels of TF3C in the absence and presence of MYCN?

      As shown in the immunoblot in Figure S1E, TFIIIC5 levels do not change upon induction of MYCN. We therefore think that MYCN helps to recruit TFIIIC5 to RNAPII promoter sites. This is also in accordance with what we previously reported (ref. 1).

      (5) The finding that TF3C is increased at TSS (Figure 2F) doesn't necessarily indicate that 1) MYCN is recruiting TF3C there, and 2) that this is due to the phosphorylation status of pol2. It could mean many other things. The logic of conflating these 3 points based on the data shown is questionable.

      We showed previously that knock-down of MYCN affects TFIIIC5 binding, demonstrating that MYCN is required for binding of TFIIIC5 at promoter sites (ref. 1).

      Additionally, we included data from DRB-treated cells (Figure 2F); DRB prevents RNAPII loading by blocking downstream de novo elongation. These data show that TFIIIC5 binding at the TSS increases massively upon induction of MYCN and increases further upon treatment with DRB. Conversely, we observed that the major effect of TFIIIC knock-down upon MYCN induction was on non-phosphorylated RNAPII at the TSS (Figure 2B). Therefore, we would argue that our interpretation fits the data presented in the manuscript well.

      (6) Figure 3A doesn't add much to the paper, as it is overplotted and no relationship is clear, except that Pol2 and MYCN occupy many of the same sites. Perhaps a less complex or different type of plot would allow the interactions to be better visible.

      We agree with this comment. Since another comment asked us to show the same window for all Hi-ChIP data plots, we have changed Figure 3A accordingly.

      (7) That depletion of TF3C leads to increased promoter hubs may or may not have anything to do with its association with MYCN (Figure 4E). This could be a direct consequence of its known structural function in cohesin complexes, and the MYCN changes as a secondary consequence of this (also see point 4, above).

      As shown in Büchel et al. (2017; ref. 1), MYCN is needed to recruit RAD21, whereas depletion of RAD21 has no impact on the recruitment of MYCN. Since RAD21 is part of the cohesin complex, we would exclude the possibility that the MYCN changes are a secondary consequence.

      (8) Depletion of TF3C5 results in a loss of EXOSC5 (exosome) at TSS in the presence and absence of MYCN (Figure 5B). As TF3C5 is a cohesin, could this simply be a consequence of genomic structure changes?

      We agree that the observed changes in EXOSC5 could be due to depletion of TFIIIC5. TFIIIC has been shown to recruit cohesin (ref. 1) and condensin complexes (ref. 2), as well as to induce chromatin architectural changes (ref. 3). However, MYCN is needed to recruit TFIIIC, and depletion of TFIIIC had no impact on MYCN recruitment (ref. 1). Furthermore, MYCN has been shown to recruit the exosome (ref. 4). Therefore, we would argue that MYCN may play a role either directly or through chromatin architectural changes.

      (9) The authors suggest that RNA dynamics are affected by changes in exosome function (RNA degradation, etc). What effect, if any does TF3C depletion have on the overall gene expression profile?

      We show in the manuscript that TFIIIC depletion in unperturbed cells has no effect on the global gene expression profile in the time frame analyzed (Figure 2E and S2B).

      Reviewer #2 (Public Review):

      (1) Dynamic inferences are made without kinetic experiments.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that integrating our different data sets makes it possible to draw conclusions about dynamics. The transcription cycle and its sequential steps have been well described. In this sense, we use the non-phosphorylated RNAPII data, which capture the state between RNAPII recruitment and initiation, and the RNAPII-pSer2 data, which reflect pause release into elongation, to draw conclusions about the dynamics. Likewise, we also made use of our previously published datasets.

      Reviewer #2 (Recommendations For The Authors):  

      (1) A number of changes are reported in hub size, expression, etc. upon treatment with tamoxifen to activate MYCN-ER. But MYC is already present in the SHEP cells, so why doesn't MYC support these same phenomena? It would seem that either the ability to cooperate with TFIIIC to clear non-productive polymerase complexes from promoters is particular to MYCN, or else it reflects a quantitative increase in total MYC proteins due to the entry of MYCN-ER into the nucleus with tamoxifen. The authors should address or discuss this issue.

      It is possible that protein levels are the limiting factor underlying the different effects of MYC and MYCN in this system. This interpretation would be in accordance with the results of Lorenzin et al. (ref. 5), who reported that different levels of MYC regulate different targets, based on promoter affinity for E-boxes and on protein level. A similar relationship between MYC levels and function was also reported for SPT5 (ref. 6). Such high protein levels mimic those found in certain tumors, in contrast to physiological levels. In this sense, the observed differences may also reflect physiological versus oncogenic levels of MYC proteins.
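As a hedged illustration of how protein level alone can change which sites are bound, the sketch below uses a simple one-site equilibrium binding model, occupancy = C / (C + Kd). This model, the site names, the dissociation constants, and the concentrations are all hypothetical; it is not the model used by Lorenzin et al.

```python
def occupancy(conc_nM: float, kd_nM: float) -> float:
    """Fractional occupancy of a single site at equilibrium (one-site model)."""
    return conc_nM / (conc_nM + kd_nM)


# Hypothetical promoters: one high-affinity and one low-affinity E-box site.
sites = {"high_affinity": 10.0, "low_affinity": 1000.0}  # Kd in nM

# Hypothetical protein levels standing in for physiological vs. oncogenic MYC.
for level, conc in [("physiological", 10.0), ("oncogenic", 1000.0)]:
    occ = {name: round(occupancy(conc, kd), 3) for name, kd in sites.items()}
    print(level, occ)
```

In this toy model the high-affinity site is already half-occupied at the lower level, whereas the low-affinity site only reaches half occupancy at the higher level, so raising protein levels recruits an additional set of targets.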

      On the other hand, both a core MYC signature and isoform-specific signatures of target genes have been described. MYCN has been described to be involved in gene expression during the S phase of the cell cycle (ref. 7). This suggests that MYC and MYCN differ in ways other than the gene sets they regulate. The interaction with TFIIIC appears to be one of these differences. We have found multiple TFIIIC subunits as part of the MYCN interactome, but the interaction of TFIIIC with MYC is weaker and we are uncertain how relevant it is (refs. 7, 8). We show here that depletion of different subunits of the TFIIIC complex causes a MYCN-dependent growth defect (Figure 1E). Similarly, the nuclear exosome is a MYCN-specific dependency (ref. 4), and we show here that MYCN-dependent recruitment of the exosome requires TFIIIC5. We take this as an indication that there is an intrinsic difference between MYC and MYCN and that MYCN engages TFIIIC for this pathway.

      (2) Reciprocal to TFIIIC recruitment to MYCN targets would be MYCN association with tRNA genes, 5S rRNA, and other RNAPIII genes. Does this happen, and if so, is this association TFIIIC dependent? What happens to the expression of these genes?

      We did observe MYCN in interactions involving tRNA genes and other RNAPIII sites, such as SINE elements (Figure 4B, 4D, S3F, and S4B). Only a negligible number of 5S rRNA genes were involved in interactions, either because of the difficulty of properly mapping these repetitive regions or for biological reasons. In any case, none of these regions appeared to be specifically dependent on TFIIIC, as the overall number of interactions increased upon TFIIIC depletion regardless of genomic annotation (Figure S4B). Regarding the expression of RNAPIII genes, technical limitations of poly(A)-enriched RNA-seq prevent us from analyzing it globally in an unbiased way. However, we addressed this point for tRNA expression in earlier work (ref. 1) and found that tRNA levels do not change upon TFIIIC depletion. We think this is because tRNAs are stable transcripts and RNAPIII recycling can occur in a TFIIIC-independent manner (ref. 9). Likewise, in this work we report no significant expression changes in RNAPII-transcribed genes upon TFIIIC depletion.

      (3) The authors show that TFIIIC depletion does not alter the RNA-expression profile; how do they account for this? Can they comment on "background" transcription that it would seem should be suppressed by TFIIIC-dependent removal of various hypofunctional polymerases?

      Since TFIIIC is important for the removal of non-functional RNAPII, we would not expect changes in the gene expression profile upon depletion of TFIIIC in the time frame analyzed. Indeed, monitoring the elongating form of RNAPII by measuring pSer2 shows that transcription elongation is not affected.

      (4) Global changes in expression are difficult to assess with DESeq2. This hypernormalizing algorithm is not well suited to distinguishing universal upregulation from differential regulation in which some targets are truly upregulated while others are downregulated. The authors should comment.

      The authors acknowledge that DESeq2 relies on the assumption that gene-wise estimates of dispersion are generally unchanged among samples. We addressed this comment in two ways, both shown in the Figure for the Reviewers (Figure 2). First, we sequenced the samples more deeply to avoid any bias created by random effects of lower coverage; the range of total reads increased from 6.8-9.3 to 16.5-20.7 million. Second, we compared fold-average bin dot plots for RNA-seq of SH-EP-MYCN-ER cells, showing mRNA expression normalized to control per bin, using DESeq2 normalization (Figure 2A), TMM normalization in edgeR (Figure 2B), and quantile normalization (Figure 2C). We found no major differences from the original data or between the normalization methods. We nevertheless updated Figure 2E in the manuscript to include the deeper sequencing dataset, adjusted it to show -/+ MYCN, and log2-transformed it to make it more intuitive. Overall, these analyses support our original conclusion that gene expression remains largely unaffected by TFIIIC5 knockdown.
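For readers less familiar with these normalizations, the sketch below implements two of them on a toy count matrix: DESeq2-style median-of-ratios size factors and quantile normalization (TMM is omitted for brevity). It is a simplified NumPy illustration under the assumption of strictly positive counts, not the DESeq2 or edgeR code, and the counts are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """DESeq2-style size factors for a genes x samples matrix (all counts > 0)."""
    log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)           # per-gene geometric mean (log scale)
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs. the pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))      # median over genes, per sample


def quantile_normalize(counts: np.ndarray) -> np.ndarray:
    """Force every sample (column) onto the same distribution of values."""
    order = np.argsort(counts, axis=0)
    ranks = np.argsort(order, axis=0)                  # rank of each gene within its sample
    mean_sorted = np.sort(counts, axis=0).mean(axis=1) # mean value at each rank
    return mean_sorted[ranks]


# Hypothetical matrix: 4 genes x 2 samples; sample 2 sequenced twice as deeply.
counts = np.array([[10, 20], [50, 100], [200, 400], [5, 10]], dtype=float)
print(median_of_ratios_size_factors(counts))
print(quantile_normalize(counts))
```

The median-of-ratios step assumes that most genes are not differentially expressed, which is the assumption the reviewer questions; quantile normalization makes an even stronger assumption that all samples share one distribution, so agreement between such methods is a useful robustness check rather than proof.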

      (5) On page 7, the authors claim that the MYCN-ER-induced increase in Ser-2 phosphorylation can reflect MYCN-stimulated transcription elongation. In fact, without kinetic studies, this is not fully supported. Accumulation of Ser-2 RNAPII along a gene can reflect increased initiation of full-speed RNAPs or a pile-up of RNAPs slowing down. This should be resolved or qualified.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that integrating our different data sets makes it possible to draw conclusions about dynamics. We showed, on the one hand, that pSer2 accumulates at the TES and, on the other hand, that induction of MYCN-ER up-regulates gene expression, which indicates productive transcription elongation.

      (6) pLHiChIP needs to be better described, the Mumbach reference is not sufficient.

      We have rewritten the description of pLHiChIP in the Methods section and hope that it now describes the method more clearly.

      (7) Can the authors recheck all the labels in Figure 2D? I believe there is an error involving + or - MYCN.

      We carefully rechecked all the labels in Figure 2, and they were correct. We understand the confusion that comparing Figure 2D and Figure 2E may have created. To avoid it, we updated Figure 2E to show the same direction as Figure 2D. We also log2-transformed the y-axis of Figure 2E to allow a more intuitive reading.

      (8) Why are there different scales for the regions of chromosome 17 shown in Figures 3 and 4? It would be easier to compare if the examples were all shown at the same scale (about 2 Mb is shown in another Figure).

      We now show the same region of chromosome 17 in Figure 3 and 4.

      Reviewer #3 (Public Review):

      (1) The connection between the three major findings presented in this study regarding the role of TFIIIC in the regulation of MYCN function remains unclear. Specifically, how the TFIIIC-dependent restriction of MYCN localization to promoter hubs enhances the association of factors involved in nascent RNA degradation to prevent the accumulation of inactive RNA polymerase II at promoters is not apparent. As they are currently presented, these findings appear as independent observations. Cross-comparison of the different datasets obtained may provide some insight into addressing this question.

      We previously observed that TFIIIC does not affect MYCN recruitment, whereas MYCN affects TFIIIC binding (ref. 1). Moreover, our group reported that MYCN recruits the exosome (ref. 4) and BRCA1 (ref. 10) to promoter-proximal regions to clear out non-functional RNAPII. Here we report that MYCN-TFIIIC complexes exclude non-functional RNAPII. However, MYCN-active promoter hubs have more RNAPII and more transcription than MYCN-active promoters outside hubs. Furthermore, TFIIIC binding acts upstream of BRCA1 and exosome recruitment, as depletion of TFIIIC decreases the recruitment of both factors. Therefore, we argue that TFIIIC is required for the proper function of these MYCN-active promoter hubs.

      (2) Another concern involves the disparities in RNA polymerase II ChIP-seq results between this study and earlier ones conducted by the same group. In Figure 2, the authors demonstrate that activation of MYCN results in a reduction of non-phosphorylated RNA polymerase II across all expressed genes. This discovery contradicts prior findings obtained using the same methodology, where it was concluded that the expression of MYCN had no significant effect on the chromatin association of hypo-phosphorylated RNA polymerase II (Büchel et al, 2017). In this regard, the choice of the 8WG16 antibody raises concern, as fluctuations in the signal may be attributed to changes in the phosphorylation levels of the C-terminal domain. It remains unclear why the authors decided against using antibodies targeting the N-terminal domain of RNA polymerase II, which are unaffected by phosphorylation and consistently demonstrated a significant signal reduction upon MYCN activation in their previous studies (Büchel et al, 2017) (Herold et al, 2019). Similarly, the authors previously proposed that depletion of TFIIIC5 abrogates the MYCN-dependent increase of Ser2-phosphorylated RNA polymerase II (Büchel et al, 2017), whereas they now show that it has no obvious impact. These aspects need clarification.

      We respectfully disagree that our findings contradict each other. Comparing our new results with the previously published data, the data sets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results compared with RNAPII (A10). Our new results with the RNAPII (8WG16) antibody show that the traveling ratio reflects not only transcription elongation but also the eviction of RNAPII from chromatin at the start site.

      In the previous study, we only performed manual ChIP experiments for RNAPII (8WG16) and pSer2. Here, we performed a global analysis, which is more informative and is consistent with the RNA-sequencing data.

      (3) Finally, the varied techniques employed to explore the role of TFIIIC in MYCN-dependent recruitment of nascent RNA degradation factors make it challenging to draw definitive conclusions about which factor is affected and which one is not. While conducting ChIP-seq experiments for all factors may be beyond the scope of this manuscript, incorporating proximity ligation assays (PLA) or ChIP-qPCR assays with each factor would have enabled a more direct and comprehensive comparison.

      We understand the criticism that we are comparing different assays. We performed PLAs with different antibodies, but since the PLA controls were not sufficient by our standards, we refrained from using these data. ChIP-qPCR experiments are much more challenging to perform side by side than PLAs, which is why we decided against analyzing all factors with this method.

      Recommendations For The Authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2: Why did the authors choose the 8WG16 antibody? Does TFIIIC5 depletion suppress the MYCN-dependent reduction of total RNA polymerase II binding to promoters that they consistently showed in previous studies? Given that phosphorylation of the CTD impacts 8WG16 recognition, including Ser5-phosphorylated RNA polymerase II ChIP-seq experiments might clarify this issue.

      We used the RNAPII (8WG16) antibody specifically to map non-phosphorylated RNAPII, which reflects the binding of non-functional RNAPII.

      (2) Figures 3 and 4: As it stands, the manuscript does not convincingly establish a functional connection between the results in Figures 2, 3, and 4 or elucidate potential mechanisms. Are changes in RNA polymerase II levels upon MYCN activation more pronounced at promoters located at MYCN hubs? Do changes in MYCN-enriched chromatin contacts upon TFIIIC5 depletion somehow correlate with alterations in RNA polymerase II levels? Performing similar cross-comparisons as in Figure 3C may help address this issue. Furthermore, it is not clear how the authors concluded that MYCN/TFIIIC5-bound genes are not part of these so-called promoter hubs.

      In Figure 3C, we show that changes in RNAPII levels upon MYCN activation are more pronounced at promoters located at MYCN hubs. Additionally, the Figure for the Reviewers (Figure 3) shows density plots of non-phosphorylated RNAPII ChIP-seq at the TSS and RNAPII-pSer2 ChIP-seq at the TES for promoters with MYCN interactions. Apart from the level of binding, we found no differences compared with the global analysis of all expressed genes shown in Figures 2B and 2C. This is consistent with the high expression of the genes involved in MYCN interactions shown in Figure 3C.

      The changes observed in Figures 2B and 2C are global and include the promoters with MYCN interactions. However, a higher number of replicates would be required to statistically distinguish differences in MYCN interactions between TFIIIC5-proficient and TFIIIC5-depleted cells. We acknowledge this limitation and therefore refrain from drawing conclusions on this point. We base our conclusions on the other parts of the manuscript and on our previous studies showing that MYCN recruits TFIIIC, BRCA1, and the exosome to promoter-proximal regions (refs. 1, 4, 10).

      (3) Figure 5: According to the PLA results, activation of MYCN could enhance RNA polymerase II-NELFE interaction in a TFIIIC5-dependent manner. Considering the raised issues regarding the use of the 8WG16 antibody, this result might be of relevance.

      Nevertheless, PLA does not seem to be the optimal technique to address these questions, and I would rather suggest performing ChIP-qPCR experiments for all the factors to be compared. Finally, do the authors conclude that the TFIIIC5 effect on MYCN-dependent changes in RNA polymerase II depends upon the recruitment of EXOSC5 and BRCA1? If so, it would be interesting to determine whether depletion of these factors phenocopies the effects observed with TFIIIC5.

      We understand the criticism that we are comparing different assays. We performed PLAs with different antibodies, but since the PLA controls were not sufficient by our standards, we refrained from using these data.

      (4) In Figure S2 the labels should be EtOH, 4-OHT, and Input.

      We changed this accordingly.

      (5) On page 7, the sentence "We have shown previously that TFIIIC5 depletion does not cause significant changes in expression of multiple tRNA genes that are transcribed by RNAPIII (Buchel et al., 2017)" appears to lack a connection.

      We agree with the reviewer and have deleted this sentence from the manuscript.

      Author response image 1.

      (A) Density plot of ChIP-Rx signal for non-phosphorylated RNAPII. Data show the mean (line) ± standard error of the mean (SEM, shaded area) of different gene sets based on RNA-seq of SH-EP-MYCN-ER cells ± 4-OHT. The y-axis shows spike-in-normalized read counts; the plot is centered on the TES ± 2 kb. N = number of genes in the gene set, as defined in the Methods. (B) Density plot of ChIP-Rx signal for RNAPII pSer2, as described for panel A. The signal is centered on the TSS ± 2 kb.
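The density plots above summarize per-gene coverage as a mean line with an SEM band. The sketch below shows that summary step on a hypothetical genes × bins matrix; the spike-in scaling used here (dividing by spike-in reads in millions) is a simplified assumption in the spirit of ChIP-Rx, and the function name and numbers are invented rather than taken from the authors' pipeline.

```python
import numpy as np


def metaprofile(coverage: np.ndarray, spike_in_reads: int):
    """Mean and SEM across genes for a genes x bins coverage matrix.

    Assumed scaling: divide by spike-in reads in millions (simplified,
    ChIP-Rx-style factor); a real pipeline may compute this differently.
    """
    scaled = coverage / (spike_in_reads / 1e6)
    mean = scaled.mean(axis=0)                                  # the plotted line
    sem = scaled.std(axis=0, ddof=1) / np.sqrt(scaled.shape[0]) # half-width of the shading
    return mean, sem


# Hypothetical coverage: 3 genes x 4 bins spanning TES +/- 2 kb (1 kb bins).
cov = np.array([[2, 4, 6, 4], [4, 6, 8, 6], [6, 8, 10, 8]], dtype=float)
mean, sem = metaprofile(cov, spike_in_reads=2_000_000)
print(mean)
print(sem)
```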

      Author response image 2.

      Bin dot plot for RNA-seq of SH-EP-MYCN-ER cells showing mRNA expression normalized to control per bin, comparing the fold average using DESeq2 normalization (A), TMM normalization in edgeR (B), and quantile normalization (C).

      Author response image 3.

      Average density plot of ChIP-Rx signal for non-phosphorylated RNAPII (A) or RNAPII pSer2 (B) at promoters with MYCN interactions.

      References

      (1) Büchel, G., Carstensen, A., Mak, K.-Y., Roeschert, I., Leen, E., Sumara, O., Hofstetter, J., Herold, S., Kalb, J., Baluapuri, A., et al. (2017). Association with Aurora-A controls N-MYC-dependent promoter escape and pause release of RNA polymerase II during the cell cycle. Cell Reports 21, 3483-3497.

      (2) Yuen, K.C., Slaughter, B.D., and Gerton, J.L. (2017). Condensin II is anchored by TFIIIC and H3K4me3 in the mammalian genome and supports the expression of active dense gene clusters. Sci Adv 3, e1700191. 10.1126/sciadv.1700191.

      (3) Ferrari, R., de Llobet Cucalon, L.I., Di Vona, C., Le Dilly, F., Vidal, E., Lioutas, A., Oliete, J.Q., Jochem, L., Cutts, E., Dieci, G., et al. (2020). TFIIIC Binding to Alu Elements Controls Gene Expression via Chromatin Looping and Histone Acetylation. Mol Cell 77, 475-487 e411. 10.1016/j.molcel.2019.10.020.

      (4) Papadopoulos, D., Solvie, D., Baluapuri, A., Endres, T., Ha, S.A., Herold, S., Kalb, J., Giansanti, C., Schulein-Volk, C., Ade, C.P., et al. (2021). MYCN recruits the nuclear exosome complex to RNA polymerase II to prevent transcription-replication conflicts. Mol Cell. 10.1016/j.molcel.2021.11.002.

      (5) Lorenzin, F., Benary, U., Baluapuri, A., Walz, S., Jung, L.A., von Eyss, B., Kisker, C., Wolf, J., Eilers, M., and Wolf, E. (2016). Different promoter affinities account for specificity in MYC-dependent gene regulation. Elife 5. 10.7554/eLife.15161.

      (6) Baluapuri, A., Hofstetter, J., Dudvarski Stankovic, N., Endres, T., Bhandare, P., Vos, S.M., Adhikari, B., Schwarz, J.D., Narain, A., Vogt, M., et al. (2019). MYC Recruits SPT5 to RNA Polymerase II to Promote Processive Transcription Elongation. Mol Cell 74, 674-687 e611. 10.1016/j.molcel.2019.02.031.

      (7) Baluapuri, A., Wolf, E., and Eilers, M. (2020). Target gene-independent functions of MYC oncoproteins. Nat Rev Mol Cell Biol. 10.1038/s41580-020-0215-2.

      (8) Koch, H.B., Zhang, R., Verdoodt, B., Bailey, A., Zhang, C.D., Yates, J.R., 3rd, Menssen, A., and Hermeking, H. (2007). Large-scale identification of c-MYC-associated proteins using a combined TAP/MudPIT approach. Cell Cycle 6, 205-217. 10.4161/cc.6.2.3742.

      (9) Ferrari, R., Rivetti, C., Acker, J., and Dieci, G. (2004). Distinct roles of transcription factors TFIIIB and TFIIIC in RNA polymerase III transcription reinitiation. Proc Natl Acad Sci U S A 101, 13442-13447. 10.1073/pnas.0403851101.

      (10) Herold, S., Kalb, J., Büchel, G., Ade, C.P., Baluapuri, A., Xu, J., Koster, J., Solvie, D., Carstensen, A., Klotz, C., et al. (2019). Recruitment of BRCA1 limits MYCN-driven accumulation of stalled RNA polymerase. Nature 567, 545-549.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult, and a better understanding of how this chimeric transcription factor works is critical to achieving this goal. To this end, the group had previously identified a DBD-α4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&Tag, RNA-seq, and Micro-C, to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model, in which endogenous EWS-FLI was silenced and DBD-proficient or DBD-deficient EWS-FLI isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediating EWS-FLI cis activity (at msat) and to the formation of specific TADs. Furthermore, cells expressing DBD-deficient EWS-FLI display very poor colony-forming capacity, suggesting that targeting this domain may open therapeutic perspectives.

      We thank Reviewer 1 for their strong summary of Ewing sarcoma background and accurate description of our experimental approaches and findings.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      We thank the reviewer.  

      They aim to better understand how EWS-FLI mediates its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      We are happy to see that our overall aim was also appreciated by Reviewer 1.

      They use the most recent state-of-the-art omics methods to investigate the transcriptome, epigenetics, and genome conformation. In particular, Micro-C achieves up to 1 kb resolution of 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TAD structures where EWS-FLI1 mediates its oncogenic activity.

      We thank Reviewer 1 for their acknowledgement of our approaches and the resolution achieved with our Micro-C experiments.  

      They performed all their experiments in a Ewing sarcoma genetic background (A673 cells), which avoids the biases of previously reported studies that applied similar approaches in non-orthologous cell models.

      We agree with the reviewer about the importance of using model systems that accurately capture features of the disease being studied. As we have added a second cell line in this revision, we note that this model also represents a Ewing sarcoma genetic background, while representing tumors that express another oncogenic fusion found in this disease.

      Weaknesses:

      The main weakness comes from the poor reproducibility of the Micro-C data. Indeed, it appears that the distances/clustering observed between replicates are typically similar to or even larger than those between biological conditions. For instance, in Figure 1B, I do not see any clustering when considering DBD1, DBD2, DBD+1, DBD+2.

      Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates. These observations suggest that the global chromatin structure of DBD replicates is more similar to KD than DBD+ replicates."

      When replacing DBD replicate 1 with DBD replicate 2, their statement would not be true anymore.

      Additional replicates to clarify this aspect seem absolutely necessary since those data are paving the way for the entire manuscript.

      These are valid concerns, and we thank the reviewers for highlighting the poor clustering of Micro-C replicates on the MDS plot. We account for the variability between replicates when identifying differentially interacting regions. By using an adjusted p-value < 0.05, we aim to keep the expected proportion of false positives among the reported differentially interacting regions at or below 5% (false discovery rate).
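Adjusted p-values of this kind are usually Benjamini-Hochberg values; that the differentially interacting region analysis used exactly this procedure is an assumption here. The minimal NumPy sketch below, on hypothetical p-values, shows what a 5% cutoff controls: the expected fraction of false positives among the regions called significant, rather than a guarantee that the same regions are recovered in a repeat experiment.

```python
import numpy as np


def benjamini_hochberg(pvals) -> np.ndarray:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)       # p * m / rank
    # Enforce monotonicity: each value is the minimum over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0, 1)
    return adjusted


# Hypothetical raw p-values for five candidate regions.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(pvals)
print(q, q < 0.05)
```

Note that the third and fourth regions have raw p-values below 0.05 but are not called after adjustment.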

      We would also like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C) as well as on the PCA plot of H3K27ac CUT&Tag data (Figure 4A). Notably, the RNA-seq results have now been reproduced by different experimenters across multiple studies (Boone et al., 2021 and this report), as well as in a second cell line (as reported in this revision). These observations suggest that the cells of these replicates are functionally similar to each other at the population level. Chromatin organization detected by Micro-C is highly heterogeneous among cells of a population (Misteli et al., 2020). Moreover, despite the increased resolution of Micro-C over Hi-C, the conventional sequencing depth at which Micro-C is performed makes resolving finer-scale 3D interactions, particularly between enhancers and promoters, challenging (Goel et al., 2023). Thus, biologically relevant interactions driving EWSR1::ETS transcriptional regulation through de novo enhancers may have relatively weak signal in Micro-C. Both the strength of the signal and the heterogeneous chromatin state present in bulk samples could affect the average signal, leading to poor clustering of replicates (Hafner and Boettiger, 2022).

      Importantly, rather than add an additional replicate of a single cell line, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. Specific limitations of the TTC466 study are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, higher-resolution analyses focused on specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma.

      Similarly:

      - In Figure 1C, what would the result look like when comparing DBD2/KD2/DBD+2? The same question applies when comparing DBD1 with KD1 and DBD+1. Would the difference go in the same direction?

      This is a great point. We added distance decay plots of individual replicates in Supplementary Figure 2 and added discussion of these results in lines 88-89 of the text.

      - Figure 1D-E. What would these plots look like when comparing each replicate to the others? How much difference would be observed when comparing, for instance, DBD1/DBD2 or DBD1/DBD+1?

      Unfortunately, Differentially Interacting Region analysis requires separate replicates because it tests interactions for statistical significance. Therefore, we are unable to plot these analyses with individual replicates.

      - Figure 2: again, what would these analyses look like when performed with only DBD1/DBD+1/KD1 or DBD2/DBD+2/KD2?

      This is a good suggestion, and such an analysis is possible. However, it would reduce resolution such that we might not accurately detect TADs, especially smaller TADs. Therefore, we decided to combine the biological replicates.

      Another major question is the stability of EWS-FLI DBD vs EWS-FLI DBD+ proteins. In the WB, FLAG intensities also seem higher (2/3 replicates) in the DBD+ condition compared to the DBD condition (Figure S1B).

      This is a valid concern with the shRNA knockdown/rescue system, and we regularly validate new constructs to ensure that their expression levels are similar to rescue with the wildtype fusion before proceeding to more exhaustive experimental workups. We would note that, while we have not tested for differences in protein stability, we largely see similar expression levels of these constructs across multiple experiments, multiple cell lines, and multiple sets of hands. There may be some variation in expression level from experiment to experiment, but western blotting is a semiquantitative assay, and it is not possible to rule out that slight differences in band intensity result from error in gel loading. For this reason, alongside western blotting for construct expression, we also validate construct function using RNA-seq and colony formation assays (as reported in this manuscript), and these show good agreement across biological replicates.

      Indeed, it seems that they have more FLAG (i.e., EWS-FLI) peaks in the DBD+ condition compared to the DBD condition (Figure 2B). 

      We appreciate this comment, as the legend of Figure 2B led to a misunderstanding. Figure 2B depicts the number of TADs detected in DBD and DBD+ conditions (height of the bars) and the proportion of those TADs overlapping FLAG peaks, CTCF peaks, both, or neither (y-axis). The number of FLAG peaks is actually lower in DBD+ than in DBD, as shown in Figure 5A-B. We clarified our Figure 2 legend to accurately describe the proportions (color-coded sections) of TADs bound by DBD/DBD+ FLAG and CTCF.

      Would it be possible that DBD+ is simply more highly expressed or more stable than DBD? Higher stability of the re-expressed DBD+ could also partially explain the results independently of any 3D conformational change. In other words, can the authors exclude the possibility that differences between DBD+ and DBD binding reflect their respective protein stability or global re-expression levels?

      It is possible that the DBD+ protein is more highly expressed or more stable than DBD. With our current data, we cannot conclusively exclude the possibility that differences in binding by DBD and DBD+ are related to their expression level or stability. We would note, as above, that western blots, RNA-seq, and agar assays have largely reproduced across experiments, hands, and cell lines, and that western blotting is an imperfect assay for assessing protein stability.

      Surprisingly, WB FLI bands in DBD+ conditions are systematically (3/3 replicates) fainter than in DBD conditions (Figure S1B). How do the authors explain these opposite results between FLI and FLAG in the WB?

      This is an excellent observation that highlights one of the intricacies of studying EWSR1::FLI1 in our KD/rescue system. Often the limiting factor for an experiment is whether the KD condition maintains KD through a second viral transduction for rescue and selection. We have observed over many years of working with this system that rescue conditions that are fully functional (i.e. wildtype EWSR1::FLI1, DBD+, etc.) tend to maintain better KD of endogenous EWSR1::FLI1. Constructs that do not rescue EWSR1::FLI1 function sometimes maintain KD to a lesser degree, though frequently still to a functional degree (i.e. cells are not transformed and EWSR1::FLI1 transcriptional regulation is not rescued). We suspect that this observation, also raised by Reviewer 1, results from potential selection for cells in which more endogenous EWSR1::FLI1 escapes KD in DBD conditions, due to selective pressures during expansion in tissue culture.

      We should note that the antibody used for detecting FLI recognizes residues that are deleted in DBD and DBD+ constructs, such that the FLI1 blot in Supplementary Figure 1B does not detect either construct. It only detects endogenous EWSR1::FLI1 and the 3X-FLAG-EWSR1::FLI1 construct in the middle lane that runs at a slightly higher molecular weight. The FLAG antibody is the only antibody that detects all three rescue constructs.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bayanjargal et al. entitled "The DBD-alpha4 helix of EWS::FLI is required for GGAA microsatellite binding that underlies genome regulation in Ewing sarcoma" reports on the critical role of a small alpha helix in the DNA binding domain (DBD) of the FLI1 portion of EWS::FLI1 that is critical for binding to repetitive stretches of GGAA-motifs, i.e. GGAA microsatellites, which serve as potent neoenhancers in Ewing sarcoma.

      We thank Reviewer 2 for their succinct and accurate summary of our manuscript. 

      Strengths:

      The paper is generally well-written and easy to follow, and the data presented are of high quality, well-described, and underpin the conclusions of the authors. The report sheds new light on how EWS::FLI1 mechanistically binds to and activates GGAA microsatellite enhancers, which is of importance to the field.

      We appreciate the reviewer’s assessment of our work. 

      Weaknesses:

      While there are no major weaknesses in this paper, there are a few minor issues that the authors may wish to address before publication:

      (1) While the official protein symbol for the gene EWSR1 is indeed EWS, the protein symbol for the gene FLI1 is identical, i.e. FLI1. The authors name the fusion oncoprotein EWS::FLI (even in the title), but it appears more adequate to use EWS::FLI1.

      We thank the reviewer for bringing this to our attention. Indeed, the most recent guideline for fusion protein nomenclature is to use the full gene symbols separated by double colons. Therefore, the accurate nomenclature is EWSR1::FLI1. We replaced instances of EWS::FLI with EWSR1::FLI1 and have used the EWSR1::ERG nomenclature in our revised manuscript.

      (2) The used cell lines should be spelled according to their official nomenclature (e.g. A-673 instead of A673).

      Corrected, thanks!

      (3) It appears as if the vast majority of results were generated in a single Ewing sarcoma cell line (A-673) which is an atypical Ewing sarcoma cell line harboring an activating BRAF mutation and may be genomically quite unstable as compared to other Ewing sarcoma cell lines (Kasan et al. 2023 preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2023.11.20.567802v1). Hence, it may be supportive for the paper to recapitulate/cross-validate a few key results in other Ewing sarcoma cell lines, e.g. by using EWS::ERG-positive cell lines. Perhaps the authors could make use of available published data.

      We thank Reviewer 2 for this helpful comment. We replicated the experiments in TTC-466 cells containing EWSR1::ERG fusion and found that as for A-673 cells the DBD-α4 helix is important for transcriptional, enhancer, and 3D chromatin regulation (Supplementary Figures 9-18).  

      (4) Figure 6 and Supplementary Figure 5 are very interesting but focus on two selected target genes of the fusion (FCGRT and CCND1). It would be interesting to see whether these findings also extend to common EWS::ETS transcriptional signatures that have been reported. The authors could explore their data and map established consensus EWS::ETS signatures to investigate which other hubs might be affected at relevant target genes.

      We expanded our analysis to other genes shown to be regulated by EWSR1::FLI1-nucleated transcriptional hubs (Chong et al., 2018) and included the NKX2-2 and GSTM4 gene regions in Supplementary Figures 7-8 for A-673 cells. We also investigated the same gene regions of FCGRT, CCND1, NKX2-2, and GSTM4 in TTC466 cells and report them in Supplementary Figures 14-17. For the purpose of brevity, we decided to include only the above examples. We may need to develop different tools to conduct further analysis of the gene regulatory networks driven by DBD and DBD+ in relation to hub formation. Although mapping such a network is a great suggestion, it may be outside the scope of this manuscript. We thank the reviewer for bringing such a good point to our attention.

      (5) Table 1 is a bit hard to read. In my opinion, it is not necessary to display P-values with up to 8 decimal positions. The gene symbols should be displayed in italic font.

      We have adopted these suggestions, thanks!

      Reviewing Editor (Recommendations For The Authors):

      We would draw the authors' attention to the following issues that would best benefit from additional revision.

      As indicated by Referee 1, an important issue concerns the apparent poor reproducibility of Micro-C data. In Figure 1B, the clustering of the DBD1, DBD2, DBD+1, and DBD+2 is poor.

      It appears that the distances/clustering observed between replicates are typically similar to or even larger than those between biological conditions. Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates." If one replaced DBD replicate 1 with DBD replicate 2, this statement would no longer be true. The referees believe that it is important to fully account for these potential discrepancies. Most of the study is based on analyses of these data sets, so if there are issues with them, it has repercussions for the entire study. We note, however, that in Figure 4A the clustering of the H3K27ac data is much more convincing. The referees also feel that it is important to show immunoblots of DBD and DBD+ expression levels in the experiments performed here. While this was previously shown in the Boone et al. publication in 2021, it could be illustrated again here.

      We thank the editors for concisely summarizing the main weaknesses of the paper and underscoring the importance of the Micro-C data for the rest of the paper. While the Editors note tighter clustering of the H3K27ac data (Figure 4A), we would like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C). Notably, the RNA-seq result has now been reproduced by different sets of hands across multiple studies (Boone et al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). Though not as tight, the H3K27ac CUT&Tag also reproduces in TTC466 cells. Thus, we interpret these findings to indicate that our replicates are functionally similar to each other. As discussed in more detail above in the response to Reviewer 1, several factors could affect how these functional similarities are represented in Micro-C data. Micro-C is ultimately a readout of chromatin organization in a heterogeneous population of cells (Misteli et al., 2020). Additionally, the limited sequencing depth of conventional Micro-C experiments restricts the ability to faithfully assess the enhancer-promoter interactions that may be relevant for our model system (Goel et al., 2023). Thus, both the strength of the biologically relevant signal and the heterogeneous chromatin state present in bulk samples could affect the average signal and lead to poorly clustering replicates (Hafner and Boettiger, 2022).

      To address these important concerns about the rigor and reproducibility of the analyses, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. These additional studies were not without their own limitations, and these are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, additional genomic analyses geared toward higher resolution at specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma. Live-cell imaging, as performed by Chong et al., 2018, and additional biochemical techniques may also be informative but are beyond the scope of this report.

      With regards to concerns about construct expression, we have included immunoblots of the rescue constructs in both cell lines (Supplementary Figure 1B and 9A) and discussed Reviewer 1’s specific concerns in detail above.  

      The referees also raise the issue of using an additional cell line to make a more general message. Although it would perhaps be asking too much to repeat the MicroC experiments, consolidation of the observations could be performed by focusing on specific loci such as FCGRT and CCND1 that were analyzed in this study. Could the authors use 4C-type experiments to reproduce the conclusions in an additional cell line? It would also be pertinent to consolidate the findings at these loci by 4C-type approaches even in the cell line used here. For the moment, all conclusions are based on the same set of data and a single technical approach.

      We repeated the experiments in TTC466 cells and analyzed the data using the same cut-offs used in A-673 cells. This allows us to compare the two cell lines. We hope this new set of experiments and analyses addresses the reviewers’ concerns.

      Reviewer #1 (Recommendations For The Authors):

      All the experiments were performed in A673 cells. Given the transcriptomic and epigenetic heterogeneity of Ewing sarcoma cells, some of the experiments supporting the findings should be replicated in at least one other Ewing sarcoma model.

      Per our discussion above, we have replicated our experiments in an additional cell line model of Ewing sarcoma. Importantly, the TTC466 cell line used expresses the EWSR1::ERG fusion found in 10-15% of Ewing sarcoma cases.  

      Supplementary Figure 2B. Proportion of TAD boundaries bound by FLAG (i.e., EWS-FLI1) and CTCF. The number/proportion of FLAG (i.e., EWS-FLI) peaks observed at CTCF peak/TAD boundaries seems unexpectedly high. How do the authors explain this result, given that EWS-FLI peaks are expected to be mostly intra-TAD in order to mediate their enhancer function?

      In our previous study, we showed that EWSR1::FLI1 binding can be detected at TAD boundaries (Showpnil et al., 2022). We therefore think it likely that EWSR1::FLI1 binding can mediate enhancer function both inside TADs and at TAD borders and may, in some cases, function as an insulator between TADs.

      For the >50kb loop analysis, what was the lower distance threshold? Up to 15-20 kb, contact frequency may be driven by PFA crosslinking (did they use a 5kb threshold?). Were those interactions excluded from the analysis?

      We acknowledge that we did not use a lower threshold to exclude those short-range loop interactions. In our previous study, we observed that EWSR1::FLI1 binding reduces long-range interactions in favor of short-range interactions (Showpnil et al., 2022), and we wanted to be able to capture short-range loops in our analysis.

      In Figure 2D, they observed that within TADs containing FLAG peaks at GGAA microsatellites, the intensity of the DBD+ FLAG peaks was higher compared to DBD FLAG peaks. What would this analysis look like when considering the ETS FLAG peaks (i.e., EWS-FLI peaks that are more likely repressive)? Could they compare TADs with GGAA msat peaks vs TADs with ETS peaks?

      We agree that this is an interesting observation. In our prior analyses, we found no discernible relationship between EWSR1::FLI1 binding and changes in 3D chromatin associated with repression (Showpnil et al., Nucleic Acids Research, 2022). In contrast, EWSR1::FLI1-bound superenhancers had greater H3K27ac deposition when overlapping both a bound GGAA repeat and a non-microsatellite site. While there have been several additional reports about the relevance of EWSR1::FLI1 binding at non-microsatellite peaks, the motifs at these loci have not yet been rigorously defined, as GGAA repeats were by Johnson et al. in PLoS One, 2017. Each ETS factor binds different motifs containing the core 5’-GGAA-3’ with varying affinities depending on the flanking residues, and there may be a >100-fold difference in sequence-specific binding affinity between “high” and “low” affinity motifs. Better defining the types of ETS motifs bound by EWSR1::FLI1 and the functional changes associated with them thus represents an interesting area of future study.

      Figure 1F: What is the biological meaning of these results (29.7, 39.5, and 54Mbp)? These distances are typically the size of a chromosome arm and clearly beyond classical chromatin loop/TAD structures in which EWS-FLI mediates its cis-activity.

      We agree with the referee here. This panel has been removed from our revised manuscript.

      How do DBD, KD, and DBD+ conditions compare with WT parental cells in the omics data (Figures 1B, 4A)? Do DBD+ conditions overlap with WT conditions? It would be nice to have these analyses also for Micro-C and CUT&Tag data. It should be acknowledged that the transcriptome data showing this aspect in Figure S1C are very convincing.

      This is a fair point. We were not able to obtain sequencing depth for the wtEF Micro-C libraries similar to that of KD, DBD, and DBD+, because a disproportionate share of the wtEF libraries was used in troubleshooting. Therefore, we decided to exclude the wtEF condition from these analyses.

      EWS-FLI cis-regulation at CCND1 also occurs through a much closer EWS-FLI peak (a microsatellite ~20 kb upstream of the CCND1 TSS), which was not taken into consideration. EWS-FLI peak intensity in both DBD and DBD+ at this microsatellite seems similar. How would this fit into their model?

      The referee is correct. The closest peak upstream of the CCND1 TSS is approximately 19 kb away. We highlighted this peak with dashed boxes near the CCND1 TSS (Supplementary Figure 6). Peak intensity of DBD+ FLAG is slightly higher than that of DBD, although we acknowledge that the difference is small. We suspect that the DBD-α4 helix affects binding dynamics at GGAA repeats, but these genomics approaches are not well suited to detecting small, but significant, changes in binding affinity or dynamics. In this case, a more biochemical approach may be needed. Even though both proteins can still bind the same microsatellites, they might differ in the stability of their binding or in the recruitment of additional proteins. These possibilities are addressed in the Discussion section (lines 444-463).

      For the Micro-C, they sequenced only 7 to 8 million reads per condition. This coverage seems particularly low, especially for their analyses using 1-5kb bins. How does this compare with other published Micro-C data? Can this explain the variability observed between replicates?

      We apologize for the inconsistent description of sequencing coverage, which may have caused confusion. The 7 to 8 million reads were used for shallow sequencing and QC analysis. Once a sample passed QC, we sequenced 300 million reads per sample. We have changed "300M" to "300 million" at line 598 to prevent misunderstanding.

      They mention:

      "In our recent studies of EWS::FLI, we found a small alpha helix in the DNA binding domain DBD-α4, to be required for transcription and regulation by the fusion protein (Boone et al., 2021). Interestingly, this study did not find any change in chromatin accessibility (ATAC-Seq) and genome localization of EWS::FLI constructs (CUT&RUN) when DBD-α4 helix was deleted leaving the mechanistic basis for the requirement of DBD-α4 in transcription regulation unclear. "

      And

      "To assay the enhancer landscape, we collected H3K27ac CUT&Tag data from KD, DBD, and DBD+ cells. Principal component analysis of H3K27ac localization shows that the DBD replicates were clustered closer to the KD replicates while being in between the KD and the DBD+ replicates (Figure 4A), suggesting that DBD-α4 helix is required to reshape the enhancer landscape."

      But now H3K27ac CUT&Tag shows strong differences that were not observed in ATAC-seq. How do the authors explain this discrepancy?

      Though both H3K27ac and ATAC signal are associated with enhancers and promoters in euchromatin, they do not measure exactly the same thing. H3K4me2 is a mark more closely associated with ATAC signal than H3K27ac is (Henikoff et al., 2020). Nonetheless, there are clear differences between the prior publication (Boone et al., 2021), which found similar ATAC signal across replicates, and this work, which finds differences in H3K27ac. We suspect this may reflect a tighter association of EWSR1::FLI1-mediated genome regulation with H3K27ac than with ATAC signal. Notably, there were very few differentially accessible regions between EWSR1::FLI1-depleted cells and conditions with EWSR1::FLI1 expression (either endogenous or wildtype rescue) using the A673 KD/rescue system in Boone et al., 2021. In contrast, other A673 KD/rescue studies have reported differences in H3K27ac in EWSR1::FLI1-expressing conditions relative to EWSR1::FLI1-depleted conditions (Theisen et al., 2021).

      The authors mention:

      "Our study thus uncovered a surprising role for FLI DBD in the process of hub formation which is usually attributed to the EWS low complexity domain."

      It is not clear that this can be claimed; hubs are composed of many other factors that are not investigated here. Furthermore, promoter-enhancer hubs/loops often include combined ETS and mSat chains to generate transcriptional hubs, which have not been considered here. None of these points are discussed.

      We replaced “uncovered” with “suggest” in our revised manuscript at line 476.  

      What are the barcode patterns in Supplementary Figure 5? Are they frequently observed in the Micro-C data, and are they likely mapping artifacts? Do they have any impact on the analyses?

      The barcode patterns in what is now Supplementary Figure 6 are blind spots in the hg19 genome assembly. Since they are few in number, we do not expect these blind spots to impact our analysis.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2024-02516

      Corresponding author(s): Christopher Shoemaker

      1. General Statements [optional]

      Thank you to all the reviewers for their helpful efforts on behalf of our manuscript. We appreciate the time and effort they have invested in providing valuable feedback.

      Overall, the positive reception from our reviewers highlighted their appreciation for our approach and findings. Moreover, their comments underscored the relevance and potential impact of our findings, particularly within the fields of autophagy and protein interaction networks. Their detailed and constructive critiques will also help refine both the content and presentation of our work.

      In response to the reviews, we have proposed targeted revisions to the manuscript, all of which are well within our lab's capabilities and can be executed efficiently. We have detailed our responses to each specific point raised by the reviewers below.


      2. Description of the planned revisions


      Reviewer #1

      Evidence, reproducibility and clarity

      Summary:

      Selective autophagy receptors (SARs) of the Sequestosome-1-like receptor group (SLRs), including SQSTM1 (Sequestosome-1)/p62, NBR1, TAX1BP1, NDP52, CALCOCO1, and Optineurin, are soluble SARs that engage cargo and ATG8-family proteins, as well as components of the core autophagy machinery like FIP200/RB1CC1, to bring about the autophagic degradation of the cargo and themselves. In the autophagic degradation of protein aggregates (aggrephagy), the most studied SAR, p62, collaborates with the archetypal autophagy receptor NBR1 and also with TAX1BP1 to bring about effective turnover of ubiquitinated cargos sequestered into p62 bodies or droplets by liquid-liquid phase separation. How this intricate cooperation of SARs is orchestrated is incompletely understood. In the paper by North et al. entitled "The LC3-interacting region of NBR1 is a protein interaction hub enabling optimal flux", the authors use peptide arrays to map the binding sites for the ATG8-family proteins LC3A and GABARAPL1, FIP200, and TAX1BP1 on the autophagy receptor NBR1. The authors find that three short linear interaction motifs (SLiMs), the LIR, FIR, and TIR, interacting with ATG8-family proteins, FIP200, and TAX1BP1, respectively, partly overlap in a short region of NBR1 that can adopt different conformations to accommodate the different binding partners. In short, the different interactions are mediated by distinct overlapping determinants, rather than a single, convergent SLiM. While the important binding determinants for ATG8 proteins and FIP200 overlap more, and it was not possible here to find mutations that distinguish LIR and FIR binding, TAX1BP1 bound more to a region downstream of the LIR, and a specific mutation in NBR1 and in TAX1BP1 could abolish binding. When the role of phosphorylation in augmenting binding was tested using phosphomimetic mutations, FIP200 and ATG8-family binding were generally augmented, whereas TAX1BP1 binding did not respond to these mutations.
Very interestingly, the authors found that co-expression of TAX1BP1 with tandem-tagged NBR1 in pentaKO cells (not expressing the SLRs p62, NBR1, NDP52, TAX1BP1, and OPTN) significantly increased the autophagic turnover of NBR1. None of the other SLRs could do this; instead, this overexpression assay revealed competition.

      Major points:

      1) In Fig 4, the peptide array binding assay is not sufficient, as it is only semiquantitative. The data shown should be accompanied by a more direct binding assay allowing the determination of KDs, in which the WT peptides are directly compared to the phosphomimetic mutant peptides. For this, the fluorescence anisotropy assay the authors use in Suppl Fig. 1E, or ITC, OctetRed96, or another assay suitable for KD determination, should be used.

      Response: Thank you for the constructive comments regarding our peptide array binding assay. We agree that the semi-quantitative nature of this method limits its ability to provide detailed binding affinity measurements. To address this, we will purify multiple peptides and assess the binding affinities between phosphomimetic+/- LIR peptides and Atg8s, FIP200, and TAX1BP1. While testing all peptides may be cost- and time-prohibitive, we will prioritize a representative range for this validation effort.

      2) As this paper is already dominated by the use of peptides, it would significantly enhance the quality of the data if the authors included studies with peptides phosphorylated at the specific positions to allow comparison with the phosphomimetic substitutions to aspartate.

      Response: Thank you for your insightful comment. We agree that incorporating studies with peptides phosphorylated at specific positions could provide a more nuanced comparison with the phosphomimetic substitutions to aspartate. Previous studies, including Popelka and Klionsky (2022) and Kliche et al. (2022), have indeed suggested that phosphomimetic substitutions do not perfectly replicate phosphorylation events.

      In response, we plan to order a peptide array containing phosphorylated peptides, not merely phosphomimetics, and will conduct additional experiments with TAX1BP1, FIP200, and LC3A. This approach will allow us to directly assess the effects of actual phosphorylation compared to phosphomimetic substitutions.

      While we acknowledge the possibility of subtle differences in binding affinity or regulatory interactions, we anticipate that the primary conclusions of our study—namely, that TAX1BP1 is largely insensitive to phosphorylation, whereas FIP200 and LC3A binding activities are affected—will remain unchanged. These experiments will provide valuable data to confirm the robustness of our conclusions under the conditions of true phosphorylation.

      3) The quality of the 2D peptide array probing of GST-LC3A binding in Fig 3A is poor. Is this a stripped and re-probed membrane? I do not think these data are publication quality and the experiment should be redone unless the authors have very good arguments against my suggestion. It would also be nice to see a 2D peptide array of GABARAPL1 binding too to make the comparative study complete.

      Response: Thank you for your constructive feedback regarding the quality of the 2D peptide array probing of GST-LC3A in Figure 3A. As you rightly pointed out, the membrane was indeed stripped and reprobed, with LC3A being the final probe. This method sometimes introduces artifacts, such as the 'ring' effect observed, which are common with this technique. However, the results consistently aligned with established consensus sequences for LC3, reinforcing the reliability of our findings despite the suboptimal image quality.

      Recognizing the concerns about the quality of the blot, we are prepared to repeat this experiment using a new commercial vendor, as our previous collaborator is no longer available. We anticipate some differences in the appearance of the blots due to changes in dot size and spacing from the new supplier. Given these variations, we propose adding the revised blot to the supplementary materials rather than the main figures to avoid disrupting the visual continuity of the data presentation.

      Additionally, in response to the reviewer’s suggestion, we will include a 2D peptide array probing for GABARAPL1. This will enhance the comparative analysis within our study.

      One alternative (related to Reviewer 3, comment 3) that we can deliver is using our LIR arrays to derive consensus sequences for LC3 binders and GABARAPL1 binders. In doing this, we find the same differences in LC3 and GABARAP binding preferences that were reported previously in Rogov et al. 2017. Recovering these known, and somewhat subtle, differences in binding preference further bolsters the validity of our approach.

      4) For the data shown in Fig 6 it should be noted that although these are very interesting results, a clear limitation of the study is that the results on autophagic turnover are based on overexpressing the SLRs in the pentaKO cells. In a physiological setting with all relevant actors in place and with a different stoichiometry, the effects could likely be different.

      Response: We appreciate the observation regarding the limitations of our study due to the use of overexpressed SLRs in pentaKO cells. As the reviewer rightly points out, the stoichiometry and interaction dynamics in a physiological setting might differ significantly. Critically, after submission of this manuscript, a recent preprint by Sascha Martens’ group (Bauer et al. BioRxiv) has shown similar results using endogenously tagged p62, TAX1BP1, and NBR1. This study corroborates our results, suggesting that the interactions we observed are not merely artifacts of overexpression but reflect genuine biological phenomena. We will incorporate a detailed discussion of this study in the Discussion section of our manuscript to contextualize our findings within a more physiologically relevant framework.

      Therefore, we believe that our reductionist approach, while not fully reflective of physiological conditions, offers valuable and generalizable insights into the intricate cooperation of SARs in autophagy.

      Minor points:

      1) It would be beneficial for the reader to show a cartoon of the domain organization of both TAX1BP1 and NBR1 in Figure 1. NBR1 is shown in supplemental figure 1, but there is no depiction of the domain organization of TAX1BP1.

      Response: As suggested, a domain schematic for NBR1 and TAX1BP1 will be included.

      2) The authors say at the bottom of page 4 "Complementary in vivo studies reveal that while SLRs typically compete". But do they actually typically compete? Is this not a result of the experimental strategies employed? Rather, there may be a shortage of SLRs due to cargo competition, as shown recently by Peter Kim's group, where excessive pexophagy may reduce mitophagy (Germain et al., 2023).

      Response: Thank you for pointing out this overstatement. We will soften this statement.

      3) In Fig. 3D it should be shown that D, E, A and V are preferred residues at position +1 for LC3A binding.

      Response: As suggested, we will amend the figure to include these residues at the +1 position.

      4) In such a 2D mutational analysis it is often just as important to determine which residues are not allowed for binding. It would therefore be nice if the authors could summarize/visualize their results in a better way in Fig 3D to also show the residues that lead to loss of binding. These could be shown below the sequence and the use of color to distinguish basic, acidic, hydrophobic and aromatic residues could be attempted.

      Response: As suggested, we will expand this figure to make it more comprehensive by including residues that are preferred as well as those that lead to loss of binding. Furthermore, we have used color to distinguish the properties of the residues (basic, acidic, hydrophobic and aromatic) that are (dis)favored at each position.

      5) Line 327: To be clear about the fact that this is an overexpression assay, "simultaneous expression" should be corrected to "simultaneous overexpression".

      Response: We will make the suggested change.

      6) There are LIRs and FIRs that overlap and those that do not. To check the degree of overlaps that may occur among known LIRs the authors made a peptide array with 100 established LIR sequences taken from the LIR-Central database (Chatzichristofi et al., 2023). The peptide array was probed with LC3A (29 bound), GABARAPL1 (49 bound), the FIP200 Claw domain (57 bound) and the TAX1BP1 CC2 domain (49 bound). As much as one third (32) of the LIR peptides were not bound by any of the four probes. Do the authors have a good explanation for the fact that so many peptides did not bind?

      Response: Thank you for highlighting the significant number of LIR peptides that did not bind to any of the probes in our study. At first, we were similarly surprised by this. In our manuscript, we will expand on several factors that might explain this observation:

      • Specificity of Atg8 Family Proteins: The LIR-Central database indicates that these sequences bind at least one Atg8-family protein, but not necessarily all. Our assay might not have included the specific Atg8 proteins that some LIRs preferentially bind.
      • Peptide Solubility and Conformation: The solubility and conformational stability of peptides printed on an array can vary, affecting binding efficiency. Certain sequences may not adopt the optimal conformation for binding under these assay conditions.
      • Sequence Context and Accessibility: The native context in which the LIR motif is contained, including neighboring amino acids, can influence binding. Peptide arrays strip these peptides of their physiological context. As short linear interaction motifs, the assumption is that context will not strongly affect binding, but it is known that many LIRs adopt partially structured motifs that influence binding (e.g. a C-terminal helix). Our peptide array approach is likely to impede such secondary structures from forming and may limit binding.
      • Misannotated Sequences: The LIRs included from the database have varying levels of validation. Some sequences might be misannotated and therefore do not bind any of the probes.

      These discussion points will be included in the manuscript to provide a comprehensive explanation for the observed data.

      7) Strangely enough, the NBR1 peptide used in Figure 2A did not bind any of the probes while the NBR1 peptides used in Fig. 1C bound very well. Do the authors have any explanation for this?

      Response: Thank you for noting the discrepancy in NBR1 peptide binding observed in Figure 2A compared to Figure 1C. This observation was noted by all reviewers. The difference likely arises from the solubility issues associated with the NBR1 peptide in the format used for Figure 2A, where the peptide sequence included the LIR motif plus 10 amino acids on each side. The core LIR sequence of NBR1 (YIII) is highly hydrophobic, which can affect its solubility and, consequently, its observed binding in our peptide array.

      To overcome this, we optimized the LIR sequence of NBR1 for peptide arrays (amino acids 725-749), which includes seven residues before the LIR and 14 residues after. This shift enhanced solubility and facilitated more reliable probing in our experiments (notably Fig 3). In Fig 2A and other assays, both the standard and the optimized formats of the NBR1 LIR were included: the standard format to maintain consistency with other LIRs extracted from the LIR-Central database and the optimized version as a control to validate our results.

      We will detail this explanation in the manuscript, clarifying the rationale behind the observed binding differences.


      Significance

      I found this paper very interesting to read with a lot of interesting new detailed and useful information on binding specificity for the proteins and motifs involved. It is a generally well performed study with interesting results. I also very much enjoyed the Discussion section which opens up for several interesting possible scenarios. The study also produced important point mutants that can be used in future studies to selectively abolish TAX1BP1 binding to NBR1. I think this is a "must read" paper for researchers interested in selective autophagy and co-operation between SARs, and more generally for getting some insight into how SLiMs may work. As such, this paper will be of interest for all interested in autophagy research and for a wider audience too as it is in essence about how overlapping SLiMs may be employed to orchestrate multiple protein-protein interactions using distinct overlapping determinants, rather than a single, convergent, SLiM. It is also one of the very few papers I have come across exploiting the power of the peptide array method so extensively with success for mapping protein binding sites.

      It could perhaps be interesting if the authors discussed their results in relation to another study from the group of Sascha Martens on the role of TAX1BP1 in p62 bodies or condensates (doi: https://doi.org/10.1101/2024.05.17.594671). These two papers should be read together as they are both very interesting and important contributions.

      Response: Thank you for pointing out this important reference that was posted shortly after our manuscript was submitted. As mentioned above, we will include an expanded discussion section to discuss these corroborating findings. We will also include a citation to Ferrari et al (PMID: ) on Tau evasion of autophagy through exclusion of TAX1BP1.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary: In this manuscript, North et al. examined how short linear interaction motifs (SLiMs) help to orchestrate the function of selective autophagy receptors (SARs) during cargo engulfment in autophagosomes. In particular, the authors focused on NBR1 as a model SAR to address its role in the clearance of protein aggregates (aggrephagy). Using binding assays, the authors showed that a SLiM harboring NBR1's LIR motif also mediates binding to FIP200 and TAX1BP1. Intrigued by these overlapping binding sites, the authors probed 100 LIRs for their binding to TAX1BP1's coiled-coil 2 region (CC2), FIP200's claw domain and two different ATG8 family members and found heterogeneous binding patterns and distinct correlations between these four binding partners. Using mutational peptide arrays of NBR1's SLiM, the authors revealed unique binding determinants of these NBR1 partners and their potential differential regulation by phosphorylation. Taking advantage of their new NBR1 binding insights, the authors structurally modeled the binding of TAX1BP1's CC2 to NBR1's SLiM and identified crucial residues in both proteins for this interaction. Lastly, the authors turned to autophagy flux assays in cells and showed that TAX1BP1 acts synergistically with NBR1 to increase its lysosomal delivery. Overall, the claims and the conclusions are largely supported by the data. However, a few critical issues should be addressed.

      Are the data and the methods presented in such a way that they can be reproduced?

      Are the experiments adequately replicated and statistical analysis adequate?

      Major comments

      1) What are the expression levels of the different tf-SAR fusions compared to the endogenous levels of the respective SAR? And are tf-NBR1 protein levels changed upon co-expression of the other SARs?

      Response: We appreciate the questions concerning the expression levels of tf-SAR fusions relative to the endogenous levels of the respective SARs, similar to inquiries from Reviewer 1 (major comment 4). In our study, the levels of tf-NBR1 are notably higher than the endogenous levels. Interestingly, we observed that the co-expression of autophagy-competent NBR1 and TAX1BP1 generally leads to a decrease in the levels of both proteins, likely due to enhanced autophagic turnover. This pattern is not seen with autophagy-deficient mutants, suggesting a functional interaction affecting protein stability.

      Furthermore, a recent preprint by Sascha Martens’ group (Bauer et al., BioRxiv) has presented findings that echo our results using endogenously tagged versions of p62, TAX1BP1, and NBR1. This study supports our observations, indicating that the interactions and effects we report are not artifacts of overexpression but are reflective of genuine biological processes. These findings will be thoroughly discussed in the Discussion section of our manuscript to provide context for our results within a physiologically relevant framework.

      Therefore, we believe that our reductionist approach, while not fully reflective of physiological conditions, offers valuable and generalizable insights into the intricate cooperation of SARs in autophagy.

      2) Which of the 100 LIRs have been shown to specifically bind LC3A or GABARAPL1? The authors should include this information from the literature in Figure 2 (e.g., highlighted by color or else).

      Response: Thank you for your suggestion to detail the specific interactions between the 100 LIRs and Atg8 homologs like LC3A and GABARAPL1 in Figure 2. While each LIR in the LIR-Central database has been validated, detailed information on which LIRs bind specific Atg8 homologs—and with what relative affinity—is often lacking in the literature. This gap makes it challenging to present comprehensive binding preferences in a visually coherent way within Figure 2.

      Nevertheless, we recognize the value of such information. We plan to conduct a thorough literature review on all 100 LIRs included in our study. Should we find sufficient and reliable data regarding binding specificities, we will incorporate this into Figure 2, potentially using color coding or another method to highlight these relationships clearly.

      We can also perform the reciprocal experiment by using our LIR arrays to derive consensus sequences for LC3 binders and GABARAPL1 binders. In doing this, we find the same differences in LC3 and GABARAP preferences that were reported previously in Rogov et al., 2017. Recovering these known, and somewhat subtle, differences in binding preference further bolsters the validity of our approach. These new data will be added to the manuscript.


      3) How effective is the stripping of the peptide array? The authors should provide evidence that there is no carryover binding from sequentially probing the array. As a control, the authors should at least repeat probing for the last binder in their sequential binding assay with a new peptide array that has not yet been incubated with a different binder and then stripped.

      Response: This is an important question, related to Reviewer 1 (comment 3), as the stripping of the peptide array can be variably effective. Prior to performing any of the arrays included in this manuscript, we did several validation arrays to identify the proper ordering of probes (e.g. what proteins can be stripped, which cannot). FIP200 and TAX1BP1 probing was performed on fresh or successfully stripped blots. LC3A probing was done last, as there is substantial previous literature defining the LC3 motif. The results of the LC3A binding consistently aligned with established consensus sequences for LC3, reinforcing the reliability of our findings despite the stripping process. Therefore, while stripping sometimes introduces artifacts, such as the 'ring' effect observed in Figure 3A, the results did not appear to be influenced by prior probes.

      As suggested, we are prepared to repeat the LC3A probing on a new array to fully cement this interpretation. We note, however, that this will be done using a new commercial vendor, as our previous collaborator is no longer available (The original blots were ordered over 3 years ago). We anticipate some differences in the appearance of the blots due to changes in dot size and spacing from the new supplier. Given these variations, we propose adding the revised blot to the supplementary materials rather than the main figures to avoid disrupting the visual continuity of the data presentation.

      4) What is the number of replicates for the peptide array assays?

      Response: Due to cost considerations, peptide array assays in our study were conducted as one or two replicates. We understand the limitations this presents in terms of statistical robustness and variability assessment. However, where possible, we supplemented these assays with additional validation experiments and controls to ensure reliability of our findings. For critical experiments, including key interaction validations, we used independent biochemical assays to confirm the results obtained from the peptide arrays.

      5) The authors should test whether the enhancement of NBR1 flux by TAX1BP1 is only due to the contribution of an additional LIR or potential other functions of TAX1BP1 (e.g. ubiquitin binding or FIP200 binding). The authors should expand the panel shown in Figure 6E with TAX1BP1 mutant which are deficient in ubiquitin or FIP200 binding.

      Response: We thank the reviewer for their suggestion. We will include data with TAX1BP1 mutants that are deficient in ubiquitin or FIP200 binding.

      Minor comments

      6) Molecular weight markers are missing on immunoblots.

      Response: We apologize for this oversight. We will amend the figures to include molecular weight markers.

      7) It would be more informative (since some proteins have more than one LIR) if the actual LIR motif would be displayed next to the peptide array (as e.g. done for NBR1) and not only in the supplements.

      Response: We appreciate this thoughtful input and will consider its implementation carefully. We will explore the feasibility of integrating this detail in a manner that maintains figure clarity.

      8) Along this line in Figure 2A, NBR1's LIR (marked with a red star) is among the LIRs for which no binding was observed. The authors should explain this.

      Response: Thank you for noting the discrepancy in NBR1 peptide binding observed in Figure 2A compared to Figure 1C. This observation was noted by all reviewers. The difference likely arises from the solubility issues associated with the NBR1 peptide in the format used for Figure 2A, where the peptide sequence included the LIR motif plus 10 amino acids on each side. The core LIR sequence of NBR1 (YIII) is highly hydrophobic, which can affect its solubility and, consequently, its observed binding in our peptide array.

      To overcome this, we optimized the LIR sequence of NBR1 for peptide arrays (amino acids 725-749), which includes seven residues before the LIR and 14 residues after. This shift enhanced solubility and facilitated more reliable probing in our experiments (notably Fig 3). In Fig 2A and other assays, both the standard and the optimized formats of the NBR1 LIR were included: the standard format to maintain consistency with other LIRs extracted from the LIR-Central database and the optimized version as a control to validate our results.

      We will detail this explanation in the manuscript, clarifying the rationale behind the observed binding differences.


      Significance

      Collectively, the work of North and colleagues provides valuable new mechanistic insights into the network of interactions that governs the function of SARs. Importantly, this work extends the knowledge in the field that SARs act in an orchestrated manner that reinforces their delivery to lysosomes. However, given the involvement of several SARs in the same process, it is crucial to dissect the binding modalities among these factors. In this regard, the current study on fine mapping of binding sites provides an important contribution, in particular by probing the in vitro findings in reconstituted KO cells; this part is really strong. In addition, the identification of critical residues for these binding events provides important tools for the autophagy community, which will be the basic research audience most interested in this technical study.



    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study entitled "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Vijay et al. provides valuable insights into the association of rifampicin tolerance and growth fitness with isoniazid resistance among clinical isolates of M. tuberculosis. Antibiotic tolerance in M. tuberculosis is an important topic since it contributes to the lengthy and complicated treatment required to cure tuberculosis disease and may portend the emergence of antibiotic resistance. The authors found that rifampicin tolerance was correlated with bacterial growth, rifampicin minimum inhibitory concentrations, and isoniazid-resistance mutations.

      Strengths:

      The large number of clinical isolates evaluated and their longitudinal nature during treatment for TB (including exposure to rifampin) are strengths of the study.

      Weaknesses:

      Some of the methodologies are not well explained or justified and the association of antibiotic tolerance with growth rate is not a novel finding. In addition, the molecular mechanisms underlying rifampicin tolerance only in rapidly growing isoniazid-resistant isolates have not been elucidated and the potential implications of these findings for clinical management are not immediately apparent.

      We thank the reviewer for the comments; we have modified the Methods section and Figure 1 to clarify the method, as suggested by the reviewer.

      Although we agree that previous studies have shown the association of slow growth rate with antibiotic tolerance, ours is, to our knowledge, the most comprehensive assessment of rifampicin tolerance among clinical isolates. In particular, we show that the degree of tolerance in clinical isolates can vary over several orders of magnitude, which had not been previously documented or appreciated. Furthermore, the association of high tolerance with IR isolates is a new finding, and given the potential for tolerance to increase the risk of de novo drug resistance, our study suggests that IR isolates with high rifampicin tolerance may present a risk for the development of MDR-TB.

      In addition, we have analysed the longitudinal isolates and the genetic variants emerging in them that are associated with an increase in rifampicin tolerance. This analysis reveals multiple possible pathways to increased rifampicin tolerance among clinical M. tuberculosis isolates. A possible clinical implication is that the combination of high rifampicin tolerance and isoniazid resistance could be a risk factor for tuberculosis treatment failure. This study provides a basis for further clinical studies to evaluate the role of rifampicin tolerance in IR isolates on treatment outcome. We have focused on these aspects in the Discussion of the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Vijay and colleagues addresses a clinically important and often overlooked aspect of TB treatment. Variation in the level of antibiotic tolerance among otherwise antibiotic-susceptible isolates is difficult to screen for routinely, and consequently screening is not performed. The authors present a convincing argument that there is indeed significant variation in the susceptibility of isoniazid-resistant strains to killing by rifampicin, in some cases at the same tolerance levels as bona fide resistant strains. On the whole, the study is easy to follow and the results are justified. This work should be of interest to the wider TB community at both a clinical and basic level.

      Weaknesses:

      The manuscript is long, repetitive in places, and the figures could use some amending to improve clarity (this could be a me-specific issue as they look ok on my screen, yet the colour is poor when printed).

      We thank the reviewer for the comments; we have modified the revised manuscript as per the reviewer's suggestions.

      It would have been great to have seen some correlation between increased rifampicin tolerance and treatment outcome, although I'm not sure if these data are available to the researchers. I agree with the researchers that the use of a single media condition is a limitation. However, this is true of a lot of studies. Rifampicin tolerance and treatment outcome analysis.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      Reviewer #3 (Public Review):

      Summary:

      The authors have initiated studies to understand the molecular mechanisms underlying the development of multi-drug resistance in clinical Mtb strains. They demonstrate the association of isoniazid-resistant isolates by rifampicin treatment supporting the idea that selection of MDR is a microenvironment phenomenon and involves a group of isolates.

      Strengths:

      The methods used in this study are robust and the results support the authors' claims to a major extent.

      Weaknesses:

      The manuscript needs a thorough vetting of the language. At present, the language makes it very difficult to comprehend the methodology and results.

      We thank the reviewer for the comments; we have revised the manuscript as per the reviewer's suggestions.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Methods: The authors attempt to differentiate between "fast"- and "slow"-growing bacteria in order to determine if the growth rate is associated with rifampicin tolerance. This is accomplished by assessing growth on solid agar at 15 and 60 days post-incubation, respectively. However, mycobacterial growth rate is not a binary phenomenon but rather a continuous variable. Moreover, it is not clear why 15 and 60 days were selected. Also, instead of a "slow growth" phenotype, the 60-day time point might simply reflect a longer lag phase. Were the plates examined at any interval time points? It would be interesting to know whether colony growth was delayed overall in the populations observed only at 60 days, or simply if the appearance of microcolonies visible to the naked eye was delayed (with normal growth afterwards).

      We thank the reviewer for the comments. We want to clarify that we did not use agar plates but rather the most-probable-number (MPN) method to determine the surviving fraction after antibiotic treatment. We have clarified this in the revised manuscript and in the revised Figure 1. Because the MPN method is a binary measure (growth/no growth), it cannot differentiate between a long lag time and other mechanisms. In our original analysis, we included an intermediate time point of 30 days, but these data (now included as Supplementary Figure 1) cannot address the issue of lag phase directly. Since the 30-day time point did not add to the overall analysis and interpretation, we had not included it in the original submission.
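      For readers less familiar with the MPN approach, the sketch below shows how a most-probable-number estimate can be obtained by maximum likelihood from growth/no-growth scores across a dilution series, assuming viable cells are Poisson-distributed among wells. The dilution amounts and well counts are hypothetical, and this is the textbook estimator rather than necessarily the exact calculation used in the study.

```python
import math

def mpn_estimate(amounts, positives, wells):
    """Maximum-likelihood most probable number (MPN).

    amounts:   amount of original sample inoculated per well at each dilution
               step (same unit throughout, e.g. mL).
    positives: number of wells scored as growth at each step.
    wells:     number of wells inoculated at each step.
    Assumes a well receiving amount v stays sterile with probability
    exp(-lam * v); returns lam, viable units per unit of original sample.
    """
    if sum(positives) == 0:
        return 0.0        # no growth anywhere: the estimate is zero
    if all(p == n for p, n in zip(positives, wells)):
        return math.inf   # growth everywhere: no finite estimate

    def score(lam):
        # Derivative of the log-likelihood with respect to lam; strictly decreasing.
        total = 0.0
        for v, p, n in zip(amounts, positives, wells):
            x = lam * v
            if x < 700:   # avoid overflow in expm1; the term is ~0 beyond this
                total += p * v / math.expm1(x)
            total -= (n - p) * v
        return total

    # Bracket the root, then bisect on a log scale because MPN values
    # can span many orders of magnitude.
    lo, hi = 1e-12, 1.0
    while score(hi) > 0:
        hi *= 10.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical example: three tenfold dilution steps, three wells each.
example_mpn = mpn_estimate([1.0, 0.1, 0.01], [3, 1, 0], [3, 3, 3])
```

      Under this scheme, a surviving fraction would be the MPN after rifampicin exposure divided by the MPN of the untreated control; because each well only contributes growth or no growth, the estimate carries no information about when growth began, which is why lag time cannot be resolved.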

      (2) Methods/Results/Discussion: Some important clinical information is missing-how were the patients treated who had IR isolates? Did they receive the standard regimen for DS TB or was another drug substituted for isoniazid? Exposure to different drugs could affect the rifampicin-tolerant populations during the intensive phase (Figure 5).

      Thank you for this comment; we have included information on the treatment regimen in the revised manuscript.

      Were there differences in microbiological (sputum culture conversion rate at 8 weeks or time to culture negativity) or clinical outcomes based on isoniazid susceptibility? Perhaps more importantly, were there differences in microbiological/clinical outcomes based on the proportion of bacterial subpopulations with rifampicin tolerance for a particular isolate? There should be more discussion on the potential clinical implications of the study's findings.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment progression or outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      (3) Results (Figure 3A): Although an interesting finding, the increased rifampicin tolerance observed only in the "rapidly" growing populations of isoniazid-resistant isolates (IR) vs. isoniazid-susceptible (IS) isolates is not explained. In contrast, equally, increased rifampicin tolerance is seen in the "slowly" growing populations of both IR and IS isolates. It would be interesting to know if these slowly growing populations show specific tolerance to rifampicin or if, as expected, slow growth confers tolerance to a range of different bactericidal antibiotics.

      We thank the reviewer for the suggestions. We agree that these will be interesting to investigate in a future study, but they are outside the scope of the current study.

      (4) Results (Figure 3B): The basis for the classification into tertiles is not clear and appears somewhat arbitrary-does this represent the survival of a particular isolate following rifampicin exposure relative to the other isolates based on isoniazid susceptibility (IS or IR) or the % growth relative to other populations for the same isolate? Figure 3B is missing a y-axis label. Is it a log10 MPN ratio?

      We thank the reviewer for pointing this out. To clarify the classification into tertiles: we first pooled both groups of isolates, isoniazid-susceptible (IS) and isoniazid-resistant (IR), into a single population. We then categorized this unified population into three distinct groups (low, medium, and high) based on their survival fraction following rifampicin treatment. Consequently, the 'low', 'medium', and 'high' tertiles represent the survival of each isolate following rifampicin exposure relative to all isolates, combining both IS and IR isolates.

      For clarity, we provide a breakdown of the criteria for each tertile:

      + Low tertile: isolates with the lowest survival fractions (bottom 25%).

      + Medium tertile: isolates with survival fractions between the bottom 25% and the top 25%.

      + High tertile: isolates with the highest survival fractions (top 25%).

      We have modified the revised manuscript to clarify this.

      We have also modified Figure 3B to correct the y-axis label.
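      To make the pooled classification concrete, a minimal Python sketch follows. It assumes one survival fraction per isolate and uses the 25th and 75th percentiles of the pooled IS and IR values as cut points, matching the breakdown above; the isolate identifiers and values are hypothetical, and the percentile interpolation of `statistics.quantiles` (default method) may differ from the software used in the study.

```python
import statistics

def classify_by_survival(isolates):
    """Assign low/medium/high survival groups from a pooled IS + IR population.

    isolates: list of (isolate_id, susceptibility_group, survival_fraction).
    The susceptibility group is deliberately ignored when computing the cut
    points, so IS and IR isolates are ranked on one shared scale.
    """
    values = [s for _, _, s in isolates]
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    classes = {}
    for isolate_id, _, s in isolates:
        if s <= q1:
            classes[isolate_id] = "low"
        elif s >= q3:
            classes[isolate_id] = "high"
        else:
            classes[isolate_id] = "medium"
    return classes

def counts_by_group(isolates, classes):
    """Tabulate how many IS and IR isolates fall into each survival group."""
    table = {}
    for isolate_id, group, _ in isolates:
        key = (group, classes[isolate_id])
        table[key] = table.get(key, 0) + 1
    return table

# Hypothetical survival fractions after rifampicin exposure.
example = [("IS1", "IS", 0.01), ("IS2", "IS", 0.02), ("IS3", "IS", 0.03),
           ("IS4", "IS", 0.04), ("IR1", "IR", 0.05), ("IR2", "IR", 0.06),
           ("IR3", "IR", 0.07), ("IR4", "IR", 0.08)]
example_classes = classify_by_survival(example)
example_table = counts_by_group(example, example_classes)
```

      Because the cut points come from the pooled population, the IS/IR comparison is then a question of how each group is distributed across the same three classes.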

      (5) Results (lines 185-186): For correlating relative growth in the absence of antibiotics, 19 clinical isolates "outliers" were removed without explanation.

      We have added an explanation for the “outliers”, which were removed earlier because of their deviation from a normal distribution, and we have also provided Supplementary Figure 3, which includes these outliers.

      (6) Results (lines 203-211): The authors attempted to investigate a potential association between the mechanism of M. tuberculosis isoniazid resistance and the degree of rifampicin tolerance. However, the vast majority of IR clinical isolates (n=71) had a katG_S315X mutation and only 8 isolates had alternative mutations (inhA_I21T and fabG1_C-15X). Given the wide range of rifampicin tolerance observed within these isoniazid-resistant isolates, they concluded that other genetic or epigenetic determinants must be playing a role. WGS of longitudinally collected isolates from the same patients during TB treatment yielded non-synonymous SNPs in a list of genes previously reported to be associated with persistence, tolerance, and mycobacterial survival. However, precise mechanisms (including, e.g., expression of efflux pumps) are not investigated.

      We thank the reviewer for summarising the findings. Yes, we agree that investigating the precise mechanism of rifampicin tolerance is beyond the scope of the current work.

      Minor comments:

      (1) Abstract (line 41): The nonstandard abbreviations "IR" and "IS" have not been introduced prior to this usage.

      We have modified this in the abstract.

      (2) Introduction (line 60): Insert "phenomena" or "mechanisms" after "two".

      We have modified this in the introduction.

      (3) Introduction (lines 66-69): This sentence is confusing, especially the second part ("supporting this studies...").

      We have modified the lines to clarify.

      (4) Introduction (line 84): In the current text, it appears as if "IR" is the abbreviation for "isoniazid". Therefore, I recommend changing "resistance to isoniazid" to "isoniazid resistance".

      We have modified this in the revised manuscript.

      (5) Results (line 141): Insert "the" before "rest".

      We have modified this in the revised manuscript.

      (6) Results (line 187): Replace "did not had" with "did not have".

      We have modified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      The abstract is long and repetitive. It needs reworking and shortening to improve clarity and highlight the main takeaway message.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      The introduction is interesting and contains relevant information. However, it is long and takes a while to get to the point of the study. It needs re-writing to emphasise key prior results and the purpose of this study.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      Results:

      As the study relies predominately on the use of MPN, I think a simple schematic of how the experiment is performed would be informative. Could this be added to Figure 1?

      We have revised Figure 1 to include a schematic representation of the experiment.

      Some of the differences in MDK90, whilst they may be significant, are small, so a schematic would at least provide context as to the relevance of these differences. This may also alleviate my confusion as to how the authors can measure the time required to achieve MDK90 as 1.23-1.31 days when the first time point taken is day 2 (the data in Figure 2). They have Fig. S6, but this is small and hard to follow.

      We thank the reviewer for this suggestion; we have modified the text and Figure S6 in the revised manuscript.

      Figure 2:

      Would be helpful to have -1 on the Y axis.

      The grey dots don't print very well (Might be my printer)

      We have modified Figure 2 in the revised manuscript.

      Line 142: The authors note a difference in RIF tolerance at day 15 that disappeared by day 60. I assume they are referring to the day 5 timepoint although this isn't clear as written.

      Yes, it is referring to the day 5 time point and we have clarified this in the revised manuscript.

      The section starting at line 148 (fig 3) is interesting, but it is difficult to read and follow what the difference is between this data and the prior data in Figure 2. It also wasn't until about line 165 that the purpose became clear. Overall the conclusions are sound and interesting.

      We have modified this in the revised manuscript.

      Line 154: What are the early and late time recovery time points?

      Is Figure 3A the same data as Figure 2?

      We have clarified this in the revised manuscript; Figure 3A shows the same data as Figure 2.

      I found Figure 6 hard to follow. I'm not sure how better to present this data, but it should be improved. Some further clarification in the text would be helpful.

      We thank the reviewer for the suggestions. We have added more explanation in the text to clarify figure 6.

      Conclusions:

      The conclusions are sound, based on the data presented. The clinical relevance is highlighted, yet appropriately phrased to not be too far-reaching.

      Again, I think the conclusions could be condensed considerably. They are repetitive in places, which dilutes the main outcomes of this otherwise interesting and important study. The authors appropriately highlight some of the limitations of their study.

      We thank the reviewer for these comments and have modified this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Srinivasan et al. details the identification/development of isoniazid-resistant strains in clinical isolates following treatment with rifampicin. This is an important aspect of understanding MDR development in TB strains. The results are promising and gel well with the hypothesis. However, the manuscript requires thorough language editing. While the overall idea is clear, the methodology does not come across clearly.

      Specific comments:

      (1) It is not clear whether rifampicin treatments were given for 2 and 5 days before kill curves or for 15 and 60 days. The methodology needs to be phrased clearly. Why were the time intervals of 15 days and 60 days chosen? Is there a rationale for this?

      We thank the reviewer for the suggestions; we have modified the Methods and Figure 1 to clarify this in the revised manuscript.

      (2) A concentration of 2 µg/ml was used for in vitro culture in this study. While the authors themselves indicate that this is well above the MIC, this might represent a non-natural dose and hence may force the evolution of strains. What would be the scenario in the natural course of antibiotic treatment (dose at MIC or below MIC)?

      We observed no significant emergence of resistance up to day 5; resistance emerged only after day 5. We therefore avoided determining the survival fraction after resistance had emerged, so the kill curve mostly represents the tolerant subpopulation. Pharmacokinetic studies of rifampicin dosing suggest that peak concentrations of >2-32 µg/mL are typical for standard doses of the drug; we therefore believe the chosen concentration of 2 µg/mL to be physiologically relevant.

      (3) As described in line 155, the survival spanned a broad distribution, with up to a million-fold difference. It is rather surprising that 5 days of rifampicin treatment would lead to such a spread in resistance patterns. Did the authors study the different populations to understand this phenomenon? This is important given the scale of resistance developed in this short time.

      We want to clarify that the broad range of survival fractions reflects differences in rifampicin-tolerant subpopulations, not rifampicin-resistant subpopulations, because survival fractions were determined after rifampicin treatment by recovery in rifampicin-free media. This has been clarified in the revised Figure 1.

      Overall, the manuscript is a detailed study with new insights into the development of multi-drug resistance by Mtb. A thorough vetting for language is essential for a greater impact of the study.

      We thank the reviewer and have attempted to improve the clarity of the language to increase the potential impact of our findings.

    1. Author response:

      The following is the authors' response to the current reviews.

      Reviewer #1 (Public Review):

      I'll begin by summarizing what I understand from the results presented, and where relevant how my understanding seems to differ from the authors' claims. I'll then make specific comments with respect to points raised in my previous review (below), using the same numbering. Because this is a revision I'll try to restrict comments here to the changes made, which provide some clarification, but leave many issues incompletely addressed.

      As I understand it the main new result here is that certain recurrent network architectures promote emergence of coordinated grid firing patterns in a model previously introduced by Kropff and Treves (Hippocampus, 2008). The previous work very nicely showed that single neurons that receive stable spatial input could 'learn' to generate grid representations by combining a plasticity rule with firing rate adaptation. The previous study also showed that when multiple neurons were synaptically connected their grid representations could develop a shared orientation, although with the recurrent connectivity previously used this substantially reduced the grid scores of many of the neurons. The advance here is to show that if the initial recurrent connectivity is consistent with that of a line attractor then the network does a much better job of establishing grid firing patterns with shared orientation.

      Beyond this point, things become potentially confusing. As I understand it now, the important influence of the recurrent dynamics is in establishing the shared orientation and not in its online generation. This is clear from Figure S3, but not from an initial read of the abstract or main text. This result is consistent with Kropff and Treves' initial suggestion that 'a strong collateral connection... from neuron A to neuron B... favors the two neurons to have close-by fields... Summing all possible contributions would result in a field for neuron B that is a ring around the field of neuron A.' This should be the case for the recurrent connections now considered, but the evidence provided doesn't convincingly show that attractor dynamics of the circuit are a necessary condition for this to arise. My general suggestion for the authors is to remove these kinds of claims and to keep their interpretations more closely aligned with what the results show.

      We would like to clarify that the simple (flexible) attractor is a weaker condition than the ones previously used to align grid cells. However, we by no means claim that it is a necessary condition for grid maps to align. Other architectures, certainly more complex ones but perhaps even simpler ones, can align grid maps in our model.

      Major (numbered according to previous review)

      (1) Does the network maintain attractor dynamics after training? Results now show that 'in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing'. This clearly implies that the recurrent collaterals are not required for online generation of the grid patterns. This point needs to be abundantly clear in the abstract and main text so the reader can appreciate that the recurrent dynamics are important specifically during learning.

      We respectfully disagree with the interpretation of this result. In this model cells self-organize to produce aligned grid maps. In such systems it makes sense to characterize the equilibrium states of the system. We turned learning off in Figure S3 to show that the recurrent connections have a contractive effect on grid spacing. But artificially turning off learning means that one can no longer make claims about the equilibrium states of the system, since it can no longer evolve freely. In a functional network, if the recurrent attractor is removed, the system will evolve towards poor gridness and no alignment no matter what the starting point is, as also shown in Figure S3. Several experimental results invite us to think of grid cells as the equilibrium solution of a series of constraints that is ready to change at any time: Barry et al, 2012; Yoon et al, 2013; Carpenter et al, 2015; Krupic et al, 2015; Krupic et al, 2018; Jayakumar et al, 2019.

      One point in which we perhaps agree with the reviewer is that information about the hexagonal maps is kept in the feedforward weights, while behavior and the recurrent collaterals act as constraints of which these feedforward weights are the equilibrium solution.

      (2) Additional controls for Figure 2 to test that it is connectivity rather than attractor dynamics (e.g. drawing weights from Gaussian or exponential distributions). The authors provide one additional control based on shuffling weights. However, this is far from exhaustive and it seems difficult on this basis to conclude that it is specifically the attractor dynamics that drive the emergence of coordinated grid firing.

      Again, we do not claim that this is the only way in which grid maps can be aligned, but it is the simplest one proposed so far. We were asked if it was the specific combination of input weights to a cell rather than the organization provided by the attractor which resulted in aligned maps. By shuffling the inputs to a cell we keep the combination of inputs invariant but lose the attractor architecture. Since grid maps in this new situation are not aligned, we can safely conclude that it is not the combination of inputs per se, but the specific organization of these inputs that allows grid alignment. It is not fully clear to us what ‘exhaustive’ means in this context.

      (3) What happens if recurrent connections are turned off? The new data clearly show that the recurrent connections are not required for online grid firing, but this is not clear from the abstract and is hard to appreciate from the main text.

      This point is related to (1). Absent this constraint, Figure S3 shows that the system evolves toward larger spacing, with poorer gridness and no alignment.

      (4) This is addressed, although the legend to Fig. S2D could provide an explanation / definition for the y-axis values.

      We have now added: Mean input fields are the sum of all inputs of a given kind entering a neuron at a given moment in time, averaged across cells and time.

      (5) Given the 2D structure of the network input, it perhaps isn't surprising that the network generates 2D representations, and this may have little to do with its 1D connectivity. The finding that the networks maintain coordinated grids when recurrent connections are switched off supports my initial concern, and the authors' explanation, to me at least, remains confusing. I think it would be helpful to consider that the connectivity is specifically important for establishing the coordinated grid firing, but that the online network does not require attractor dynamics to generate coordinated grid firing.

      This point is related to (1) and (3). We agree with the reviewer that the input lies within a 2D manifold, but this is not something that the network has to find out because it receives one datapoint of information at a time. This alone is not enough to form aligned grid cells, since each grid cell can find a roughly equivalent equilibrium in a different direction. It is only the constraint imposed by the recurrent collaterals that aligns grid maps, and, as we show, this constraint does not need to be constructed ad hoc to work on 2D, as previously thought. When recurrent connections are switched off, the system evolves toward unaligned grid maps, with larger spacing and lower gridness. Regarding the results obtained after modifying the network and turning off learning, we think they have a very limited scope (in this case showing the contractive effect of recurrent collaterals on grid spacing), given that the system is artificially being kept out of its natural equilibrium.

      (6) Clarity of the introduction. This is somewhat clearer, but I wonder if it would be hard for someone not familiar with the literature to accurately appreciate the key points.

      We have made our best effort to improve the clarity of the introduction.

      (7) Remapping. I'm not sure why this is ill posed. It seems the proposed model cannot account for remapping results (e.g. Fyhn et al. 2007). Perhaps the authors could just clearly state this as a limitation of the model (or show that it can do this).

      We view our model as perfectly consistent with Fyhn et al, 2007. Remapping is not triggered by the network itself, though, but rather by a re-arrangement of the inputs requiring the network to learn new associations. Different simulations of the same model with identical parameters can be interpreted as remapping experiments.

      Reviewer #3 (Public Review):

      Summary:

      The paper proposes an alternative to the attractor hypothesis, as an explanation for the fact that grid cell population activity patterns (within a module) span a toroidal manifold. The proposal is based on a class of models that were extensively studied in the past, in which grid cells are driven by synaptic inputs from place cells in the hippocampus. The synapses are updated according to a Hebbian plasticity rule. Combined with an adaptation mechanism, this leads to patterning of the inputs from place cells to grid cells such that the spatial activity patterns are organized as an array of localized firing fields with hexagonal order. I refer to these models below as feedforward models.

      It has already been shown by Si, Kropff, and Treves in 2012 that recurrent connections between grid cells can lead to alignment of their spatial response patterns. This idea was revisited by Urdapilleta, Si, and Treves in 2017. Thus, it should already be clear that in such models, the population activity pattern spans a manifold with toroidal topology. The main new contributions in the present paper are (i) in considering a form of recurrent connectivity that was not directly addressed before. (ii) in applying topological analysis to simulations of the model. (iii) in interpreting the results as a potential explanation for the observations of Gardner et al.

      We wanted to note that we do not see this paper as proposing an alternative to the attractor hypothesis, given that we use attractor networks, but rather as an exploration of possibilities not yet visited by this hypothesis.

      Strengths:

      The exploration of learning in a feedforward model, when recurrent connectivity in the grid cell layer is structured in a ring topology, is interesting. The insight that this not only aligns the grid cells in a common direction but also creates a correspondence between their intrinsic coordinate (in terms of the ring-like recurrent connectivity) and their tuning on the torus is interesting as well, and the paper as a whole may influence future theoretical thinking on the mechanisms giving rise to the properties of grid cells.

      Weaknesses:

      (1) In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning, in addition to the location on a 2d plane, and therefore involved a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane. The novelty here is that the initial connectivity is structured uniquely according to latent coordinates residing on a ring.

      The recurrent architectures in the cited works are complex and require arranging cells in a 2D manifold to calculate connectivity based on their relative 2D position. In other words, the 2D structure is imprinted in the architecture, as in our 2D condition. In this work the network is much simpler and only requires neighboring relations in 1D. Such relationships have been shown to spontaneously emerge in the hippocampal formation (Pastalkova et al, 2008; Gonzalo Cogno et al, 2024).

      (2) The paper refers to the initial connectivity within the grid cell layer as one that produces an attractor. However, it is not shown that this connectivity, on its own, indeed sustains persistent attractor states. Furthermore, it is not clear whether this is even necessary to obtain the results of the model. It seems possible that (possibly weaker) connections with ring topology, that do not produce attractor dynamics but induce correlations between neurons with similar locations on the ring would be sufficient to align the spatial response patterns during the learning of feedforward weights.

      Regarding the first part of the comment, the recurrent collaterals create one or at times multiple bumps of activity in the network so that neighboring (interconnected) cells activate together. An initial random state of activity rapidly falls into this dynamic, constrained by the attractor. To us this is not surprising given that this connectivity is the classical means of creating a continuous attractor. Perhaps there is some deeper meaning in this comment that we are not fully grasping.

      Regarding the second part of the comment, we fully agree with the reviewer. We are presenting what so far is the simplest connectivity that can align grid maps, but we by no means claim that it is the simplest possible one. Regarding weaker connections with ring topology, we show in Figure S2 that a ring attractor with too weak or too strong connections is incapable of aligning grids, since a balance between feedforward and feedback inputs is required.

      (3) Given that all the grid cells are driven by an input from place cells that span a 2d manifold, and that the activity in the grid cell network settles on a steady state which is uniquely determined by the inputs, it is expected that the manifold of activity states in the grid cell layer, corresponding to inputs that locally span a 2d surface, would also locally span a 2d plane. The result is not surprising. My understanding is that this result is derived as a prerequisite for the topological analysis, and it is therefore quite technical.

      We understand that the reviewer is referring to the motivation behind studying local dimensionality. We agree that the topological analysis approach is quite technical, but it provides unique insights. The classification theorem of closed surfaces, which allows us to deduce a toroidal topology from Betti numbers (1,2,1), only applies to closed surfaces. One thus needs to show that the point cloud is a surface (local dimensionality of 2) and is closed (no borders or singularities). If borders or singularities were present, a toroidal topology could not be claimed from these Betti numbers. Thus, it is a crucial step of the analysis.

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. Under the scenario in which grid cell responses are aligned (i.e. all neurons develop spatial patterns with the same spacing and orientation) it is already quite clear, even without any topological analysis that the emerging topology of the population activity is a torus.

      However, the toroidal topology of grid cells in reality has been observed by Gardner et al also in the wagon wheel environment, in sleep, and close to boundaries (whereas here the analysis is restricted to a sub-region of the environment, far away from the walls). There is substantial evidence based on pairwise correlations that it persists also in various other situations, in which the spatial response pattern is not a hexagonal firing pattern. It is not clear that the mechanism proposed in the present paper would generate toroidal topology of the population activity in more complex environments. In fact, it seems likely that it will not do so, and this is not explored in the manuscript.

      We agree that our work was constrained to exploration in 2D and that the situations posed by the reviewer are challenging, but we do not see them as insurmountable. The wagon wheel shows a preservation of toroidal topology locally, where the behavior of the animal is rather 2-dimensional. Globally, hexagonal maps are lost, which is compatible with some flexibility in the way grid maps are formed. If sleep meant that all inputs are turned off, our model would predict a dynamic dictated by the architecture (1D for the ring attractor, for example), but we do not really know that this is the case. In the future, we intend to explore predictive activity along the linear attractor, which could both result in path integration and in some level of preservation of the activity when inputs are completely turned off.

      Regarding boundaries, as we have argued before, the cited work chooses to filter away what looks like more than half of the overall explained variance through PCA, and this is only before applying a non-linear dimensionality reduction algorithm. It is specifically shown that the analyzed components are the ones with global periodicity throughout the environment. Thus, it is conceivable that through this approach, local irregularities found only at the borders are disregarded in favor of a clearer global picture. While using a different methodology, our approach follows a similar spirit, albeit with far less noisy data.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, this preservation across environments is not expected. Moreover, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with this observation. A symmetry in our implementation results in the fact that only ~50% of times the system falls in the preferred solution, and the rest of the times it falls into other local minima. Whether this result is at odds with current observations can be debated on the basis of probabilities. However, we believe that the symmetry we found is purely circumstantial, and that it can be broken by elements such as head direction modulation or other ingredients used to achieve path integration. In other words, we acknowledge that symmetry is an issue of the implementation we show here (which has been kept as simple as possible to serve as a proof-of-principle) but we do not think that it is a defining feature of flexible attractors in general. We expect that future implementations that incorporate path integration capabilities will not present this kind of symmetry in the space of solutions.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases across navigation modalities.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Thus, the highly clustered phases obtained in the model (Fig. S1) seem incompatible with the experimental reality. I suspect that this may be related to the difficulty in identifying the topology of a torus in persistent homology analysis based on the transpose of the matrix M.

      We partly agree with this observation and note that a pattern of ordered phases is an issue not only for the 1D attractor but also for the 2D one, which appears much more uniform than in experimental data. The low number of neurons we used for computational economy and the full connectivity could be key ingredients to generate these phase patterns. To show that this is not a defining feature of flexible attractors, apart from the fact that these patterns appear also with non-flexible 2D architectures, we included in Figure S1 simulations with ‘fragmented 1D’ architectures. In this case the architecture is a superposition of 20 random 1D stripe-like attractors. While the alignment of maps achieved with this architecture is almost at the same level as the one obtained with 1D and 2D attractors, the phases are much more similar to what has been observed experimentally, and less uniform than what is obtained with 2D attractors.

      (7) The motivations stated in the introduction came across to me as weak. As now acknowledged in the manuscript, attractor models can be fully compatible with distortions of the hexagonal spatial response patterns; they become incompatible with these spatial distortions only if one adopts a highly naive and implausible hypothesis that the attractor state is updated only by path integration. While attractor models are compatible with distortions of the spatial response pattern, it is very difficult to explain why the population activity patterns are tightly preserved across multiple conditions without a rigid two-dimensional attractor structure. This strong prediction of attractor models has withstood many experimental tests; in fact, I am not aware of any data set where substantial distortions of the toroidal activity manifold were observed, despite many attempts to challenge the model. This is the main motivation for attractor models. The present model does not explain these features, yet it also does not directly offer an explanation for distortions in the spatial response pattern.

      Some interesting examples are experiments in 3D, where grid cells presumably communicate with each other through the same recurrent collaterals, but global periodicity is lost and only some local order is preserved even away from boundaries (Ginosar et al, 2021; Grieves et al, 2021). While these datasets have not been explored using topological analysis, they serve as strong motivators to understanding 2D grid cells as one equilibrium solution that arises under some set of constraints, but belongs to a wider space of possible solutions that may arise as well under more flexible constraints. Even (and especially) if one adheres to the hypothesis that grid cells are pre-wired into a 2D torus, a concept like flexible attractors might become useful to understand how their activity is rendered in 3D. Another strong motivation is our lack of understanding of how a perfectly balanced 2D structure is formed and maintained. Simpler architectures could be thought of as alternatives, but also as an intermediate step towards it.

      In a separate point, although it might not be strictly related to the comment, we do not fully share the idea that persistent activity patterns during sleep are necessary or sufficient conditions for attractor dynamics, although we do agree that attractors could be the mechanism behind them and any alternative is at least as complex as attractors. On the necessity side, attractors in the hippocampus are not constantly engaged (Wills et al, 2005). For sufficiency, one should prove that no other network is capable of reproducing the phenomenon, and to our best knowledge we are still far from that point.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses (a leak time constant and/or synaptic time constants). I generally favor simple models without lots of complexity, yet within this style of modelling, the formulation adopted in this manuscript is unconventional, introducing a difficulty in interpreting synaptic weights as being weak or strong, and a difficulty in interpreting the model in the context of other studies.

      We chose to keep the model as simple as possible and in the line of previous publications developing it. However, we see the usefulness of putting it in what in the meantime has become a canonical framework. Fortunately this has been done by D’Albis and Kempter (2017). In our simplified version of the model there is no leak term and adaptation on its own brings down activity in the absence of input, but we agree that such a term could be added, albeit not without modifying all other network parameters.

      In my view, the weaknesses discussed above limit the ability of the model, as it stands, to offer a compelling explanation for the toroidal topology of grid cell population activity patterns, and especially the rigidity of the manifold across environments and behavioral states. Still, the work offers an interesting way of thinking on how the toroidal topology might emerge.

      Reviewer 1:

      Reviewer #1 (Recommendations For The Authors):

      See comments above. In addition:

      (1) Abstract: '...interconnected by a two-dimensional attractor guided by path integration'. This is unclear. I think the intended meaning might be along the lines of '...their being computed by a 2D continuous attractor that performs path integration'?

      'path integration allowing for no deviations from the hexagonal pattern' This is incorrect. Local modulation of the gain of the speed input to a standard CAN would distort the grid pattern.

      'Using topological data analysis, we show that the resulting population activity is a sample of a torus' Activity in the model?

      'More generally, our results represent a proof of principle against the intuition that the architecture and the representation manifold of an attractor are topological objects of the same dimensionality, with implications to the study of attractor networks across the brain' I guess one might hold this intuition, but it strikes me as obvious that if you impose a sufficiently strong n-dimensional input on a network, then its activity could have the same dimensionality. I don't really see this as being a point worth highlighting. Perhaps the more interesting point is that during learning the recurrent connectivity aligns the grid fields of neurons in the network, and this may be a specific function of the 1D attractor dynamics, although I don't think the authors have made this point convincingly.

      'The flexibility of this low dimensional attractor allows it to negotiate the geometry of the representation manifold with the feedforward inputs'. See above for comments on the use of 'negotiate'.

      'while the ensemble of maps preserves features of the network architecture'. I don't understand this. What is the 'ensemble of maps', and what are the features referred to?

      We have reviewed the abstract considering these points. Regarding the ‘strong n-dimensional input’, we want to point out that it is not the input itself that generates a torus (the no attractor condition does not lead to a torus) but rather the interplay between the input and the attractor.

      ‘Perhaps the more interesting point …’: we do not fully understand how this sentence deviates from our own conclusions. Here we show that a strong n-dimensional input is not enough to align grid cells (i.e. to produce an n-torus); it is the interplay between inputs and attractor dynamics that does so, even if the attractor is not n-dimensional in terms of architecture.

      The ensemble of maps refers to the transpose of the population activity matrix, where each point in the cloud is a map, and the features refer to the persistent homology.
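      To make the two point clouds concrete, here is a minimal sketch assuming rate maps stored as an array of shape (n_cells, ny, nx). The function name `population_matrix` and the random maps are hypothetical placeholders, not the manuscript's code or data. Rows of M are population vectors (one point per spatial bin); rows of its transpose are flattened maps, i.e. the ensemble of maps.

```python
import numpy as np


def population_matrix(rate_maps):
    """Return M with one row per spatial bin (a population vector), one column per cell.

    rate_maps: array of shape (n_cells, ny, nx), one rate map per cell.
    """
    n_cells = rate_maps.shape[0]
    # (n_cells, ny*nx) -> transpose so that each row is a population vector
    return rate_maps.reshape(n_cells, -1).T


rng = np.random.default_rng(0)
rate_maps = rng.random((100, 30, 30))  # hypothetical: 100 cells, 30 x 30 spatial bins
M = population_matrix(rate_maps)       # population activity: 900 points in R^100
maps = M.T                             # ensemble of maps: 100 points in R^900
```

      Topological analyses would then be run on the rows of M or of M.T directly, in the original high-dimensional space.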

      (2) The manuscript still fails to clarify the difference between a model that path integrates in two dimensions and a model that simply represents information with a given dimensionality. The argument that it's surprising that a network with 1D architecture represents a higher dimensional input strikes me as incorrect and an unnecessary attempt to argue for conceptual importance. At least to me this isn't surprising. It would be surprising if the 1D network could path integrate but this doesn't seem to be the case.

      In response to the reviewer’s concerns, we have made clear in the introduction and discussion that this model has no path integration capabilities, although we aim to develop a model capable of path integration using the kind of simple architecture presented here. We want to highlight here that equating attractor dynamics with path integration would be a conceptual mistake.

      (3) Other wording also seems to make unnecessary conceptual claims. E.g. The repeated use of 'negotiate' implies some degree of intelligence, or at least an exchange of information, that isn't shown to exist. I wonder if more precise language could be used? As I understand it the dimensionality is bounded by the inputs on the one hand, and the network connectivity on the other, with the actual dimensionality being a function of the recurrent and feedforward synaptic weights. There's clearly some role for the relative weights and the properties of plasticity rules, but I don't see any evidence for a negotiation.

      An interesting observation in Figure S2 is that grid maps are aligned only if the relative strengths of feedforward and recurrent inputs are similar. If one of them dominates the other, grid maps do not align. This equilibrium can metaphorically be thought of as an instance of negotiation, where the negotiation is an emergent property of the system rather than something happening at an individual synapse.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      Major

      (1) What is the evidence that, after training, the 1D network maintains its attractor dynamics when feedforward inputs are active? If the claim is that it does then it's important to provide evidence, e.g. responses to perturbations, or other tests. The alternative is that after training the recurrent inputs are drowned out by the feed forward spatial inputs.

      We agree with the reviewer on the importance of this point. In our model, networks are always learning, and the population activity represented by aligned grid maps in a trained network is a dynamic equilibrium that emerges from the interplay between feedforward and collateral constraints. If Hebbian learning is turned off, one gets a snapshot of the network at that moment. We now show in Fig. S3 that, in a trained network without feedforward Hebbian learning, the removal of recurrent collaterals results in a slight increase in gridness and spacing. The expansion is due to the fact that, as we argue in the Results section, the attractor has a contractive effect on grid maps, which could relate to observations in novel environments (Barry et al, 2007). If Hebbian learning is turned on in the same situation, the maps, no longer constrained by the attractor, drift toward the equilibrium solution of the ‘No attractor’ condition, with significantly larger spacing, no alignment and lower individual gridness. Thus, when feedforward Hebbian learning is on, the attractor is the force that prevents them from doing so.

      These observations point to the key role played by the attractor not only in forming but also in sustaining grid activity. The dynamic equilibrium framework fits well-known properties of the system, such as its capacity to recalibrate very fast (Jayakumar et al, 2019), although this particular feature cannot be modeled with the current version of our model, which lacks path integration capabilities.

      (2) It would be useful to include additional control conditions for Figure 2 to test the hypothesis that it is simply connectivity, rather than attractor dynamics, that drives alignment.

      This could be achieved by randomly assigning strengths to the recurrent connections, e.g. drawing from exponential or Gaussian distributions.

      We agree and have included Fig. S2b-d, showing that the same distribution of collateral input weights entering each neuron, but lacking the 1D structure provided by the attractor, does not align grid maps. This is achieved by shuffling rows in the connectivity matrix, while avoiding self-connections to make the comparison fair (self-connections substantially alter the dynamics of the network, making it much more rigid). We observed that individual grid maps have very low gridness levels, even lower than in the no-attractor condition. In contrast, their population gridness is slightly higher than in the no-attractor condition, but closer to 0 than to the levels achieved with attractors. Our interpretation of these results is that irregular connectivity achieves some alignment in a few arbitrary directions and/or locations, which improves the coordination between maps at the expense of impairing rather than improving the hexagonal responses of individual cells. Such observations stand in clear contrast to what is observed with continuous attractors with an orderly architecture.
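      A minimal sketch of this kind of control, assuming that row i of the weight matrix holds the synapses onto neuron i and that "shuffling" means randomly permuting each neuron's incoming weights while keeping the self-connection at zero. The Gaussian ring kernel, its width and the network size are hypothetical stand-ins for the 1D connectivity, not the manuscript's parameters.

```python
import numpy as np


def ring_connectivity(n=100, sigma=5.0):
    """Hypothetical 1D (ring) excitatory kernel with no self-connections."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)                 # distance along the ring
    W = np.exp(-d**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W


def shuffle_incoming(W, rng):
    """Permute each neuron's incoming weights, keeping the diagonal at zero.

    Each neuron keeps exactly the same set of input weights (so the same
    total input), but the 1D ordering of the connectivity is destroyed.
    """
    n = W.shape[0]
    W_shuf = np.zeros_like(W)
    for i in range(n):
        off = np.arange(n) != i              # every presynaptic cell except i itself
        W_shuf[i, off] = rng.permutation(W[i, off])
    return W_shuf


rng = np.random.default_rng(0)
W = ring_connectivity()
W_shuf = shuffle_incoming(W, rng)
```

      Permuting within rows rather than swapping whole rows is a design choice here: it guarantees that each neuron receives the same distribution of weights as in the structured network, which is the property the control is meant to preserve.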

      These results suggest that it is the structure of the attractor that allows grid cells to be aligned rather than the mere presence of recurrent collateral connections.

      (3) It seems conceivable that once trained the recurrent connections would no longer be required for alignment. Can this be evaluated by considering what happens if the recurrent connections are turned off after training (or slowly turned off during training)? Does the network continue to generate aligned grid fields?

      This point has elements in common with point 1. As we argued in that response, the attractor has two main effects on grid maps: it aligns them and it contracts them. If the attractor is turned off, feedforward Hebbian learning progressively drives maps toward the solution obtained for the ‘no attractor’ condition, characterized by maps with larger spacing, poorer gridness and lack of alignment.

      (4) After training what is the relative strength of the recurrent and feedforward inputs to each neuron?

      Both recurrent and feedforward synaptic-strength matrices are normalized throughout training, so that the overall incoming synaptic strength to each neuron is invariant. Because of this, although individual feedforward and recurrent input fields vary dynamically, their average is constant, except during the very first iterations of the simulation, before grid-cell activity levels reach a stable regime. We have included Fig. S2d, showing the dynamics of the feedforward and recurrent mean fields throughout learning, as well as their ratio. In addition, Fig. S2a shows that the strength of recurrent relative to feedforward inputs is an important parameter, since alignment is only obtained in an intermediate range of ratios.
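      A minimal sketch of such a normalization, under stated assumptions: weights are non-negative, rows index postsynaptic neurons, and normalization is by row sum (the manuscript may normalize differently). Layer sizes, the learning rate, the recurrent gain and the random rates are hypothetical. It shows that a Hebbian step followed by renormalization keeps each neuron's total incoming weight fixed, and how a recurrent-to-feedforward mean-field ratio could be computed.

```python
import numpy as np


def normalize_rows(W, total=1.0):
    """Rescale each row (all synapses onto one neuron) to sum to `total`."""
    return total * W / W.sum(axis=1, keepdims=True)


def mean_field_ratio(W_rec, W_ff, r_grid, r_input):
    """Mean recurrent field divided by mean feedforward field across neurons."""
    return float(np.mean(W_rec @ r_grid) / np.mean(W_ff @ r_input))


rng = np.random.default_rng(1)
n_grid, n_input = 100, 400          # hypothetical layer sizes
eta = 0.01                          # hypothetical learning rate

W_ff = normalize_rows(rng.random((n_grid, n_input)))
W_rec = normalize_rows(rng.random((n_grid, n_grid)), total=0.5)  # hypothetical gain
r_input = rng.random(n_input)
r_grid = rng.random(n_grid)

# One Hebbian step (postsynaptic x presynaptic outer product), then renormalize.
W_ff = normalize_rows(W_ff + eta * np.outer(r_grid, r_input))

ratio = mean_field_ratio(W_rec, W_ff, r_grid, r_input)
```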

      (5) It would be helpful to also evaluate the low dimensional structure of the input to the network. Assuming it has a 2D structure, as it represents 2D space, can an explanation be provided for why it is surprising that the trained network also encodes activity with a 2D manifold? It strikes me that the more interesting finding might relate to alignment of the grids rather than claims about a 1D attractor encoding a 2D representation. Either way, stronger evidence and clearer discussion would be helpful.

      The reviewer is correct in assuming that the input has a 2D structure, that can be represented by a sheet embedded in a high dimensional space and thus has the Betti numbers [1,0,0]. The surprising element in our results is that we are showing for the first time that the population activity of an attractor network is constrained to a manifold that results from the negotiation between the architecture of the attractor and the inputs, and does not merely reflect the former as previously assumed. In this sense, the alignment of grid cells by a 1D attractor is an instance of the more general case that 1D attractors can encode 2D representations.
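      For readers less familiar with Betti numbers: \beta_0, \beta_1 and \beta_2 count connected components, independent one-dimensional loops and enclosed cavities, respectively. The standard values for the objects discussed here are:

```latex
\begin{array}{lccc}
\text{object} & \beta_0 & \beta_1 & \beta_2 \\
\text{sheet (disk)} & 1 & 0 & 0 \\
\text{ring } (S^1) & 1 & 1 & 0 \\
\text{torus } (T^2 = S^1 \times S^1) & 1 & 2 & 1
\end{array}
```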

      It is certainly the case that the 2D input is a strong constraint pushing population activity toward a 2D manifold. However, the final form of the 2D manifold is strongly constrained by the attractor, as shown by the contrast with the no-attractor condition (a 2D sheet, as in the input, vs a torus when the attractor is present). The 1D attractor is able to flexibly adapt to the constraint posed by the inputs while doing its job (as demonstrated in previous points), which results in 2D grid maps aligned by a 1D attractor. Generally speaking, this work provides a proof of principle demonstrating that the topology of the attractor architecture and the manifold of the population activity space need not be identical, as previously widely assumed by the attractor community, and need not even have the same dimensionality. Instead, a single architecture can potentially be applied to many purposes. Hence, our work provides a valuable new perspective that applies to the study of attractors throughout the brain.

      (6) The introduction should be clearer about the different types of grid model and the computations they implement. E.g. The authors' previous model generates grid fields from spatial inputs, but if my understanding is correct it isn't able to path integrate. By contrast, while the many 2D models with continuous attractor dynamics also generate grid representations, they do so by path integration mechanisms that are computationally distinct from the spatial transformation implemented by feedforward models (see also general comments above).

      We agree with the reviewer and have made this point explicit in the introduction.

      (7) A prediction from continuous attractor models is that when place cells remap the low dimensional manifold of the grid activity is unaffected, except that the location of the activity bump is moved. It strikes me as important to test whether this is the case for the model presented here (my intuition is that it won't be, but it would be important to establish either way).

      We want to emphasize that our model is a continuous attractor model, so the question regarding the difference between what our model and continuous attractor network models predict is an ill-posed one. One of our main conclusions is precisely that attractors can work in a wider spectrum of ways than previously thought.

      Lacking a better definition, our multiple simulations could be thought of as training in different arenas. It is true that in our model maps take time to form, but this is also the case in novel environments (Barry et al, 2007), and continuous attractor models exclusively or strongly guided by self-motion cues struggle to replicate this phenomenon. We show that the current version of our model admits multiple solutions (four in practice, but conceptually countably infinite), all of them resulting in a torus for the population activity (i.e. the same topology or low-dimensional manifold). It is not clear to us how easy it would be to differentiate between most of these solutions in experimental data, with only incomplete information. That said, incorporating a symmetry-breaking ingredient into the model, for example related to head-direction modulation, could perhaps lead to the prevalence of a single type of solution. We intend to explore this possibility in the future in order to add path-integration capabilities to the system, as described in the Discussion.

      (8) The Discussion implies that 1D networks could perform path integration in a manner similar to 2D networks. This is a strong claim but isn't supported by evidence in the study. I suggest either providing evidence that this is the case for models of this kind or replacing it with a more careful discussion of the issue.

      The current version of our model has no path integration capabilities, as is now made explicit in the Introduction and Discussion. In addition, we have now made clear that the idea that path integration could perhaps be implemented using 1D networks is, although reasonable, purely speculative.

      Minor

      (1) Introduction. 'direct excitatory communication between them'. Suggest rewording to 'local synaptic interactions', as communication can also be purely inhibitory (e.g. Burak and Fiete, 2009) or indirect by excitation of local interneurons (e.g. Pastoll et al., Neuron, 2013).

      We agree and have adopted this phrasing.

      (2) The decision to focus the topology analysis on the 60 cm wide central square appears somewhat arbitrary. Are the irregularities referred to a property of the trained networks or would they also emerge with analysis of simulated ideal data? Can more justification be expanded and supplementary analyses be shown when the whole arena is used?

      In practical terms, subsampling the data to around half was needed because persistent homology packages struggle to handle large amounts of data, especially in the calculation of H2. We decided to cut out a portion of contiguous pixels in the open field larger than the hexagonal tile representing the whole grid population period (as represented in Figure 6). Leaving the borders aside was a logical choice, since it is known that the solution at the borders is particularly influenced by the speed anisotropy of the virtual rat (see Si, Kropff & Treves, 2012), in a way that mimics how borders locally influence grid maps in actual rats (Krupic et al, 2015). The specific way in which our virtual rat handles borders is arbitrary and might not generalize. A second issue around borders is that maps are differently affected by incomplete smoothing, although this issue does not apply to our data because we did not smooth across neighboring pixels. In sum, the central 60 cm wide square was sufficient to contain the whole torus and was a reasonable compromise that allowed us to perform all analyses in the part of the environment least influenced by boundaries.
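      The cropping step can be sketched as follows, assuming square spatial bins and rate maps stored as (n_cells, ny, nx). The bin size, map dimensions and the function name `central_square` are hypothetical; only the 60 cm side comes from the text above.

```python
import numpy as np


def central_square(rate_maps, bin_cm, side_cm=60.0):
    """Crop rate maps (n_cells, ny, nx) to a centered side_cm x side_cm square."""
    _, ny, nx = rate_maps.shape
    k = int(round(side_cm / bin_cm))          # bins per side of the square
    if k > min(ny, nx):
        raise ValueError("square larger than the arena")
    y0, x0 = (ny - k) // 2, (nx - k) // 2     # top-left corner of the crop
    return rate_maps[:, y0:y0 + k, x0:x0 + k]


rng = np.random.default_rng(0)
maps = rng.random((100, 50, 50))      # hypothetical: 100 cells, 50 x 50 bins of 2 cm
center = central_square(maps, bin_cm=2.0)
points = center.reshape(center.shape[0], -1).T  # one population vector per kept bin
```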

      (3) It could help the general reader to briefly explain what a persistence diagram is.

      This is developed in the Appendix, but we have now added a reference to it and a brief description in the main text.

      (4) For the analyses in Figure 3-4, and separately for Figure 5, it might help the reader to provide visualizations of the low dimensional point cloud.

      All these calculations take place in the original high-dimensional point cloud. Doing them in a reduced space would be incorrect because there is no dimensionality reduction technique that guarantees the preservation of topology. In Figure 7 we reduce the dimensionality of data but emphasize that it is only done for visualization purposes, not to characterize topology. We also point out in this Figure that the same non-linear dimensionality reduction technique applied to objects with identical topology yields a wide variety of visualizations, some of them clear and some less clear. This observation further exemplifies why one cannot assume that a dimensionality-reduction technique preserves topology, even for a low-dimensional object embedded in a high-dimensional space.

      (5) The detailed comparison of the dynamics of each model is limited by the number of data points. Why not address this by new simulations with more neurons?

      We are not sure we understand this comment. In Figure 2, the dynamics for each model are markedly different. These are averages over 100 simulations. We are not sure what benefit would be obtained from adding more neurons. Before starting this work we searched for the minimal number of neurons that would result in convergence to an aligned solution in 2D networks, which we found to be around 100. Optimizing this parameter in advance was important to reduce computational costs throughout our work.

      (6) Could the variability in Figure 7 also be addressed by increasing the number of data points?

      As we argued in a previous point, there is no reason to expect preservation of topology after applying Isomap. We believe this lack of topology preservation to be the main driver of variability.

      (7) Page/line numbers would be useful.

      We agree. However, the text is curated by bioRxiv, which, to the best of our knowledge, does not include them.

      Reviewer 2:

      Reviewer #2 (Recommendations For The Authors):

      (1) I highly suggest that the author rewrite some parts of the Results. There are lots of details which should be put into the Methods part, for example, the implementation details of the network, the analysis details of the toroidal topology, etc. It will be better to focus on the results part first in each section, and then introduce some of the key details of achieving these results, to improve the readability of the work.

      This suggestion contrasts with that of Reviewer #1. As a compromise, we decided to include in the Results section only methodological details that are key to understanding the conclusions, and describe everything else in the Methods section.

      (2) 'Progressive increase in gridness and decrease in spacing across days have been observed in animals familiarizing with a novel environment...' From Fig. 2c I didn't see much decrease. The authors may need to carry out a statistical test to prove this. Moreover, even if the changes are significant, this might not be the consequence of the excitatory collateral constraint. To prove this, the authors may need to offer some direct evidence.

      We agree that the decrease is not evident in this figure due to the scale, so we have added the correlation to the figure caption as evidence. In addition, several arguments, some of them based on new analyses, demonstrate that the attractor contracts grid maps. First, the ‘no attractor’ condition has a markedly larger spacing compared to all other conditions (Fig. 2a). We also now show that spacing monotonically decreases with the strength of recurrent relative to feedforward weights, in a way that is rather independent of gridness (Fig. S2a). Second, as we now show in Fig. S2b-d, simulations with a shuffled 1D attractor, such that the sum of input synapses to each neuron is the same as in the 1D condition but no structure is present, lead to a spacing midway between the ‘no attractor’ condition and the conditions with attractors. Third, as we now show in Fig. S3a, turning off both recurrent connections and feedforward learning in a trained network results in a small increase in spacing. Fourth, as we now show in Fig. S3b, turning off recurrent connections while feedforward learning is kept on increases grid spacing to levels comparable to those of the ‘no attractor’ condition. All these elements support a role of the attractor in contracting grid spacing.

      (3) Some of the items need to be introduced first before going into details in the paper, for instance, the strip-like attractor network, the Betti number, etc.

      We have added in the Results section a brief description and references to full developments in the Appendix.

      Reviewer 3 (Public Review):

      (1) It is not clear to me that the proposal here is fundamentally new. In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning and thus had a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane.

      In the work of Si et al, connectivity is constructed ad hoc for conjunctive cells to represent a torus; it depends on head directionality but also on the distance in a 2D plane. The topology of this architecture has not been assessed, but it is close to the typical 2D ‘rigid’ constraint. In the work of Urdapilleta et al, the network is a simple 2D one. The difference with our work is that we focus on the topology of the recurrent network and do not use head-direction modulation. In this context, we prove that a 1D network is enough to align grid cells and, more generally, we provide a proof of principle that the topology of the architecture and the representation space of an attractor network do not need to be identical, as previously assumed by the attractor community. These two important points were neither argued nor speculated upon in the cited works, nor are they self-evident from them.

      (2) The paper refers to the connectivity within the grid cell layer as an attractor. However, would this connectivity, on its own, indeed sustain persistent attractor states? This is not examined in the paper. Furthermore, is this even necessary to obtain the results in the model? Perhaps weak connections that do not produce an attractor would be sufficient to align the spatial response patterns during the learning of feedforward weights, and reproduce the results? In general, there is no exploration of how the strength of collateral interactions affects the outcome.

      The reviewer makes several important points. Local excitation combined with global inhibition is the archetypical architecture for continuous attractors (see for example Knierim and Zhang, Annual review of neuroscience, 2012). Thus, in the absence of feedforward input, we observe a bump of activity. As in all continuous attractors, this bump is not necessarily ‘persistent’ and instead is free to move along the attractor.
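      A minimal sketch of this archetype, assuming threshold-linear units on a ring, Gaussian local excitation, uniform global inhibition proportional to total activity, and total activity held fixed by normalization. All parameter values are hypothetical and this is not the manuscript's network. Starting from random activity with no feedforward input, the iteration settles into a single localized bump whose position is arbitrary.

```python
import numpy as np

N = 100          # neurons on the ring (hypothetical)
SIGMA = 5.0      # width of local excitation, in neurons (hypothetical)
C_INH = 0.1      # global inhibition per unit of total activity (hypothetical)

idx = np.arange(N)
d = np.abs(idx[:, None] - idx[None, :])
d = np.minimum(d, N - d)                     # distance along the ring
W = np.exp(-d**2 / (2 * SIGMA**2)) - C_INH   # local excitation minus global inhibition

rng = np.random.default_rng(0)
r = rng.random(N)
r /= r.sum()
for _ in range(2000):
    r = np.maximum(W @ r, 0.0)   # threshold-linear units, no feedforward input
    r /= r.sum()                 # keep total activity fixed (assumed normalization)

active = r > 0                   # neurons participating in the bump
```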

      We cannot prove that there is not a simpler architecture that has the same effect as our 1D or 1DL conditions, and we think that there are some interesting candidates to investigate in the future. What we now prove in new Fig. S2b-d is that it is not the strength of recurrent connections themselves, but instead the continuous attractor structure that aligns grid cells in our model. To demonstrate this, we shuffle incoming recurrent connections to each neuron in the 1D condition (while avoiding self-connections for fairness), and show that training does not lead to grid alignment. We also show in Fig. S1 that an architecture represented by 20 overlapping 1DL attractors, each formed by concatenating 10 random cells, aligns grid cells to levels slightly lower but similar to the 1D or 1DL attractors. This architecture can perhaps be considered as simpler to build in biological terms than all the others, but it is still constituted by continuous attractors.
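      A hedged sketch of how such a connectivity could be assembled: 20 chains of 10 distinct cells drawn at random from 100 neurons, each chain treated as a non-periodic 1D arrangement and coupled through a Gaussian kernel over distance along the chain. The kernel, its width and the network size are assumptions for illustration; a cell may belong to several chains, in which case its couplings add up.

```python
import numpy as np


def overlapping_chains(n_cells=100, n_chains=20, chain_len=10, sigma=1.5, seed=0):
    """Sum of random, non-periodic 1D chains (hypothetical construction)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_cells, n_cells))
    pos = np.arange(chain_len)
    # Gaussian kernel over distance along the chain; no wrap-around at the ends
    k = np.exp(-(pos[:, None] - pos[None, :])**2 / (2 * sigma**2))
    np.fill_diagonal(k, 0.0)                  # no self-connections
    for _ in range(n_chains):
        cells = rng.choice(n_cells, size=chain_len, replace=False)
        W[np.ix_(cells, cells)] += k          # add this chain's couplings
    return W


W = overlapping_chains()
```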

      The strength of recurrent collaterals, or more precisely the recurrent to feedforward ratio, is crucial in our model to achieve a negotiated outcome from constraints imposed by the attractor and the inputs. We now show explicit measures of this ratio in Fig. S2, as well as examples showing that an imbalance in this ratio impairs grid alignment. When the ratio is too high or too low, both individual and population gridness are low. Interestingly, grid spacing behaves differently, decreasing monotonically with the relative strength of recurrent connections.

      (3) I did not understand what is learned from the local topology analysis. Given that all the grid cells are driven by an input from place cells that spans a 2d manifold, and that the activity in the grid cell network settles on a steady state that depends only on the inputs, isn't it quite obvious that the manifold of activity in the grid cell layer would have, locally, a 2d structure?

      The dimensionality of the input is important, although not the only determinant of the topology of the activity. The recurrent collaterals are the other determinant, and their architecture is a crucial feature. For example, as we now show in Figure S2b-d, shuffled recurrent synaptic weights fail to align grid cells. In the 1D condition, if feedforward inputs were absent, the dynamics of the activity would be confined to a ring. The opposite condition is our ‘no attractor’ condition, in which activity in the grid cell layer mimics the topology of inputs, a 2D sheet (and not a torus). It is in the intermediate range, when both feedforward and recurrent inputs are important, that a negotiated solution (a torus) is achieved.

      The analyses of local dimensionality and local homology in Figure 3 are crucial steps to demonstrate toroidal topology. The classification theorem of closed surfaces identifies a surface from its global homology only once the point cloud is known to be a closed surface; global homology alone is therefore not enough to uniquely define the topology of a point cloud, and this step cannot be skipped. It aims to prove that the point cloud is indeed a closed surface.
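      A textbook example illustrates why global homology is insufficient on its own: the wedge sum of two circles and a sphere has the same Betti numbers as a torus, yet it is not a surface, since no neighborhood of the wedge point is locally two-dimensional.

```latex
\beta\big(S^1 \vee S^1 \vee S^2\big) = (1,\,2,\,1) = \beta\big(T^2\big),
\qquad \text{but} \quad S^1 \vee S^1 \vee S^2 \not\cong T^2 .
```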

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. This, combined with the fact that all neurons develop spatial patterns with the same spacing and orientation, implies even without any topological analysis that the emerging topology of the population activity is a torus.

      We cannot agree with this intuition. In the ‘no attractor’ condition, individual maps have hexagonal symmetry with standardized spacing, but given the lack of alignment the population activity is not a closed surface and thus not a torus. It can rather be described as a 2D sheet embedded in a high dimensional space, a description that also applies to the input space.

      While it is rather evident that an ad hoc toroidal architecture folds this 2D population activity into a torus, it is less evident and rather surprising that 1D architectures have the same capability. This is the main novelty in our work.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with the reviewer on the main point, although the recently found ring activity in the absence of sensory feedback (Gonzalo Cogno et al, 2023) suggests that what is happening in the EC is more nuanced than a pre-wired torus. Solutions in Figure 6 are different ways of folding a 1D strip into a torus, with or without the condition of periodicity in the 1D strip. Whether or not these different solutions would be discernible from one another in a practical setup is not clear to us. For example, global homology, as addressed in the Gardner paper, is the same for all these solutions. Furthermore, while our solutions of up to order 3 are highly discernible, higher-order solutions, potentially achievable with other network parameters, would be impossible to discern by eye in representations similar to the ones in Figure 6. In addition, while we chose to keep our model in the simplest possible form as a clear proof of principle, new elements introduced to the model such as head directionality could break the symmetry and lead to the prevalence of one preferred solution for all simulation replicates. We plan to investigate this possibility in the future when attempting to incorporate path-integration capabilities into the model.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Here the distribution of phases is not shown, but Figure 7 suggests that phases are non-uniformly represented, with significant clustering around a few discrete phases. This, I believe, is also the origin of the difficulty in identifying the toroidal topology based on the transpose of the matrix M: vectors representing the spatial response patterns of individual neurons are localized near the clusters, and there are only a few of them that represent other phases. Therefore, there is no dense coverage of the toroidal manifold that would exist if all phases were represented equally. This is not just a technical issue, however: there appears to be a mismatch between the results of the model and the experimental reality, in terms of the phase coverage.

      As mentioned in the Results section, Figure 7 is meant for visualization purposes only, and serves more as a cautionary tale regarding the unpredictable risks of non-linear dimensionality reduction than as a proof of the organization of activity in the network. Isomap is a non-linear transformation that deforms each of our solutions in a unique way so that, while all have the topology of a torus embedded in a high-dimensional space, only a few of them exhibited one of two possible toroidal visualizations in a 3D Isomap reduction. Isomap, like all other popular dimensionality reduction techniques, provides no guarantee of topology invariance. A better argument to judge the homogeneous distribution of phases is persistent homology, which identifies relatively large holes (compared to the sampling spacing) in the original manifold embedded in a high-dimensional space. In our case, persistent homology identified only two holes significantly larger than noise (the two cycles of a torus) and one cavity in all conditions that included attractors. Regarding the specific distribution of phases in different conditions, however, see our reply below.
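      The "larger than noise" criterion can be sketched as a lifetime threshold on each persistence diagram. This is a minimal illustration under assumptions: the threshold value and the toy diagrams below are hand-made and hypothetical, and the significance criterion actually used in the manuscript may differ. In practice the diagrams would come from a persistent-homology package applied to the original high-dimensional point cloud.

```python
import numpy as np


def count_persistent(diagram, threshold):
    """Number of bars in a persistence diagram that live longer than `threshold`.

    diagram: sequence of (birth, death) pairs; death may be np.inf.
    """
    bars = np.asarray(diagram, dtype=float).reshape(-1, 2)
    lifetimes = bars[:, 1] - bars[:, 0]
    return int(np.sum(lifetimes > threshold))


# Hand-made toy diagrams (not data): a few long bars plus short "noise" bars.
toy = {
    0: [(0.0, np.inf), (0.0, 0.05), (0.0, 0.04)],
    1: [(0.2, 1.4), (0.3, 1.2), (0.5, 0.55), (0.6, 0.62)],
    2: [(1.0, 1.6), (1.1, 1.15)],
}
betti_estimate = [count_persistent(toy[k], threshold=0.3) for k in range(3)]
```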

      (7) The manuscript makes several strong claims that incorrectly represent the relation between experimental data and attractor models, on one hand, and the present model on the other hand. For the latter, see the comments above. For the former, I provide a detailed list in the recommendations to the authors, but in short: the paper claims that attractor models induce rigidness in the neural activity which is incompatible with distortions seen in the spatial response patterns of grid cells. However, this claim seems to confuse distortions in the spatial response pattern, which are fully compatible with the attractor model, with distortions in the population activity patterns, which would be incompatible with the attractor model. The attractor model has withstood numerous tests showing that the population activity manifold is rigidly preserved across conditions - a strong prediction (which is not made, as far as I can see, by feedforward models). I am not aware of any data set where distortions of the population activity manifold have been identified, and the preservation has been demonstrated in many examples where the spatial response pattern is disrupted. This is the main point of two papers cited in the present manuscript: by Yoon et al, and Gardner et al.

      First of all, we would like to note that our model is a continuous attractor model. Different attractor models have different outcomes, and one of the main conclusions of our manuscript is that attractors can do a wider range of operations than previously thought.

      We agree with the reviewer that distortions in spatial activity (which speak against a purely path-integration guided attractor) should not be confused with distortions in the topology of the population activity (which would instead speak against the attractor dynamics itself). We have rephrased these observations in the manuscript. In fact, we believe that the capacity of grid cells to present distorted maps without a distortion of the population activity topology, as shown for example by Gardner and colleagues, could result from a tension between feedforward and recurrent inputs, the potential equilibria of which our manuscript aims to characterize.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses, and this introduces a difficulty in interpreting synaptic weights as being weak or strong. As mentioned above, the nature of the recurrent dynamics within the grid cell network (whether it exhibits continuous attractor behavior) is not sufficiently clear.

      We agree with the reviewer that our model is rather simple, and we value the extent to which this simplicity allows for a deep characterization. All models are simplifications and the best model in any given setup is the one with the minimum amount of complexity necessary to describe the phenomenon under study. We believe that to understand whether or not a 1D continuous attractor architecture can result in a toroidal population activity, a biophysically detailed model, with prohibitive computational costs, would have been unnecessarily complex. This argument is not intended to diminish the merit of biophysically detailed models, which are capable of addressing a wider range of questions regarding, for example, the spiking dynamics of grid cells, which cannot be addressed by our simple model.

      Reviewer #3 (Recommendations For The Authors):

      The work points to an interesting scenario for the emergence of toroidal topology, but the interpretation of this idea should be more nuanced. I recommend reconsidering the claims about limitations of the attractor theory, and acknowledging the limitations of the present theory.

      I don't see the limitations mentioned above as a reason to reject the ideas proposed in this manuscript, for two main reasons: first, additional research might reveal a regime of parameters where some issues can be resolved (e.g. the clustering of phases). In addition, the mechanism described here might act at an early stage in development to set up initial dynamics along a toroidal manifold, while other mechanisms might be responsible for the rigidity of the toroidal manifold in an adult animal. But all this implies that the novelty in the present manuscript is weaker than implied, the ability to explain experimental observations is more limited than implied, and these limitations should be acknowledged and discussed.

      I recommend reporting on the distribution of grid cell phases and, if indeed clustered, this should be discussed. It will be helpful to explore whether this is the reason for the difficulty in identifying the toroidal topology based on the collection of spatial response patterns (using the transpose of the matrix M).

      Ideally, a more complete work would also explore in a more systematic and parametric way the influence of the recurrent connectivity's strength on the learning, and whether a toroidal manifold also emerges in non-planar environments, such as the wagon-wheel environment studied in Gardner et al.

      Part of these recommendations have been addressed in the previous points (public review). Regarding the reason why the transpose of M does not fully recapitulate the architecture under our conservative classification criteria, we believe that there is no reason why it should in the first place. We view the fact that the transpose of M recapitulates some features of the architecture as a purely phenomenological observation, and we think it is important as a proof that M is not exactly the same for the different conditions. We reasoned that if the M matrices were exactly the same across conditions, this could be due to poor spatial sampling by our bins. Knowing that they are intrinsically different is important even if the reason why they have these specific features is not fully clear to us.

      Although we do not think that the distribution of phases is related to the absence of a cavity in the transpose of M or to the four clusters found in Isomap projections, it remains an interesting question that we did not explore initially. We now show examples of the distribution of phases in Figure S1. We observed that in both 2D and 1D conditions phases are distributed following rather regular patterns. Whether or not these patterns are compatible with experimental observations of phase distribution is, in our view, debatable, given that so far state-of-the-art techniques have only allowed the simultaneous recording of a small fraction of the neurons belonging to a given module. This said, we think that it is important to note that ordered phase patterns are an anecdotal outcome of our simulations rather than a necessary outcome of flexible attractors or attractors in general. To prove this point, we simulated a condition with a new architecture represented by the overlap of 20 short 1DL attractors, each recruiting 10 random neurons from the pool of 100 available ones.

      The rest of the parameters of the simulations were identical to those in the other conditions.

      By definition, the topology of this architecture has Betti numbers [20,0,0]. We show in Figure S1 that this architecture aligns grid cells, with individual and population gridness reaching slightly lower levels compared to the 1D condition. However, the distribution of phases of these grid cells has no discernible pattern. This result is an arbitrary example that serves as a proof-of-principle to show that flexible attractors can align grid cells without exhibiting ordered phases, not a full characterization of the outcome of this type of architecture, which we leave for future work. For the rest of our work, we stick to the simplest versions of 1D architectures, which allow for a more in-depth characterization.

      The wagon-wheel is an interesting case in which maps lose hexagonal symmetry although the population activity lies in a torus, perhaps evidencing the tension between feedforward and recurrent inputs and suggesting that grid cell responses do not obey the single master of path integration. If we modeled it with a 1D attractor, we believe the outcome would strongly depend on the virtual rat's trajectory. If the trajectory was strictly linear, the population activity would be locally one-dimensional and potentially represented by a ring. Instead, if the trajectory allowed for turns, i.e. a 2D trajectory within a corridor-like maze, the population activity would be toroidal as in our open field simulations, while maps would not have perfect hexagonal symmetry, mimicking experimental results.

      More minor comments:

      Recurrent dynamics are modeled as if there is no intrinsic synaptic or membrane time constant. This may be acceptable for addressing the goals of this paper, but it is a bit unusual and it will be helpful to explain and justify this choice.

      As mentioned above, we believe that the best model in a given setup is the one with the least complexity that can still address the phenomenon under study. One does not use general relativity to build a bridge, although it provides a ‘more accurate’ description of the physics involved. All models are simplifications, and the more complex a model, the more it has to be taken as a black box.

      The Introduction mentions that in most models interaction between co-modular neurons occurs through direct excitatory communication, but in quite a few models the interaction is inhibitory. The crucial feature is that the interaction is strongly inhibitory between neurons that differ in their tuning, and either less inhibitory or excitatory between neurons with similar phases.

      We agree that directed inhibition has been shown to be as efficient as directed excitation, and we have modified the introduction to reflect this.

      The Discussion claims that the present work is the first one in which the topology of the recurrent architecture differs from the topology of the emergent state space. However, early works on attractor models of grid cells showed how neural connectivity which is arranged on a 2d plane, without any periodic boundary conditions, leads to a state space that exhibits the toroidal topology. Therefore, this claim should be revised.

      We agree, although the 2D sheet in this case acts as a piece of the torus, and locally the input space and architecture are identical objects. It could be argued that architectures that represent a 2D local slice of the torus, the whole torus, or several cycles around the torus form a continuous family parametrized by the extension of recurrent connections, and as a consequence it is not surprising that these works have not made claims about the incongruence between architecture and representation topologies. The 2D sheet connectivity is still constructed ad hoc to organize activity in a 2D bump, and there is no negotiation between disparate constraints because locally the constraints imposed by input and architecture are the same. We believe this situation is conceptually different from our flexible 1D attractors. We have adapted our claim to include this technical nuance.

      Why are neural responses in the perimeter of the environment excluded from the topological analysis? The whole point of the toroidal manifold analysis on real experimental data is that the toroidal manifold is preserved regardless of the animal's location and behavioral condition.

      We agree, although experimental data needs to go through extensive pre-processing such as dimensionality reduction before showing a toroidal topology. Such manipulations might smooth away the specific effects of boundaries on maps, together with other sources of noise. In our case, the original reason to downsample the dataset is related to the explosion in computational time that we experience with the ripser package when using more than ~1000 data points. For a proof-of-principle characterization we were much more interested in what happened in the center of the arena, where a 1D attractor could fold itself to confine population activity into a torus. The area we chose was sufficiently large to contain the whole torus. Borders do affect the way the attractor folds (they also affect grid maps in real rats). We feel that these imperfections could be interesting to study in relation to the parameters controlling how our virtual rat behaves at the borders, but not at this proof-of-principle stage.
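
      The downsampling step described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the function name, the square arena with coordinates in [0, arena_size], the width of the excluded perimeter band (`margin`), and the use of uniform random subsampling are all our assumptions; only the cap of roughly 1000 points comes from the text.

```python
import random


def central_subsample(positions, activity, arena_size, margin, max_points=1000, seed=0):
    """Keep population vectors recorded away from the walls, then cap their number.

    positions:  list of (x, y) locations, one per time bin (hypothetical layout).
    activity:   list of population activity vectors aligned with `positions`.
    arena_size: side length of a square arena spanning [0, arena_size] on both axes.
    margin:     width of the perimeter band to exclude (hypothetical parameter).
    max_points: upper bound on the returned sample size.
    """
    if len(positions) != len(activity):
        raise ValueError("positions and activity must have the same length")
    lo, hi = margin, arena_size - margin
    # Drop time bins in which the virtual animal was inside the perimeter band.
    central = [vec for (x, y), vec in zip(positions, activity)
               if lo <= x <= hi and lo <= y <= hi]
    if len(central) <= max_points:
        return central
    # Uniform random subsample without replacement; a seeded RNG keeps it reproducible.
    rng = random.Random(seed)
    return rng.sample(central, max_points)
```

      The resulting list could then be converted to an array and passed to the ripser package, for example `ripser(np.asarray(points), maxdim=2)['dgms']`, to obtain persistence diagrams up to dimension 2, which is the dimension needed to detect the cavity of a torus.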

      The periodic activity observed in Ref. 29 could in principle provide the basis for the ring arrangement of neurons. However, it is not yet clear whether grid cells participate in this periodic activity.

      We agree. So far it seems that entorhinal cells in general participate in the ring, which would imply that all kinds of cells are involved. However, it could well be that only some functional types participate in the ring and grid cells specifically do not, as future experiments will tell.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work explores death coding data to understand the impact of COVID-19 on cancer mortality. The work provides solid evidence that deaths with cancer as a contributing cause were not above what would be expected during pandemic waves, suggesting that cancer did not strongly increase the risk of dying of COVID-19. These results are an interesting exploration into the coding of causes of death that can be used to make sense of how deaths are coded during a pandemic in the presence of other underlying diseases, such as cancer.

      We thank the editor and reviewers for the time they took to review our manuscript and for the thoughtful suggestions they provided. We have completed several revisions based on their feedback and we feel our paper is stronger as a result. However, none of these revisions change the overall conclusions of our study.

      Reviewer #1 (Public Review):

      Summary:

      In the paper "Disentangling the relationship between cancer mortality and COVID-19", the authors study whether the number of deaths in cancer patients in the USA went up or down during the first year (2020) of the COVID-19 pandemic. They found that the number of deaths with cancer mentioned on the death certificate went up, but only moderately. In fact, the excess with-cancer mortality was smaller than expected if cancer had no influence on the COVID mortality rate and all cancer patients got COVID with the same frequency as in the general population. The authors conclude that the data show no evidence of cancer being a risk factor for COVID and that the cancer patients were likely actively shielding themselves from COVID infections.

      Strengths:

      The paper studies an important topic and uses sound statistical and modeling methodology. It analyzes both deaths with cancer listed as the primary cause of death and deaths with cancer listed as one of the contributing causes. The authors argue, correctly, that the latter is a more important and reliable indicator to study relationships between cancer and COVID. The authors supplement their US-wide analysis by analysing three states separately.

      Weaknesses:

      The main findings of the paper can be summarized as six numbers. Nationally, in 2022, multiple-cause cancer deaths went up by 2%, Alzheimer's deaths by 31%, and diabetes deaths by 39%. At the same time, assuming no relationship between these diseases and either Covid infection risk or Covid mortality risk, the deaths should have gone up by 7%, 46%, and 28%. The authors focus on cancer deaths and as 2% < 7%, conclude that cancer is not a risk factor for COVID and that cancer patients must have "shielded" themselves against Covid infections.

      However, I did not find any discussion of the other two diseases. For diabetes, the observed excess was 39% instead of "predicted by the null model" 28%. I assume this should be interpreted as diabetes being a risk factor for Covid deaths. I think this should be spelled out, and also compared to existing estimates of increased Covid IFR associated with diabetes.

      And what about Alzheimer's? Why was the observed excess 31% vs the predicted 46%? Is this also a shielding effect? Does the spring wave in NY provide some evidence here? Why/how would Alzheimer's patients be shielded? In any case, this needs to be discussed and currently, it is not.

      We thank the reviewer for their positive feedback on the paper and for these suggestions. It is true that we have emphasized the impact on cancer deaths, as this was the primary aim of the paper. In the revised version, we have expanded the results and discussion sections to more fully describe the other chronic conditions we used as comparators (lines 267-284; 346-386).

      Note that we are somewhat reluctant to designate any of these conditions as risk factors based solely on comparing the time series model with the demographic model of our expectations. As we mention in the discussion, there is considerable uncertainty around estimates from the demographic model in terms of the size of the population-at-risk, the mean age of the population-at-risk, and the COVID-19 infection rates and infection fatality ratios. Our demographic model is primarily used to demonstrate the effects of competing risks across types of cancers and chronic conditions, since these findings are robust to model assumptions. In contrast, the demographic model should be used with caution if the goal is to titrate the level of these risk factors (as the level of imputed risk is dependent on model assumptions). In the updated version of the manuscript, we have included uncertainty intervals in Table 3, using the upper and lower bounds of the estimated infection rates and IFRs, to better represent this uncertainty. We have also discussed this uncertainty more explicitly in the text and ran sensitivity analyses with different infection rate assumptions in the discussion (lines 354-362; 367-370).

      We would like to note that rather than interpreting the absolute results, we used this demographic model as a tool to understand the relative differences between these conditions. From the demographic model we determined that we would expect to see much higher mortality in diabetes and Alzheimer’s deaths compared to cancer deaths due to three factors (1. Size of population-at-risk, 2. Mean age of the population-at-risk, 3. Baseline risk of mortality from the condition), that are separate from the COVID-19 associated IFR. And in general, this is what we observed.
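
      The competing-risks argument can be illustrated with a toy calculation. This is not the authors' demographic model (whose equations are described in their revised methods); it is a deliberately simplified sketch in which the function name, the parameter names and the multiplicative form are our assumptions. It shows why, for the same attack rate and IFR, a condition with a higher baseline risk of death yields a smaller percentage excess: the expected COVID-19 deaths are divided by a larger number of baseline deaths.

```python
def toy_expected_excess(pop_at_risk, attack_rate, ifr, baseline_mortality, relative_risk=1.0):
    """Toy expected COVID-19 excess among people living with a chronic condition.

    pop_at_risk:        number of people living with the condition.
    attack_rate:        fraction of them infected during the period.
    ifr:                infection fatality ratio at their (mean) age.
    baseline_mortality: fraction expected to die with the condition listed anyway.
    relative_risk:      multiplier on the IFR; 1.0 is the null hypothesis of no extra risk.

    Returns (expected extra deaths, percent excess over baseline deaths).
    """
    extra_deaths = pop_at_risk * attack_rate * ifr * relative_risk
    baseline_deaths = pop_at_risk * baseline_mortality
    return extra_deaths, 100.0 * extra_deaths / baseline_deaths


# Hypothetical inputs (not estimates from the paper): two conditions with the same
# population size, attack rate and IFR, differing only in baseline mortality.
for name, baseline in [("high baseline mortality", 0.25), ("low baseline mortality", 0.05)]:
    extra, pct = toy_expected_excess(1_000_000, 0.10, 0.05, baseline)
    print(f"{name}: {extra:.0f} extra deaths, {pct:.1f}% excess")
```

      Both hypothetical conditions gain the same number of extra deaths, but the one with five-fold higher baseline mortality shows a five-fold smaller percentage excess. In this toy form the population size cancels from the percentage and only sets the absolute count; evaluating the function at the lower and upper bounds of the attack rate and IFR would turn each point value into an interval.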

      In comparing the results from the demographic model to the observed excess, diabetes does stand out as an outlier from cancer and Alzheimer’s disease in that the observed excess is consistently above the null hypothesis, which lends support to the conclusion that diabetes is in fact a risk factor for COVID-19, a conclusion also supported by many other studies. Our findings for hematological cancers are also similar, in that we find consistent support for this condition being a risk factor. We have commented on this in the discussion and added a few references (lines 346-354; 395-403).

      Our hypothesis regarding non-hematological cancer deaths (lower than anticipated mortality due to shielding) could also apply to Alzheimer’s deaths. Furthermore, we used the COVID-19 attack rate for individuals >65 years (based on the data that is available), but we estimate that the mean age of Alzheimer’s patients is actually 80-81 years, so this attack rate may in fact be a bit too high, which would increase our expected excess. We have commented on this in the discussion (lines 363-377).

      Reviewer #2 (Public Review):

      The article is very well written, and the approach is quite novel. I have two major methodological comments, that if addressed will add to the robustness of the results.

      (1) Model for estimating expected mortality. There is a large literature using different models to predict expected mortality during the pandemic. Different models come with different caveats, see the example of the WHO estimates in Germany and the performance of splines (Msemburi et al Nature 2023 and Ferenci BMC Medical Research Methodology 2023). In addition, it is a common practice to include covariates to help the predictions (e.g., temperature and national holidays, see Kontis et al Nature Medicine 2020). Last, fitting the model independently for each region neglects potential correlation patterns in the neighbouring regions, see Blangiardo et al 2020 PlosONE.

      Thank you for these comments and suggestions. We agree there are a range of methods that can be used for this type of analysis, and they all come with their strengths, weaknesses, and caveats. Broadly, the approach we chose was to fit the data before the pandemic (2014-2019), and project forward into 2020. To our knowledge, it is not best practice to use an interpolating spline function to extrapolate to future years. This is demonstrated by the WHO estimates in Germany in the paper you mention. This was our motivation for using polynomial and harmonic terms.
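
      The kind of regression design described here (a polynomial long-term trend plus harmonic seasonal terms, fitted on 2014-2019 and projected into 2020) can be sketched as below. The function names, the 52.18-week period, the number of harmonics and the unscaled week index are our assumptions; the negative binomial fit itself (done in R with the MASS package in the paper) is not reproduced here.

```python
import math

WEEKS_PER_YEAR = 52.18  # assumed seasonal period, in weeks


def design_row(t, degree=2, n_harmonics=1, period=WEEKS_PER_YEAR):
    """Covariates for week index t: intercept, polynomial trend terms, seasonal harmonics."""
    row = [1.0]
    # Long-term trend: t, t^2, ..., t^degree.
    row += [float(t) ** d for d in range(1, degree + 1)]
    # Seasonality: one sine/cosine pair per harmonic of the annual cycle.
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row += [math.sin(angle), math.cos(angle)]
    return row


def design_matrix(weeks, degree=2, n_harmonics=1):
    """One design row per week index in `weeks` (training or projection period)."""
    return [design_row(t, degree, n_harmonics) for t in weeks]


# Hypothetical usage: weeks 0-312 as a pre-pandemic training window, and the
# following 52 weeks as the projection period, built with the same function.
train_X = design_matrix(range(313))
project_X = design_matrix(range(313, 365))
```

      Every column is an explicit function of the week index, so the projection rows are built exactly like the training rows. The flip side is that higher-degree polynomial terms can diverge quickly when extrapolated, which is one reason to compare candidate degrees with a criterion such as AIC.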

      Based on the above:

      a. I believe that the authors need to run a cross-validation to justify model performance. I would suggest training the data leaving out the last year for which they have mortality and assessing how the model predicts forward. Important metrics for the prediction performance include mean square error and coverage probability, see Konstantinoudis et al Nature Communications 2023. The authors need to provide metrics for all regions and health outcomes.

      Thank you for this suggestion. We agree that our paper could be strengthened by including cross validation metrics to justify model performance. Based on this suggestion, and your observations regarding Alzheimer’s disease, we have done two things. First, for the full pre-pandemic period (2014-2019) for each chronic condition and location we tested three different models with different degree polynomials (1. linear only, 2. linear + second degree polynomial, 3. linear + second degree polynomial + third degree polynomial) and used AIC to select the best model for each condition and location. Next, also in response to your suggestion, we estimated coverage statistics. Using the best fit model from the previous step, we then fit the model to data from 2014-2018 only and used the model to predict the 2019 data. We calculated the coverage probability as the proportion of weekly observed data points that fell within the 95% prediction interval. For all causes of death and locations the coverage probability was 100% (with the exception of multiple cause kidney disease in California, which is only shown in the appendix). The methods and results have been updated to reflect this change and we have added a figure to the appendix showing the selected model and coverage probability for each cause of death and location (lines 504-519; 847-859; Appendix 1- Figure 11).
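
      The two validation steps described above (choosing the polynomial degree by AIC, then checking how many held-out 2019 weeks fall inside the 95% prediction interval) reduce to simple computations once the models have been fitted. The sketch below assumes the log-likelihoods, parameter counts and interval bounds come from already fitted models; the function names and example values are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params); returns the lowest-AIC name."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


def coverage_probability(observed, lower, upper):
    """Fraction of observed weekly counts lying inside their prediction intervals."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


# Hypothetical log-likelihoods for the three candidate trend models.
fits = {"linear": (-100.0, 3), "quadratic": (-95.0, 4), "cubic": (-94.8, 5)}
chosen = best_by_aic(fits)
```

      With these made-up values the cubic term improves the log-likelihood by too little to pay for its extra parameter, so the quadratic model is chosen. For a negative binomial regression, the parameter count would also include the dispersion parameter.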

      b. In the context of validating the estimates, I think the authors need to carefully address the Alzheimer case, see Figure 2. It seems that the long-term trends pick an inverse U-shape relationship which could be an overfit. In general, polynomials tend to overfit (in this case the authors use a polynomial of second degree). It would be interesting to see how the results change if they also include a cubic term in a sensitivity analysis.

      Thank you for this observation. Based on the changes described above, the model for Alzheimer’s disease now includes a cubic term in the national data and in Texas and California. The model with the second-degree polynomial remained the best fit for New York (Appendix 1 – Figure 11).

      c. The authors could improve the predictions by including temperature and national holidays, but if they show in the cross-validation that the model performs adequately, this would be fine.

      At the scale of the US, adding temperature or environmental covariates is difficult and few US-wide models do so (see Goldstein 2012 and Quandelacy 2014 for examples from influenza). Furthermore, because we are looking at chronic disease outcomes, it is unclear that viral covariates or national holidays would drive these outcomes in the same way as they would if we were looking at mortality outcomes more directly related to transmissible diseases (such as respiratory mortality). Our cross validation also indicates that our models fit well without these additional covariates.

      d. It would be nice to see a model across the US, accounting for geography and spatial correlation. If the authors don't want to fit conditional autoregressive models in the Bayesian framework, they could just use a random intercept per region.

      We think the reviewer is mistaken here about the scale of our national analysis. Our national analysis did not fit independent models for each state or region. Rather, we fit a single model to the weekly-level national mortality data where counts for the whole of the US have been aggregated. We have clarified in the text (lines 156, 464). As such, we do not feel a model accounting for spatial correlation would be appropriate nor would we be able to include a random intercept for each region. We did fit three states independently (NY, TX, CA), but these states are very geographically distant from each other and unlikely to be correlated. These states were chosen in part because of their large population sizes, yet even in these states, confidence intervals were very wide for certain causes of death. Fitting models to each of the 50 US states, most of which are smaller than those chosen here, would exacerbate this issue.

      (2) I think the demographic model needs further elaboration. It would be nice to show more details, such as the mathematical formula of this model in the supplement, and to explain the assumptions.

      Thank you for this comment. We have added additional details on the demographic model to the methods. We have also extended this analysis to each state to further strengthen our conclusions (lines 548-590).

      Reviewing Editor Recommendations:

      I think that perhaps something that is missing is that the authors never make their underlying assumption explicit: they are assuming that if cancer increases the risk of dying of COVID-19, this would be reflected in the data on multiple causes of death where cancer would be listed as one of the multiple causes rather than as the underlying cause, and that their conclusions are predicated on this assumption. I would suggest explicitly stating this assumption, as opposed to other reasons why cancer mortality would increase (ex. if cancer care worsened during pandemic waves leading to poorer cancer survival).

      Response: Thank you for this suggestion. We have added a few sentences to the introduction to make this assumption clear (lines 106-112).

      Reviewer #1 (Recommendations For The Authors):

      - It could make sense to add "in the United States" into the title, as the paper only analyses US data.

      - It may make sense to reformulate the title from "disentangling the relationship..." into something that conveys the actual findings, e.g. "Lack of excess cancer mortality during Covid-19 pandemic" or something similar. Currently, the title tells nothing about the findings.

      Thank you for these suggestions. We have added “in the US” to the title. However, we feel that our findings are a bit more subtle than the suggested reformulation would imply, and we prefer to leave it in its current form.

      - Abstract, lines 42--45: This is the main finding of the paper, but I feel it is simplified too strongly in the abstract. Your simulations do *not* "largely explain" excess mortality with cancer; they give higher numbers! Which you interpret as "shielding" etc., but this is completely absent from the abstract. This sentence makes the impression that you got a good fit between simulated excess and real excess, which I would say is not the case.

      Thank you for this comment. We have rephrased the sentence in the abstract to better reflect our intentions for using the demographic model (lines 46-49). As stated above, the purpose of the demographic model was not to give a good fit with the observed excess mortality. Rather, we used the demographic model as a tool to understand the relative differences between these conditions in terms of expected excess mortality given the size, age-distribution, and underlying risk of death from the condition itself, assuming similar IFR and attack rates. And based on this, we conclude that it is not necessarily surprising that we see higher excess mortality for diabetes and Alzheimer’s compared to cancer.

      - Results line 237: you write that it's "more consistent with the null hypothesis", however clearly it is *not* consistent with the null hypothesis either (because 2% < 7%). You discuss in the Discussion that it may be due to shielding, but it would be good to have at least one sentence about it already here in the Results, and refer to the Discussion.

      We have mentioned this in the results and refer to the discussion (lines 277-278).

      - Results line 239: why was it closer to the assumption of relative risk 2? If I understand correctly, your model prediction for risk=1 was 7% and for risk=2 it was 13%. In NY you observed 8% (line 187). How is this closer to risk=2?

      Thank you for this observation. We have updated the demographic model with new data, extended the model to state-level data, and included confidence intervals on these estimates. We have also added additional discussion around the differences between our observations and expectations (lines 249-284).

      - Discussion line 275: "we did not expect to see large increases" -- why exactly? Please spell it out here. Was it due to the age distribution of the cancer patients? Was it due to the high cancer death risk?

      We demonstrate that it is the higher baseline risk of death for cancer that seems to be driving our low expectations for cancer excess mortality (lines 304-320). We have added this to the sentence to clarify our conclusions on this point and have added a figure to better illustrate this concept of competing risks (Figure 6).

      - Methods, line 405: perhaps it makes sense to cite some other notable papers on Covid excess mortality such as Msemburi et al Nature 2023, Karlinsky & Kobak eLife 2021, Islam et al BMJ 2021, etc.

      Thank you for mentioning this oversight. We certainly should have cited these papers and have included them in the updated version.

      - Methods line 410: why did you use a 5-week moving average? Why not fit raw weekly death counts? NB regression should be able to deal with it.

      Smoothing time series data with a moving average prior to running regression models is a very common practice. We did a sensitivity analysis using the raw data. This produced excess estimates with slightly larger confidence intervals, but did not change the overall conclusions of the paper.
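
      A 5-week moving average of weekly death counts can be sketched as below. The response does not state whether the window was centered or trailing, nor how the first and last weeks were handled; this sketch assumes a centered window and leaves the edge weeks undefined (None).

```python
def centered_moving_average(values, window=5):
    """Centered moving average; positions where the full window does not fit are None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Not enough neighbours on one side to fill the window.
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i - half:i + half + 1]) / window)
    return smoothed
```

      A centered window does not shift peaks in time, which matters when the smoothed series is compared week by week with the timing of pandemic waves; a trailing 5-week window would lag them by two weeks.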

      - Methods line 416: please indicate the software/library/package you used for fitting NB regression.

      We fit the NB regression using the MASS package in R version 4.3. We have added this to the methods (line 519).

      - Line 489: ORCHID -> ORCID

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I have missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper provides an answer to the first question and assures that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We instead decided to focus the article on the first question, because the practical use of the toolbox is easier to illustrate with code in interactive (Jupyter) notebooks than in a publication format. We find that the “how to proceed” aspect of the toolbox can be covered more easily and comprehensively by online, interactive tutorials. This also allows us to update the tutorials as the toolbox evolves across versions, whereas a scientific article is difficult to update. Consequently, we deliberately avoided code snippets in the article itself. However, we agree that the paper would gain in clarity if this choice were stated explicitly early on. We have therefore added a pointer to the online tutorials in the last paragraph of the introduction:

      The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      a. The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors in figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 differ in several important ways. The training procedure in figure 1 never includes perturbations, whereas the one in figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure in figure 1 consists of center-out reaches with permanent viscous (velocity-dependent) external dynamics, whereas that of figure 5 uses fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation, whereas the network in figure 5 does not.

      While we agree that more variety in effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader by adding the following:

      End of 1st paragraph, section 2.4.

      Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.

      b. I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp while reading. Section 2.3 explains the biomechanical properties that the toolbox implements to improve the realism of the effector. These are not new results, in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well established. However, they are important for understanding what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is figure 6, where a simple effector and a more biomechanically complex effector yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of each paragraph may improve the reader’s experience. Consequently, we now specify these goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences at the end of its first paragraph. We also now state the relevance of section 2.3 where we present the results of figure 6, and reference figure 4 there.

      c. The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. In particular, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen in figure 2B of “Preserved neural population dynamics across animals performing similar behaviour” (https://doi.org/10.1101/2022.09.26.509498) and in figure 2B of “Nonlinear manifolds underlie neural population activity during behaviour” (https://doi.org/10.1101/2023.07.18.549575). Note that these two preprints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a general Python mechanism rather than a MotorNet-specific one, as Python is an object-oriented language. Therefore, the many general-purpose Python tutorials on subclassing are useful for that purpose. We have amended the main text to make this clearer to the reader.

      More specifically, subclassing a MotorNet object requires overriding some methods of the base MotorNet classes (e.g., the Effector or Environment classes, which correspond to the original Plant and Task objects, respectively). Since we decided (as mentioned above) not to include code in the main text, we added tutorials to the online documentation, including tutorials dedicated to subclassing MotorNet classes. For instance, this tutorial shows how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
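      To illustrate the general Python mechanism referred to above, here is a minimal sketch of subclassing and method overriding. The class and method names (`BaseEnvironment`, `get_target`, `CenterOutReach`) are hypothetical stand-ins and are not MotorNet's actual API. The linked tutorial shows the real base classes and which of their methods are meant to be overridden.

```python
import math
from abc import ABC, abstractmethod
from typing import Tuple


class BaseEnvironment(ABC):
    """Hypothetical base class (NOT MotorNet's API) holding shared logic."""

    def __init__(self, max_steps: int = 100) -> None:
        self.max_steps = max_steps
        self.t = 0

    @abstractmethod
    def get_target(self) -> Tuple[float, float]:
        """Task-specific part: every subclass must define the target."""

    def reset(self) -> Tuple[float, float]:
        # Shared logic inherited unchanged by every subclass.
        self.t = 0
        return self.get_target()


class CenterOutReach(BaseEnvironment):
    """Hypothetical subclass: only the task-specific method is overridden."""

    def __init__(self, angle_deg: float, distance: float = 0.1, max_steps: int = 100) -> None:
        super().__init__(max_steps=max_steps)  # reuse the parent's initialisation
        self.angle_deg = angle_deg
        self.distance = distance

    def get_target(self) -> Tuple[float, float]:
        # Target on a circle of radius `distance` around the start position.
        rad = math.radians(self.angle_deg)
        return (self.distance * math.cos(rad), self.distance * math.sin(rad))


env = CenterOutReach(angle_deg=90.0)
x, y = env.reset()  # inherited reset() calls the overridden get_target()
```

      Because `BaseEnvironment` declares `get_target` as abstract, Python refuses to instantiate it directly. This forces each new task to supply its own definition while reusing the shared `reset` logic.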

      (5) One potential limitation of the toolbox is that it is based on TensorFlow, when the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to PyTorch. How easy would it be to translate MotorNet to PyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to implement this, and the next version of the toolbox will be exclusively in PyTorch. We will keep the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the website. However, going forward we will focus exclusively on fixing bugs in, and extending, the latest version of MotorNet, which will be in PyTorch. We believe that the greater popularity of PyTorch in the academic community makes this choice more sustainable and will help a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus, providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which is the one preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
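      For readers unfamiliar with Gymnasium, its environments expose `reset(seed=..., options=...)`, returning `(observation, info)`, and `step(action)`, returning `(observation, reward, terminated, truncated, info)`. An RL agent interacts with any such environment through the same loop. The sketch below is a hypothetical toy point-mass task that mirrors those conventions in plain Python. It does not import Gymnasium or MotorNet, and its class name, dynamics and reward are illustrative assumptions rather than the toolbox's implementation.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class PointMassReach:
    """Hypothetical 1-D point-mass reaching task with Gymnasium-style reset/step."""

    def __init__(self, target: float = 1.0, dt: float = 0.1, max_steps: int = 50) -> None:
        self.target = target
        self.dt = dt
        self.max_steps = max_steps
        self.pos = 0.0
        self.vel = 0.0
        self.steps = 0

    def _obs(self) -> List[float]:
        return [self.pos, self.vel, self.target]

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[List[float], Dict[str, Any]]:
        # `seed` is accepted for interface compatibility; this toy task is deterministic.
        self.pos, self.vel, self.steps = 0.0, 0.0, 0
        return self._obs(), {}

    def step(self, action: float) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        force = max(-1.0, min(1.0, float(action)))  # clip the control signal to [-1, 1]
        self.vel += force * self.dt                 # unit mass: acceleration equals force
        self.pos += self.vel * self.dt              # semi-implicit Euler integration
        self.steps += 1
        error = abs(self.target - self.pos)
        reward = -error                             # closer to the target = higher reward
        terminated = error < 0.01                   # task solved
        truncated = self.steps >= self.max_steps    # time limit reached
        return self._obs(), reward, terminated, truncated, {}


# Standard agent-environment loop; a random policy stands in for an RL agent.
env = PointMassReach()
rng = random.Random(0)
obs, info = env.reset(seed=0)
done = False
while not done:
    action = rng.uniform(-1.0, 1.0)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```

      Replacing the random policy with a trained agent leaves the loop unchanged, which is the practical benefit of adhering to a shared interface.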

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      As the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a two-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions to RL or other non-gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the goal of each section may improve the reader’s experience, and we have ensured this is the case throughout the manuscript. In particular, we added the following to the first paragraph of section 2.3, where an explicit goal was most clearly missing:

      In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.

      Regarding potential differences between solutions obtained with reinforcement or supervised learning, addressing this conclusively would require a non-trivial amount of work and may be beyond the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach to a given task design. In relation to this point, we now specify in the discussion that the new API can interface with reinforcement learning toolboxes, for those who wish to pursue this type of policy training when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and it aligns with comment two from reviewer one (see above). We hope that our response to that comment explains our motivation for not adding code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main goal of the authors was to study the testis-specific role of the protein FBXO24 in the formation and function of the ribonucleoprotein granules (membraneless electron-dense structures rich in RNAs and proteins).

      We appreciate the summary comment of reviewer #1.

      Strengths:

      The wide variety of methods used to support their conclusions (including transgenic models)

      We appreciate the positive comment of reviewer #1.

      Weaknesses:

      The lack of specific antibodies against FBXO24. Some of the experiments showing a specific phenotype are descriptive and lack a logical explanation of the possible mechanism (i.e., AR or the tail structure).

      Because we could not obtain specific antibodies against FBXO24, we generated Fbxo24-FLAG transgenic mice, which we used to show the interaction between FBXO24 and IPO5. Regarding the mechanism of the impaired acrosome reaction, we added results and discussion, as described in our response to question (1) of reviewer #1 (public review). Regarding the mechanism of the abnormal flagellar structure, we added new results and revised the manuscript, as described in our response to the major comments of reviewer #3 (recommendations for the authors).

      Questions:

      The paper is excellent and employs a wide variety of methods to substantiate the conclusions. I have very few questions to ask:

      (1) KO mice cannot undergo acrosome reaction (AR) even spontaneously. How do you account for this, given that no visible defects were observed in the acrosome?

      One possibility is that Fbxo24 KO spermatozoa cannot undergo capacitation; however, it is difficult to analyze capacitation status (e.g., tyrosine phosphorylation) because most Fbxo24 KO spermatozoa are not alive (Figure S3A). Another possibility is that AR-related proteins are affected in Fbxo24 KO spermatozoa. We therefore analyzed the amounts of AR-related proteins by mass spectrometry (Figure S3C). Although previous studies indicate that assembly of the SNARE complex is a key event prior to AR [Hutt et al., 2005 (PMID: 15774481); Katafuchi et al., 2000 (PMID: 11066067); Schulz et al., 1997 (PMID: 9356173); Tomes et al., 2002 (PMID: 11884041)], no clear differences were detected for SNARE proteins (Figure S3C and D). PLCD4, which is important for AR [Fukami et al., 2001 (PMID: 11340203)], was also detected in Fbxo24 KO spermatozoa (Figure S3C). Although we found no differences in the amounts of AR-related proteins, it is still possible that FER1L5, another AR-related protein [Morohoshi et al., 2023 (PMID: 36696506)] that was not detected in the mass spectrometry analyses, or AR-related proteins not yet identified are affected in Fbxo24 KO spermatozoa. We added these results and discussion (lines 160-166 and 305-312).

      (2) KO sperm are unable to migrate in the female tract, and, more intriguingly, they do not pass through the utero-tubal junction (UTJ). The levels of ADAM3 are normal, suggesting that the phenotype is influenced by other factors. The authors should investigate the levels of Ly6K since mice also exhibit the same phenotype but with normal levels of ADAM3.

      We detected LY6K in Fbxo24 KO spermatozoa by immunoblotting, but found no difference in its level. We added these results (Figure S3E and lines 172–175).

      (3) In Figure 4A, the authors assert that "RBGS Tg mice revealed that mitochondria were abnormally segmented in Fbxo24 KO spermatozoa." I am unable to discern this from the picture shown in that panel. Could you please provide a more detailed explanation or display the information more explicitly?

      We apologize for the ambiguous description of the sperm mitochondrial sheath morphology. Fbxo24 KO cauda epididymal spermatozoa show a disorganized mitochondrial sheath rather than a “segmented” one. We corrected the sentence (lines 190-192) and added white arrowheads indicating the disorganized regions (Figure 4A).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kaneda et al "FBXO24 ensures male fertility by preventing abnormal accumulation of membraneless granules in sperm flagella" is a significant paper on the role of FBXO24 in murine male germ cell development and sperm ultrastructure and function. The body of experimental evidence that the authors present is extraordinarily strong in both breadth and depth. The authors investigate the protein's functions in male germ cells and sperm using a wide variety of approaches but focusing predominantly on their novel mouse model featuring deletion of the Fbxo24 gene and its product. Using this mouse, and a cross of it with another model that expresses reporters in the head and midpiece, they logically build from one experiment to the next. Together, their data show that this protein is involved in the regulation of membraneless electron-dense structures; loss of FBXO24 led to an accumulation of these materials and defects in the sperm flagellum and fertilizing ability. Interestingly, the authors found that several of the best-known components of electron-dense ribonucleoprotein granules that are found in the intermitochondrial cement and chromatoid body were not disrupted in the Fbxo24 knockout, suggesting that the electron-dense material and these structures are not all the same, and the biology is more complicated than some might have thought. They found evidence for the most changes in IPO5 and KPNB1, and biochemical evidence that FBXO24 and IPO5 could interact.

      We appreciate the summary comment of reviewer #2.

      Strengths:

      The authors are to be commended for the thoroughness of their experimental approaches and the extent to which they investigated impacts on sperm function and potential biochemical mechanisms. Very briefly, they start by showing that the Fbxo24 message is present in spermatids and that the protein can interact with SKP1, in a way that is dependent on its F-box domain. This points toward a potential function in protein degradation. To test this, they next made the knockout mouse, validated it, and found the males to be sterile, although capable of plugging a female. Looking at the sperm, they identified a number of ultrastructural and morphological abnormalities, which they looked at in high resolution using TEM. They also cross their model with RBGS mice so that they have reporters in both the acrosome and mitochondria. The authors test a variety of sperm functions, including motility parameters, ability to fertilize by IVF, cumulus-free IVF, zona-free-IVF, and ICSI. They found that ICSI could rescue the knockout but not other assisted reproductive technologies. Defects in male fertility likely resulted from motility disruption and failure to get through the utero-tubal junction but defects in acrosome exocytosis also were noted. The authors performed thorough investigations including both targeted and unbiased approaches such as mass spectrometry. These enabled them to show that although the loss of the FBXO24 protein led to more RNA and elevated levels of some proteins, it did not change others that were previously identified in the electron-dense RNP material.

      The manuscript will be highly significant in the field because the exact functions of the electron-dense RNP materials have remained somewhat elusive for decades. Much progress has been made in the past 15 years but this work shows that the situation is more complex than previously recognized. The results show critical impacts of protein degradation in the differentiation process that enables sperm to change from non-descript round cells into highly polarized and compartmentalized mature sperm, with an equally highly compartmentalized flagellum. This manuscript also sets a high bar for the field in terms of how thorough it is, which reveals wide-ranging impacts on processes such as mitochondrial compaction and arrangement in the midpiece, the correct building of the major cytoskeletal elements in the flagellum, etc.

      We appreciate the positive comment of reviewer #2.

      Weaknesses:

      There are no real weaknesses in the manuscript that result from anything in the control of the authors. They attempted to rescue the knockout by expressing a FLAG-tagged Fbxo24 transgene, but that did not rescue the phenotype, either because of inappropriate levels/timing/location of expression, or because of interference by the tag. They also could not make anti-FBXO24 that worked for coimmunoprecipitation experiments, so relied on the FLAG epitope, an approach that successfully showed co-IP with IPO5 and SKP1.

      We could not rescue the phenotype with the Fbxo24-FLAG transgene, but different Fbxo24 mutant mice show the same phenotypes (Figure S6G). Further, another group showed that Fbxo24 KO mice exhibit abnormal mitochondrial coiling [Li et al., 2024 (PMID: 38470475)], confirming that FBXO24 is involved in mitochondrial sheath formation.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility.

      We appreciate the summary comment of reviewer #3.

      Strengths:

      The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      We appreciate the positive comment of reviewer #3.

      Weaknesses:

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 pointed out, our results do not show a direct relationship between the two. We have therefore revised the title.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      On page 4, lines 152-154, the authors introduce the RBGS mouse model and use it in their experiments.

      However, they left out an obvious but helpful sentence that tells the reader that they crossed the Fbxo24-null mouse with the RBGS. As one continues reading it is clear, but best to avoid even slight confusion.

      We revised the explanation in the Results section (lines 150-153).

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility. The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 pointed out, our results do not show a direct relationship between the two. We have therefore revised the title.

      Major comments:

      In the title, abstract, introduction, and some sections such as lines 275-276, the authors conclude that FBXO24 prevents the accumulation of importins and RNP granules during spermiogenesis. However, the provided data do not substantiate this claim. To provide conclusive evidence to support the current title, the authors need to present evidence supporting: 1) direct degradation of IPO5 and KPNB1 by FBXO24; 2) the direct requirement of IPO5 for the formation of the membraneless granules, and 3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      (1) direct degradation of IPO5 and KPNB1 by FBXO24.

      To examine whether IPO5 can be degraded by FBXO24, we performed a ubiquitination assay in HEK293T cells. Ubiquitination of IPO5 was upregulated in the presence of WT FBXO24 but not of the ΔF-box mutant FBXO24, suggesting that IPO5 can be ubiquitinated by FBXO24. We did not examine the ubiquitination of KPNB1 because we failed to construct a plasmid vector expressing mouse KPNB1. We think that KPNB1 is not a substrate because we did not detect an interaction between FBXO24 and KPNB1 (Figure 5E). We added the results of the ubiquitination assay (Figure 5F and lines 261-265) and mentioned them in the abstract (line 35).

      (2) the direct requirement of IPO5 for the formation of the membraneless granules.

      (3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      We showed that IPO5 aggregates under stress conditions in COS7 cells (Figure 6C and D); however, we did not examine whether IPO5 is required for the formation of the membraneless granules. We consider that protein degradation systems such as PROTAC or Trim-Away, used to knock down IPO5 at the protein level in Fbxo24 KO mice, would be a good way to test whether the membraneless granules diminish and male fertility is rescued. However, applying these degradation systems in vivo takes time, so we would like to leave this rescue experiment for future studies. We revised the title and abstract (lines 37-38) and removed the last sentence of the introduction.

      In addition, another group reported analyses of Fbxo24 KO mice [Li et al., 2024 (PMID: 38470475)] shortly after we submitted our manuscript to eLife. They reported not only disorganized flagellar structures but also abnormal head morphology, which may lead to male infertility. The differences from our study may be due to different mouse genetic backgrounds. We mention this in the discussion section (lines 348-353).

      Minor comments:

      (1) The authors claimed a significant increase in the total amount of RNAs in Fbxo24 KO spermatozoa (lines 259-261), suggesting that the ...contain RNAs. More direct evidence supporting this claim should be provided.

      We show that the amounts of IPO5 and KPNB1 increased in Fbxo24 KO spermatozoa (Figure 5A and B), and that both could be incorporated into RNP granules in COS7 cells (Figure 6C and D), supporting the idea that the membraneless electron-dense structures may be RNP granules. However, because we did not provide direct evidence that the electron-dense structures contain RNAs, we removed these sentences (lines 259-261 of the first-submission manuscript).

      (2) The authors should provide an explanation for the absence of a FLAG band in the input Tg in Figure 5D and the larger size of the IPO5 band in the FLAG-IP group compared to the input. Similar observations are also noted in Figure 5E.

      The FLAG band is weak because the protein amount is low; it becomes visible when the contrast is increased. We have added a high-contrast image (Figure 5D). Proteins sometimes migrate differently on SDS-PAGE after immunoprecipitation, likely because of differences in protein composition between samples. We have explained this in the figure legend (lines 868-869).

      (3) In Line 526, clarify the procedure for sperm purification, and determine the potential for contamination from somatic cells.

      We did not perform sperm purification, but when we observed spermatozoa obtained from the cauda epididymis, we rarely observed somatic cells or immature spermatogenic cells. We have added images in Figure S7 and a detailed explanation of how spermatozoa were collected from the epididymis (lines 549-550).

      (4) Define the Y-axis in Figure 2E, F, and G.

      We have revised the figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors investigate the impact of fecal microbiota transfer (FMT) on intestinal recovery from enterotoxigenic E. coli infection following antibiotic treatment. Using a piglet model of intestinal infection, the authors demonstrate that FMT reduces weight loss and diarrhea and enhances the expression of tight junction proteins. Sequencing analysis of the intestinal microbiota following FMT showed significant increases in Akkermansia muciniphila and Bacteroides fragilis. Using additional mouse and organoid models, the authors examine the impact of these microbes on intestinal recovery and modulation of the Wnt signaling pathway. Overall, the data support the notion that FMT following ETEC infection is beneficial, however, additional investigation is required to fully elucidate the mechanisms involved.

      Strengths:

      Initial experiments used a piglet model of infection to test the value of FMT on recovery from E. coli. The FMT treatment was beneficial and the authors provide solid evidence that the treatment increased the diversity of the microbiota and enhanced the recovery of the intestinal epithelium. Sequencing data highlighted an increase in Akkermansia muciniphila and Bacteroides fragilis after FMT.

      The mouse data are consistent with the observations in pigs, and reveal that daily gavage with A. muciniphila or B. fragilis enhances intestinal recovery based on histological analysis, expression of tight junction proteins, and analysis of intestinal barrier function.

      The authors demonstrate the benefit of probiotic treatment following infection using a range of model systems.

      Weaknesses:

      Without sequencing the pre-infection pig microbiota or the FMT input material itself, it's challenging to firmly say that the observed bloom in Akkermansia muciniphila and Bacteroides fragilis stemmed from the FMT.

      Response: We determined the relative abundance of each bacterium in the fecal bacterial suspension, following Hu et al. (2018). The absolute abundances of Akkermansia muciniphila and Bacteroides fragilis in the FMT material were 1.3 × 10³ ± 2.6 × 10³ and 4.5 × 10³ ± 6.1 × 10³, respectively.

      Reference:

      Hu LS, Geng SJ, Li Y, et al. Exogenous Fecal Microbiota Transplantation from Local Adult Pigs to Crossbred Newborn Piglets. Front. Microbiol. 2018, 8.

      The lack of details for the murine infection model, such as weight loss and quantification of bacterial loads over time, make it challenging for a reader to fully appreciate how treatment with Akkermansia muciniphila and Bacteroides fragilis is altering the course of infection. Bacterial loads of E. coli were only quantified at one time point, and the mice that received A. muciniphila and B. fragilis had very low levels of E. coli. Therefore, it is not clear if all mice were subjected to the same level of infection in the first place. The reduced translocation of E. coli to the organs and enhanced barrier function may just reflect the low level of infection in these mice. Further, the authors' conclusion that the effect is specific to A. muciniphila or B. fragilis would be more convincing if the experiments included an inert control bacterium, to demonstrate that gavage with any commensal microbe would not elicit a similar effect.

      Weight loss data have been added to Figure S2A. All mice were subjected to the same level of infection at the outset.

      Many of the conclusions in the study are drawn from the microscopy results. However, the methods describing both light microscopy and electron microscopy lack sufficient detail. For example, it is not clear how many sections and fields of view were imaged or how the SEM samples were prepared and dehydrated. The mucus layer does not appear to be well preserved, which would make it challenging to accurately measure the thickness of the mucus layer.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. The electron microscopy methods have been expanded on lines 263-270. We have removed the mucus layer data.

      Gene expression data appears to vary across the different models, for example, Wnt3 expression in mice versus organoids. Additional experiments may be required to clarify the mechanisms involved. Considering that both of the bacteria tested elicited similar changes in Wnt signaling, this pathway might be broadly modulated by the microbiota.

      The difference in Wnt3 expression between mice and porcine intestinal organoids may be caused by the different durations of ETEC infection in vivo and in vitro. Furthermore, in vivo, the intestinal stem cell niche is regulated not only by intestinal epithelial cells but also by mesenchymal cells in the connective tissue (Luo et al., 2022). In the in vitro model, however, the stem cell niche is regulated only by epithelial secretory factors, which may also account for the differences between the in vitro and in vivo results.

      It has been reported that B. fragilis pretreatment significantly increased the relative abundance of A. muciniphila in the intestines of mice with Clostridioides difficile infection (CDI), and that the growth and maintenance of A. muciniphila were involved in restoring intestinal barrier integrity after CDI, suggesting that a metabolic symbiosis may exist between A. muciniphila and B. fragilis (Deng et al., 2018).

      References:

      Luo HM, Li MX, Wang F, et al. The role of intestinal stem cell within gut homeostasis: Focusing on its interplay with gut microbiota and the regulating pathways. Int. J. Biol. Sci. 2022, 18(13): 5185-5206.

      Deng HM, Yang SQ, Zhang YC, et al. Bacteroides fragilis Prevents Clostridium difficile Infection in a Mouse Model by Restoring Gut Barrier and Microbiome Regulation. Front. Microbiol. 2018, 9.

      The unconventional choice to not include references in the results section makes it challenging for the reader to put the results in context with what is known in the field. Similarly, there is a lack of discussion acknowledging that B. fragilis is a potential pathogen, associated with intestinal inflammation and cancer (Haghi et al. BMC Cancer 19, 879 (2019) ), and how this would impact its utility as a potential probiotic.

      Bacteroides fragilis is a symbiotic anaerobe of the mammalian gut and also an opportunistic pathogen that is often isolated from clinical specimens. It was first isolated from sites of infection and was therefore considered a pathogen. However, as research has progressed, it has become clear that over long-term evolution, B. fragilis colonizing the gut has established a mutually beneficial relationship with the host and is an essential component for maintaining host health, particularly in obesity, diabetes, and immune deficiency diseases. We have expanded the discussion on lines 598-603.

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells by acting on the Wnt/β-catenin signaling pathway. They used various models, such as the piglet model, mouse model, and intestinal organoids to address how A. muciniphila and B. fragilis offer protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling. While a large body of work has been performed in this study, there are quite a few questions to be addressed.

      Major comments:

      - The similar protective effect of FMT with fecal samples, A. muciniphila or B. fragilis is perhaps not that surprising, considering that FMT likely restores microbiota-mediated colonization resistance against ETEC infection. While FMT with fecal samples increases SCFAs, it is unclear whether/how FMT with A. muciniphila or B. fragilis alter the microbiota composition/abundance as well as metabolites in the current models in a way that offers protection.

      We examined changes in the gut microbiota of mice treated with A. muciniphila and B. fragilis by 16S rRNA gene sequencing, and the results showed that both A. muciniphila and B. fragilis improved the alpha and beta diversities of the microbiota; however, these results were not included in this manuscript.

      - Does ETEC infection in piglets/mice cause histological damage in the intestines? These data should be shown.

      Scanning electron microscopy (Figure 3A) showed intestinal damage in piglets after ETEC infection, and H&E staining and transmission electron microscopy (Figure 5A and 5B) showed intestinal damage in mice after ETEC infection.

      - Line 447, "ETEC adheres to intestinal epithelial cells". However, there is no data showing the adherence (or invasion) of ETEC to intestinal epithelial cells, irrespective of piglets/mouse/organoids.

      Scanning electron microscopy (Figure 3A, bottom) showed obvious adhesion of rod-shaped bacteria to the microvillus surface in ETEC K88-infected piglets. Figure 2C shows the colonization of ETEC K88 in the jejunum and colon of piglets, and Figure S2A shows E. coli colonization in the intestines and other tissues of mice.

      - In both basal-out and apical-out intestinal organoid models, A. muciniphila protects organoids against ETEC infection. Did ETEC enter into intestinal epithelial cells at all after only one hour of infection? Is the protection through certain A. muciniphila metabolites?

      A co-culture duration of 1 hour has been reported for studying host-microbiota cross-talk in apical-out organoid models (Poletti et al., 2021). In addition, Co et al. (2019) used an apical-out organoid model to study host-pathogen interactions, with Salmonella enterica serovar Typhimurium or Listeria monocytogenes invading the organoids for one hour.

      References:

      Poletti M, Arnauts K, Ferrante M, et al. Organoid-based Models to Study the Role of Host-microbiota Interactions in IBD. J. Crohns Colitis. 2021, 15(7): 1222-1235.

      Co JY, Margalef-Catala M, Li XN, et al. Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions. Cell Reports. 2019, 26(9): 2509-2520.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow-up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      The major weakness is that, as presented, the manuscript is quite difficult to follow, even for someone familiar with the field. The lack of detail in figure legends, organization of the text, and frequent use of non-intuitive abbreviated group names without a clear key (ex. EP/EF, or C E A B) make comprehension challenging. The results section is perhaps too succinct and does not provide sufficient information to understand experimental design and interpretation without reading the methods section first or skipping to the discussion (as an example: WNT-c59 treatment). Extensive revisions could be encouraged to aid in communicating the potentially exciting findings.

      The abbreviations of the experimental groups are first defined in the Materials and Methods, and we have added descriptions of the experimental design to the Results on lines 397-399, 439-442 and 516-520.

      The bioinformatics section of the methods requires revision and may indicate issues in the pipeline. Merging the forward and reverse reads may represent a problem for denoising. Also since these were sequenced on a NovaSeq, the error learning would have to be modified or the diversity estimates would be inappropriately multiplied. "Alpha diversity and beta diversity were calculated by normalized to the same sequence randomly." Not sure what this means, does this mean subsampled? "Blast was used for sequence alignment", does this mean the taxonomic alignment? This would need to be elaborated on and database versions should be included. The methods, including if any form of multiple testing was included, for LEFSE was also not included.

      Denoising was performed with UNOISE3 to correct for sequencing errors. Subsequent alpha- and beta-diversity analyses were all performed on the normalized output data. Multiple sequence alignment was performed with MUSCLE (v3.8.31) to obtain the phylogenetic relationships among all OTU sequences. We have added the multiple-testing method on lines 323-328.

      Reviewer #1 (Recommendations For The Authors):

      At some points, the rationale for using both porcine and murine models was unclear, and it would be helpful for the reader to elaborate on the benefits of these models and why they were used in the introduction. Similarly, it would be helpful to describe the benefits of basal-in organoids versus injecting standard organoids with bacteria.

      The main subject of this study was piglets, and a mouse model was used for validation. Interpretation of measurements from organoid microinjection experiments must account for multiple confounding variables, such as heterogeneous exposure concentrations and durations, as well as the effects of disrupting the organoid wall. We have added this description to the Introduction on lines 88-90.

      Line 165 -- The number of piglets used seems high, is it correct approximately 100 pigs were used?

      Nine litters were selected for processing, but only 18 piglets were ultimately slaughtered.

      There is very little discussion of the preliminary experiment that the authors used to determine how much bacteria to use. I recommend either discussing the data and how the doses were chosen or omitting it. It was not clear if the authors used pasteurized or live bacteria in the experiments. It would also be interesting to include a discussion of the observation that relatively low levels of Akkermansia (10^6 CFU) appeared more beneficial than the higher doses, typically used in these types of experiments.

      We removed these results. The experiments used live bacteria.

      Microscopy methods for both light microscopy and EM would be stronger with added details including how many sections and fields of view were imaged and how the numbers of goblet cells were normalized across samples. Without having a clear cross-section of a crypt, it is not clear to me how the images can be used to accurately quantify the number of cells per crypt. Additional details in the methods on how many total crypts were counted should also be included.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. We have removed the data of the mucus layer and goblet cells.

      Line 236 -- missing which gene was used.

      The GenBank accession number has been added on lines 232-233.

      Line 310 -- OTU nomenclature.

      We have supplemented the OTU nomenclature on line 314.

      Line 413 -- This line seems inconsistent with the data analysis described in the methods section. The authors may need to expand their description of the 16S data analysis to be clear and reproducible.

      We have rewritten the description of the 16S data analysis on lines 312-328.

      Line 413 -- it is not surprising that 16S analysis did not capture species, it will have limited resolution beyond the genus level.

      We deleted this sentence.

      Methods are missing some details on the data analysis, eg. methods/programs and statistical analysis of PCoA and NMDS, LefSe.

      The methods and statistical analyses for PCoA, NMDS and LEfSe have been added on lines 323-328.

      Fig 4C -- The images do not clearly capture the mucus layer or how it was analyzed. The sections appear to be cut at a slight angle, with multiple partial sections of crypts. I think this might make it challenging to count goblet cells, especially if the counts are normalized over the number of crypts or villi. The mucus layer does not appear well preserved. For example, I would expect to see an intact mucus layer lining the colon in the PBS control group. Re-cutting sections with a clean cross-section through the tissue will make data analysis easier.

      We have removed data of the mucus layer.

      Fig 4D -- The images appear to be of the mouse proximal colon, whereas the mucus layer and most muc2 will be in the distal colon. If the authors have tissue sections of the distal colon, this may give a clearer image of the mucus layer and might be more consistent with the TEM images in Fig. 4B.

      We apologize, but sections of the distal colon are not available.

      To fully preserve the mucus layer, in addition to fixing in Carnoy's solution, the embedding process must be run without the standard washes in 70% ethanol (see: Johansson and Hansson. Methods Mol Biol. (2012) 229; doi: 10.1007/978-1-61779-513-8_13). The mucus will wash away during standard paraffin embedding if the tissue is washed with 70% ethanol, and I wonder if that has occurred in these samples.

      The tissue was not washed with 70% ethanol.

      Fig 6A and 6B -- Although the legend indicates that the data is representative of two independent experiments, it is not clear how many fields of view or cells were imaged. In the bar graphs, it is not clear how many crypts were analyzed and from how many fields of view.

      3-4 fields were selected from each mouse to count about 30 crypts.

      For all of the bar graphs, this could be addressed by displaying all of the data points, rather than just the mean, to give the reader a sense of how many cells were counted (as was done in Fig 7B).

      We have revised the bar graphs to show individual data points.

      498-501 -- The text says that the gene expression patterns in the organoids are consistent with the in vivo data, but the data patterns of gene expression appear to be different. For example, patterns for Wnt3 and B-catenin expression in mice, appear to be the opposite of what was observed in the organoid?

      Lines 509-512 state that the expression patterns in mouse organoids and in vivo are consistent. Figure 7C was incorrectly cited as Figure 8C; we have corrected this.

      Since Akkermansia does not grow under aerobic conditions, it should be made clear that the organoid co-culture treatment does not involve actively growing bacterial cultures.

      Reunanen et al. (2015) found that Akkermansia can tolerate oxygen: more than 90% of Akkermansia cells remained viable for 1 h under oxic, 5% CO2 conditions.

      Reference:

      Reunanen J, Kainulainen V, Huuskonen L, et al. Akkermansia muciniphila Adheres to Enterocytes and Strengthens the Integrity of the Epithelial Cell Layer. Appl. Environ. Microbiol. 2015, 81(11): 3655-3662.

      Minor points

      Line 50 -"evidence".

      We have changed to “evidence” on line 49.

      Line 64, 422 - italicize, check italics throughout.

      We have checked italics throughout the manuscript.

      Line 64 - may need to be reworded.

      We have changed to “Clostridioides difficile” on line 66.

      Line 77 - pathogen.

      We have changed to “pathogen” on line 77.

      Line 161 - the.

      We have removed “the” on line 161.

      Line 178 - mouse.

      We have changed to “mouse” on line 179.

      Line 313 -- wording is confusing.

      We have changed the description on line 319-320.

      Line 318 -- Silva version #.

      The version is Silva 132. We have added it on line 316.

      Line 334 - Manufacturer for Live/Dead cell stain?

      The Live/Dead cell stain used was FVS510 (BD Biosciences). We have added this on line 345.

      Line 433 -- FD4 not defined until here.

      We have defined FD4 on lines 218-219.

      Line 512 -- but did not promote.

      We have changed to “but did not promote” on line 526.

      Line 517 -- Looks like this should be "basal-in organoids" instead of basal-out?

      We have changed "basal-out" to "apical-out" on line 531.

      Line 546 -- induced neonatal should be protected?

      They are in separate pens.

      Jumps from Fig 7B to Fig 8C in the text.

      We apologize for this error and have corrected it.

      Reviewer #2 (Recommendations for The Authors):

      The title itself is a bit misleading. Please consider changing it. The authors meant that A. muciniphila prevents pathogen invasion, but does not function in pathogen invasion.

      We have changed the title.

      Major comments:

      - Figures 4A, 4D, and 6B should include presentation of cross-section pictures.

      We provided cross-section pictures to the journal.

      - Figures 7, 8, and 9 should indicate clearly whether mouse or piglet organoids are used. For instance, in the main text, line 490, it indicates piglet organoids, but in Figure 7A legend, it indicates mouse tissue.

      We apologize for this error and have changed it to “mice” on lines 501-502.

      - In Figure 7A, the 3rd row, 2nd panel, crypts formed into spherical organoids; whereas in Figure 8, ETEC infection of basal-out organoids formed budding organoids. This needs to be better explained.

      Mouse intestinal organoids were cultured ex vivo from crypts isolated from mice infected with ETEC, while porcine intestinal organoids were co-cultured with ETEC in vitro.

      Minor comments:

      - In the result section, the numbering of Figures or supplementary Figures is problematic, i.e., it should start with Figure 1..., Figure S1, but not directly go to Figure S2A etc.

      Figure 1 is cited in the Materials and Methods.

      - Line 458, please add the gating strategy used in the flow cytometry study.

      The gating strategy has been added on lines 351-356.

      - The effect of A. muciniphila on the proliferation of intestinal epithelium through the Wnt/β-catenin signaling pathway is well known (such as PMID: 32138776). The authors should discuss this in detail.

      We have expanded the discussion on lines 637-639.

      Reviewer #3 (Recommendations For The Authors):

      It is somewhat unusual that the results from the piglets are in the supplement as this is a major strength of the manuscript (Fig S2).

      We have put these results into Figure 2 of the manuscript.

      "Collectively, our results may provide theoretical basis that FMT is a promising mitigation method for pathogenic bacteria infection and a new strategy for precise application of FMT in clinical and livestock production"- This is somewhat of an odd statement as the introduction of the manuscript completely skips over most of what is known about FMTs in the context of C. difficile. Also if anything, does the authors' own data not point mostly at using A. muciniphila on its own? Clinical trials are well underway in humans.

      We have changed the sentences to “Collectively, our results may provide theoretical basis that A. muciniphila is a promising method to repair intestinal barrier damage and a new strategy for the precise application of A. muciniphila in livestock production.” on lines 98-100.

      Line 26: I am not sure probiotic is the right word here given its strict scientific definition. Perhaps beneficial or protective would be more appropriate.

      We have changed “probiotic” to “beneficial” on line 25.

      Line 27: I believe AIMD is antibiotic-induced microbiome-depletion in most usages which may be more accurate and informative than dysregulated.

      The type, dose, and timing of the antibiotics we used were chosen to induce microbiota dysbiosis.

      It would appear that there are issues in the reference formatting where a number of journal names are missing.

      We have re-edited the reference formatting.

      Line 64- I believe eLife requires the standard practice of italicizing genus and species names. Also Clostridium difficile should now be referred to as Clostridioides difficile.

      We have changed it to “Clostridioides difficile” and italicized it on lines 66 and 569. Italicization of genus and species names has been checked throughout the manuscript.

      Figure S2C: it is not clear why the melt curve was included here, but the legend should make it more clear what is being shown. I assume this is to provide evidence of specificity?

      The melting curve was used to demonstrate that only ETEC K88 could be amplified by the primers we used. We have added an explanation to the figure legend.

      Figure 2D: there should be a quantitative analysis done on the staining of Muc2.

      We have quantified the staining of MUC2 in Figure 3D.

      Figure 3: The legends are not sufficient. For example: it is not clear what Figure 3A actually shows as the y-axis is not labelled and it is not clear what the relationship is between this and the anosim which is a function for permanova.

      ANOSIM was performed in R using the anosim function, based on the rank order of Bray-Curtis distances, to test the significance of differences between groups. The y-axis shows the ranks of the distances between samples.
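      For readers less familiar with ANOSIM, a minimal sketch of how the R statistic is derived from the rank order of Bray-Curtis distances is given below, assuming NumPy and SciPy are available. The helper anosim_r and the toy abundance table are hypothetical illustrations, not the authors' R pipeline, and the permutation test that supplies the p-value is omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata


def anosim_r(counts, groups):
    """ANOSIM R statistic from ranked Bray-Curtis distances (no permutation test)."""
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    n = counts.shape[0]

    # Condensed Bray-Curtis distances for every sample pair (i < j).
    dist = pdist(counts, metric="braycurtis")
    # Rank all pairwise distances; tied distances receive their average rank.
    ranks = rankdata(dist)

    # np.triu_indices(n, k=1) enumerates pairs in the same order as pdist.
    i, j = np.triu_indices(n, k=1)
    within = groups[i] == groups[j]

    mean_within = ranks[within].mean()
    mean_between = ranks[~within].mean()
    m = n * (n - 1) / 2  # total number of sample pairs
    # R near 1: groups well separated; near 0: no separation;
    # below 0: within-group distances exceed between-group distances.
    return (mean_between - mean_within) / (m / 2)


# Toy abundance table: rows are samples, columns are taxa (made-up numbers).
counts = [[10, 0], [9, 1], [0, 10], [1, 9]]
print(anosim_r(counts, ["A", "A", "B", "B"]))  # 1.0
```

      In practice, significance is assessed by recomputing R after repeatedly permuting the group labels. Because the test compares within- and between-group distance ranks rather than raw distances, a y-axis showing distance ranks is the natural way to display it.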

      Line 416- OTU not OUT.

      We have changed to “OTU” on line 428.

      Figure 4- the naming key needs to be included in the figure legend. C, E, A, and B are not immediately obvious.

      The naming key was included in the figure legend.

      Methods: additional information on the flow cytometry gating strategy/controls should be included.

      The gating strategy has been added on lines 351-356.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain state and neurochemistry. In addition, their manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized.

      That being said, it remains important for the authors to think more carefully about their analytical approaches, in particular the effect of normalization and the explicit outlining and interpretation of their statistical models. As mentioned in the original review, the normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis. By normalizing all data to the baseline data and including these baseline data in the repeated-measures analysis, one artificially creates a baseline period with minimal variation that dramatically differs in variance from the other periods (akin to heteroscedasticity). If the authors want to analyze how a stimulus changes neurochemical concentrations, they could analyze the raw data but depict normalized data in their figures (as in other papers). Alternatively, they could analyze group differences in the normalized data of the two stimulus periods (i.e., excluding the baseline period used for normalization).

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose the latter of the two approaches suggested by the reviewer. Specifically, we reran the analysis excluding the baseline and focusing only on the playback windows and the group differences. The text in the Results, the significance markers in the figures, and the Discussion have been corrected accordingly. Despite these changes, our major conclusions remain unchanged.

      We also followed this reviewer’s suggestion to clarify the statistical model used to study the effect of experience. After further consultation with our statistician, we reran the analysis of the experience effect, including all groups of EXP and INEXP animals together. We have corrected the text in the figure captions, Results, Discussion, and Data Analysis sections of the manuscript related to the effect of experience and its interactions. This did not change our conclusions regarding the effect of experience in this dataset.

      It would also be useful for the authors to provide further discussion of the potential contributions of different types of experiences (mating vs. restraint) to the change in behavior and neurochemical responses to the vocalization playbacks and to try to disentangle sensory and motor contributions to neurochemical changes.

      We have acknowledged in the Discussion that previous studies suggest that the effect of experience involving stress could be generalized. We believe that this is an important area of future research. Our Discussion acknowledges that the relationship between sensory and motor contributions to neurochemical changes remains an area of interest. We further point out that the time resolution of microdialysis data renders the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Reviewer #3 (Public Review):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of ACh, DA, and 5-HIAA is excellent. My previous concerns have been adequately addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the authors' responses to my previous queries (and to the comments by other reviewers). The introduction does a better job contextualizing the data, and the additional details in the Results and Methods sections help readers digest the material. I continue to think the topic is interesting and the manuscript is potentially impactful. However, I continue to be concerned about their analytical approaches and other aspects of the revised manuscript.

      (a) Normalization

      In my original review I wrote: "The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power." I continue to feel that an analysis of normalized data that includes the baseline data is inappropriate because of the minimal variation in the normalized data for the baseline period. When the normalized data for the baseline period are included in the analysis, there is clearly variation in the extent of variability within each of the time periods (no variability at baseline, variability during periods 1 & 2; analogous to heteroscedasticity). For example, when analyzing the RAW DATA about the change in ACh release in experienced males listening to restraint vocalizations (thank you for releasing the raw data), there was a non-significant effect of time (baseline, period 1, and period 2; linear mixed effects model; F(2,12)=3.2, p=0.0793). However, when the normalized data for this dataset were analyzed (with baseline values set at 100% for each mouse), there was a statistically significant effect (F(2,12)=4.5, p=0.0352). This example is just to illustrate how normalization can affect (e.g., inflate) statistical power.
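      The reviewer's point about baseline normalization can be illustrated with a minimal Python sketch. The numbers below are hypothetical and are not data from this study. Expressing each animal's values as a percentage of that animal's own baseline forces every baseline entry to exactly 100%, so the baseline column has zero variance while the playback periods keep their between-animal spread.

```python
from statistics import pvariance

def percent_of_baseline(rows):
    """Express each animal's (baseline, period 1, period 2) values as % of its own baseline."""
    return [[100.0 * v / row[0] for v in row] for row in rows]

# Hypothetical raw concentrations (arbitrary units) for three animals;
# columns are baseline, playback period 1, playback period 2.
raw = [
    [2.0, 2.4, 2.2],
    [5.0, 5.5, 6.0],
    [1.0, 1.3, 1.1],
]

normalized = percent_of_baseline(raw)
columns = list(zip(*normalized))  # transpose to one tuple per time period

for name, col in zip(["baseline", "period 1", "period 2"], columns):
    print(f"{name}: values={[round(v, 1) for v in col]}, variance={pvariance(col):.2f}")
```

      The baseline column comes out as [100.0, 100.0, 100.0] with variance 0.00, whereas the two playback periods vary across animals. This unequal spread across time periods is the heteroscedasticity the reviewer describes.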

      That being said, I do think that it is reasonable to analyze normalized data if the period used for normalization is NOT included in the analysis (see Figure 3 of one of the papers the authors listed in their response to reviewers: Galvez-Marquez et al., 2022). However, from my reading of this manuscript, it does seem that normalized baseline data are analyzed to assess how stimuli affect neurochemical concentrations.

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose one of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the Results, the significance signs in the figures, and the Discussion are corrected accordingly. Despite these changes, our major conclusions remain as before. We have included some descriptive statistics in the text because we think these are informative.

      We decided to take this approach because the inter-individual variability in the raw data levels, caused by non-experimental factors, is too great to be useful. As we have stated before, these values are affected by probe placement, collection process, or differences in the HPLC or LC/MS runs. These effects are widely recognized in the field.

      It is worth pointing out a few things about the papers listed by the authors. Li et al. (2023) depict normalized microdialysis data, but it isn't clear that any analysis of the normalized data is conducted. The same can be said about Holly et al. (2016). Further, in Bagley et al. (2011), the authors depict normalized data in the figures but conduct analyses on the raw data ("After chronic morphine treatment, systemic naloxone injection increased GABA outflow in PAG by 41% (from 24.6 ± 2.9 nM to a peak of 34.8 ± 3.8 nM, n = 6, P = 0.016), but did not alter GABA levels after vehicle treatment (39.8 ± 8.3 to 38.6 ± 7.4 nM with naloxone at matched peak time, n = 4; Fig. 3a)"). This latter approach (analyzing raw data in a repeated-measures manner and depicting normalized data) seems reasonable for the authors of the current study.

      (b) Clarification and modification of statistical models

      When analyzing the effect of experience on neuromodulator release, the authors analyze the experienced and inexperienced mice independently (e.g., Figure 3 vs. 6). The ideal way to assess the effects of experience is to create a factorial model. For example, one could analyze a full factorial model with experience (exp vs. inexp), stimulus type (mating vs. restraint), and time (baseline, period 1 vs. period 2, assuming raw data are used). If one wanted to exclude the baseline period because group differences in baseline are not informative, conducting a factorial analysis of normalized data with just the data from periods 1 and 2 seems fine. I believe an analysis like this will strengthen the legitimacy of the analysis. For example, when analyzing the normalized data (periods 1 and 2) of experienced and inexperienced males in response to mating or restraint vocalizations, you find a significant interaction between experience and stimulus type. Finding an effect of experience in an analysis that includes both experienced and inexperienced mice is ideal from an analytical framework.

      In Figure 6, it is not clear what the statistical model is and what the interactions mean. For example, in the figure legend for Figure 6, the authors report time*context and time*sex interactions. However, in this analysis there are two groups of inexperienced males (males that are listening to restraint vocalizations, males that are listening to mating vocalizations) and one group of females (females that are listening to mating vocalizations); in other words, this is an unbalanced analysis. So, when the authors indicate a time*context interaction, does that mean they are comparing the male-restraint group to the combination of males and females listening to mating vocalizations? And when they talk about a time*sex interaction, are they analyzing how males listening to either mating or restraint vocalizations differ from females listening to a mating vocalization? This all seems peculiar to me.

      - A similar set of questions could be raised about interaction effects depicted in Figure 4.

      Overall, I would like this manuscript to be reviewed by a statistician to provide additional input on how best to analyze the data.

      We followed the reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with the statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together.

      Design: Intercept + Sex + Context + Experience + Sex*Experience + Context*Experience.

      As recommended by the statistician, the model is not full factorial: we have no females in the restraint group, so a full factorial model would be unbalanced. Therefore, as advised by the statistician, running a GLM based on the above model and included factors is the best way of approaching the analysis for the current dataset.
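      One concrete consequence of the missing female/restraint group can be shown with a hypothetical Python sketch. The group list and the 0/1 indicator coding below are illustrative assumptions, not the study's actual group sizes or the statistician's software. Under this coding, the Sex*Context column is zero in every group, so that term, and any higher-order term containing it, has no data from which to be estimated. The terms in the design above all have non-zero columns.

```python
# Hypothetical group layout: (sex, context, experience) for each group.
# There is deliberately no female/restraint group, mirroring the design described above.
groups = [
    ("M", "mating", "EXP"), ("M", "mating", "INEXP"),
    ("M", "restraint", "EXP"), ("M", "restraint", "INEXP"),
    ("F", "mating", "EXP"), ("F", "mating", "INEXP"),
]

def design_row(sex, context, experience):
    """0/1 indicator coding; reference levels are male, mating, inexperienced (assumed)."""
    female = 1 if sex == "F" else 0
    restraint = 1 if context == "restraint" else 0
    exp = 1 if experience == "EXP" else 0
    return {
        "Intercept": 1,
        "Sex": female,
        "Context": restraint,
        "Experience": exp,
        "Sex*Experience": female * exp,
        "Context*Experience": restraint * exp,
        # Not part of the fitted model; computed only to show that this column is empty.
        "Sex*Context": female * restraint,
    }

rows = [design_row(*g) for g in groups]
for term in rows[0]:
    column = [r[term] for r in rows]
    print(f"{term:20s} {column}")
```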

      We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. The GLM models are clarified for all the figures in the “data analysis” section of the manuscript. We have clarified that the major effect of experience on neuromodulators was seen in the ACh data.

      (c) Analysis of post-stimulus period

      I agree with Reviewer 3 that analyzing the post-stimulus period would be useful. As mentioned in the original review, these data could serve as an opportunity to show that the neurochemical levels returned to baseline and add further support for the model described in Figure 6. In addition, these data could help reveal the link between neurochemical release, auditory responses, and behavior. If neurochemical changes reflect auditory responses, then these should return to baseline during the post-stimulus period. In addition, if behavioral variation (e.g., between mice hearing mating vs. restraint stimuli) persists following the termination of playback, then one could similarly assess whether neurochemical variation persists following playback. If the latter is the case, then the neurochemical release could be more related to the behavior than to the playback stimulus itself.

      We did not change this analysis. Our response to Reviewer 3’s comment is shown below.

      “We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.”

      This was accepted by Reviewer 3. We also note that release patterns have multiple time courses (e.g., Aitta-aho et al., 2018 for ACh), and thus may not support an assumption that levels should return to baseline shortly after playback offset.

      Minor comments:

      Page 7, line 15: I suggest changing "vocalization-dependent" to "stimulus-dependent" because the former could connote patterns of release related to the animal itself vocalizing.

      Changed to: “There were also distinct patterns of ACh and DA release into the BLA depending on the type of vocalization playback (Fig 3C,D).”

      Discussion section: The authors should point out a few caveats with their experiments in the Discussion section. First, experienced animals received both mating (social) and restraint experiences, and it is not clear to what degree each type of experience affected neural and behavioral responses (i.e., specificity of experience effects). For example, mating experience can lead to a wide range of physiological changes, including a resilience to stress (e.g., Leuner et al., PLoS One, 2010; Arnold et al., Hormones and Behavior, 2019), so it is possible that mating experiences by themselves could have induced these changes. Or it could be that experiencing restraint stress affects responses to mating stimuli. This could be added to the first paragraph in page 16. (The authors could also discuss which aspects of the sexual encounters might be most important for the behavioral and neural plasticity.)

      We have added text to raise this issue, stating that it is unknown whether the experience effects are specific and citing the above references concerning the generalized effects of certain experiences.

      Discussion section: It would also be useful for the authors to discuss the extent to which behavior might be driving the neurochemical changes. Some of the analyses suggest that the release is independent of the behavior (e.g., reflects a sensory response), but this could be emphasized more in the Discussion.

      We believe that we have addressed this issue sufficiently in our previous response to related issues raised by this reviewer. As we note, there are limitations in the time resolution of microdialysis data that render the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Figure 2, legend: Please note that the text above the images describes the stimulus played back to these animals and their hormonal state, and not the type of experience they underwent (i.e., clarify the titles).

      Changed as requested.

      I also agree with Reviewer 3 that "mating experience" is a misnomer for this manuscript. "Social experience with a female" is a more accurate descriptor. If they wanted to specifically provide mating experience, males should have only been tested with estrous (receptive) females. I don't think this wording change detracts from their findings.

      We have not changed this term. As noted in our previous response to Reviewer #3, we stated: “In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing.” Due to this requirement, the term “mating behavior” is informative and appropriate. In our view, “Social experience with a female” does not adequately describe our inclusion criterion or the experience.

      Reviewer #3 (Recommendations For The Authors):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of ACh, DA, and 5-HIAA is excellent. My previous concerns have been adequately addressed. I only have a few minor suggestions for the text and one figure.

      Minor suggestions:

      Page 2, Ln 9: add adult before male and female mice

      Changed as requested

      Page 4, Ln 10: add a period after Tsukano et al., 2019)

      Changed as requested

      Page 6, Ln 9: what did you mean by "their interaction"? Being more specific, but concise, would help the readers.

      We revised the wording to clarify that the neuromodulatory systems interact in the emission of positive and negative vocalizations.

      Page 6, Ln 17: You mention Stim 1 and Stim 2, but the stimuli are not defined at this point. The clear explanation is provided in the following paragraph. Maybe consider switching the order and defining the stimuli before you describe the liquid chromatography/mass spectrometry technique.

      We have revised and merged these paragraphs so that Stim 1 and Stim 2 are defined on first use. We also revised our description of the depiction and analysis of neurochemical data.

      Page 11, Ln 12: replace well-proven with well-documented

      Changed as requested

      Figure 2: There are two arrows pointing towards a single track. I assume one of the arrows is a duplicate. If so, delete one of the arrows. If not, please explain what the second arrow represents.

      Arrow removed

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation, as indicated by the FRET analysis with the RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through its electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutants). They also showed that expressing WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, additional clarification and experiments would help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers of inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs. senescent cells). Staining the tissue sections for β-galactosidase and other markers of senescence would help to delineate these.

      As pointed out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence, namely the cell cycle and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data do not suggest higher levels of the C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to HRas levels in the plasma membrane was similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph, columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibition of C181 palmitoylation, either by mutation (C181S) or by treatment with the protein palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as happens when cysteine palmitoylation is inhibited by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous (wild-type) HRas is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since the HRas constructs are expressed from the strong CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over endogenous HRas, revealing a cysteine mutant-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in vehicle-treated C2C12 cells were not shown, as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of myoblast differentiation by 15d-PGJ2 depends on modification of HRas at C184; i.e., when 15d-PGJ2 fails to modify HRas at C184 (Fig. 3A) and therefore fails to activate it (Fig. 3B), the inhibition of C2C12 differentiation is rescued (Fig. 4D and E). This separates the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024); on the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      (1) I still think the novelty is limited by previously published findings. The authors themselves noted that the accumulation of 15d-PGJ2 in senescent cells has been reported in various cell types, including human fibroblasts, HEPG2 hepatocellular carcinoma cells, and HUVEC endothelial cells (PMCID: PMC8501892). Although the current study observed similar activation of 15d-PGJ2 in myoblasts, this appears to be additive rather than fundamentally novel. The covalent adduct of 15d-PGJ2 with Cys-184 of H-Ras was reported over 20 years ago (PMID: 12684535), and the biochemical principles of this interaction are likely universal across different cell types. The regulation of myogenesis by both HRas and 15d-PGJ2 has also been extensively reported (PMID: 2654809, 1714463, 17412879, 20109525, 11477074). The main conceptual novelty may lie in the connection between these points in myoblasts. But as discussed in another comment, the use of C2C12 cells as a model for senescence study is questionable due to the lack of the key regulator p16. The findings in C2C12 cells may not accurately represent physiologically relevant myoblasts. It is recommended that these findings be validated in primary myoblasts to strengthen the study's conclusions.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      (2) The C2C12 cell line is not an ideal model for senescence study.

      C2C12 cells are a well-established model for studying myogenesis. However, their suitability as a model for senescence studies is questionable. C2C12 cells are immortalized and do not undergo normal senescence like primary cells as C2C12 cells are known to have a deleted p16/p19 locus, a crucial regulator of senescence (PMID: 20682446). The use of C2C12 cells in published studies does not inherently validate them as a suitable senescence model. These studies may have limitations, and the appropriateness of the C2C12 model depends on the specific research goals.

      Several reports have shown that cells can undergo senescence independently of p16 expression. MCF7 human breast adenocarcinoma cells have been shown to undergo DNA damage-mediated and oncogene-induced senescence, after treatment with Doxorubicin (PMCID: PMC7025418) and expression of constitutively active HRas (PMID: 17135242), respectively, through upregulation of the cell cycle inhibitor protein p21, despite homozygous deletion of the p16 locus (ISBN 9780124375512 Chapter 17 Table 2). In this study, we observe an increase in senescence markers in C2C12 cells after treatment with Doxo (Fig. 1). We also observed an increase in the markers of DNA damage-mediated senescence in MCF7 cells after treatment with Doxo (data will be included in the revised manuscript). Based on these observations, we have concluded that C2C12 cells undergo senescence despite lacking the p16/p19 locus.

      In the study by Moustogiannis et al. (PMID: 33918414), they claimed to have aged C2C12 cells through multiple population doublings. However, the SA-β-gal staining in their data, which is often used to confirm senescence, showed almost fully confluent "aged" C2C12 cells. This confluent state could artificially increase SA-β-gal positivity, suggesting that these cells may not truly represent senescence. Moreover, the "aged" C2C12 cells exhibited normal proliferation, which contradicts the definition of senescence. Similar findings were reported in another study of C2C12 cells subjected to 58 population doublings (PMID: 21826704), where even at this late stage, the cells were still dividing every 2 or 3 days, similar to younger cells at early passages. More importantly, I do not know how p16 was detected in that paper, since the locus was already mutated. In terms of p21, there was no difference in the proliferative C2C12 cells at day 0.

      In the study by Moiseeva et al. in 2023 (PMID: 36544018), C2C12 cells were used for senescence modeling for siRNA transfection. However, the most significant findings were obtained using primary satellite cells or confirmed with complementary data.

      In conclusion, while molecular changes observed in studies using C2C12 cells may be valid, the use of primary myoblasts is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      (3) Regarding the source of increased PGD2 in the conditioned medium, I want to emphasize that it is unclear whether PGD2 or its metabolites increase in response to DNA damage or to the senescent state. Thus, using a different senescence model to exclude the possibility of a DNA damage-induced increase will be crucial.

      Although senescence can be induced by several stress stimuli, such as DNA damage, oncogene expression, ROS, and mitochondrial dysfunction, DNA damage remains critical for the induction of the SASP (reviewed in PMID: 20078217). Moreover, other models of senescence, such as oncogene-induced senescence (reviewed in PMID: 17671427), ROS-induced senescence (PMID: 24934860), and mitochondrial dysfunction-associated senescence (MiDAS) (PMID: 26686024), have shown upregulation of DNA damage-associated signaling pathways. In this study, we have explored the SASP of cells undergoing senescence upon DNA damage mediated by the chemotherapy drug Doxorubicin.

      (4) Similarly for the in vivo Doxorubicin (Doxo) injection, both reviewers have raised concerns about the potential side effects of Doxo, including inflammation, DNA damage, and ROS generation. These effects could potentially confound the results of the study. The physiological significance of this study will heavily rely on the in vivo data. However, the in vivo senescence component is confounded by the side effects of Doxo.

      We concur that this is a limitation of this study; subsequent work will address the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (5) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of conditioned medium. The authors took it for granted that the conditioned medium from senescent cells would inhibit myogenesis, relying on previous publications (PMID: 37468473). However, that study was conducted in the context of myotonic dystrophy type 1. To support the inhibitory effect in the current experimental setting, direct evidence is required. It would be necessary to include another control with conditioned medium from normal, proliferative C2C12 cells.

      Conditioned medium from several types of senescent cells, such as senescent myoblasts in DM1 (PMID: 37468473) and adipocytes undergoing senescence due to H2O2 treatment, insulin resistance, or replicative senescence (PMID: 37321332), has been shown to inhibit the differentiation of myoblasts. Therefore, in this study, we measured the effect of prostaglandin PGD2 and its metabolites on the differentiation of myoblasts by inhibiting the biosynthesis of PGD2 in senescent myoblasts with AT-56 and then collecting the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as the control for the experiment.

      (6) Statistical analyses problems.

      Only the t-test was used throughout the study, even when there were more than two groups. Please have a statistician evaluate the replicates and statistical analyses used.

      In experiments with more than two groups, the t-test was used for column-wise comparison of the experiment samples to the control sample. Multiple sample comparisons using one-way or two-way ANOVA were avoided as experimental samples were individually compared to the control sample.
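      The reviewer's concern about unadjusted per-group t-tests can be quantified with a short Python sketch. If k comparisons are each tested at α = 0.05 without adjustment, the probability of at least one false positive is approximately 1 − (1 − α)^k (exact only for independent tests; comparisons sharing one control are correlated, so this is an approximation). The sketch also applies a Holm step-down adjustment as one standard correction. The p-values are hypothetical, and Dunnett's test, a common choice for many-versus-one-control comparisons, is not implemented here.

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p upward
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # adjusted p-values must be non-decreasing
        adjusted[i] = running_max
    return adjusted

print(round(familywise_error(0.05, 4), 3))  # 0.185

# Hypothetical unadjusted p-values from four treatment-vs-control t-tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
print(holm_adjust(raw_p))
```

      With four comparisons, the chance of at least one spurious "significant" result is roughly 18.5% rather than 5%. After Holm adjustment, only the smallest p-value in this made-up example stays below 0.05.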

      For the 15d-PGJ2/cell concentration measurements in Figure 1F, there were only two replicates, which were provided in the supplementary table after this was requested. Was that experiment repeated with more biological replicates?

      Additional replicates of the experiment will be included in the revised manuscript.

      For Figures 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, and 4E, please include each data point in the bar graphs, as in Fig. 1D, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      Many control groups have no error bars (Figs. 2C, 2E, 3E, 3F, 4E, and S4B).

      There are no error bars for the control groups in Figures 2C, 2E, 3E, 3F, 4E, and S4B because the experimental samples of each replicate were normalized to the corresponding control sample, which sets the control value of every replicate to 1.
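      The effect of per-replicate normalization on the control error bar can be sketched in a few lines. This is a minimal illustration with hypothetical readouts, not the authors' analysis pipeline: every value in a replicate is divided by that replicate's own control, so each normalized control equals exactly 1 and its standard deviation is 0, while all between-replicate variability appears in the treated group.

```python
# Hypothetical raw readouts (e.g. band intensities) for three biological
# replicates, each measured alongside its own control sample.
replicates = [
    {"control": 120.0, "treated": 60.0},
    {"control": 95.0, "treated": 57.0},
    {"control": 140.0, "treated": 56.0},
]


def normalize_to_control(reps):
    """Divide every value in a replicate by that replicate's own control value."""
    return [{name: value / rep["control"] for name, value in rep.items()} for rep in reps]


def mean_sd(values):
    """Arithmetic mean and sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance ** 0.5


normalized = normalize_to_control(replicates)
control_mean, control_sd = mean_sd([rep["control"] for rep in normalized])
treated_mean, treated_sd = mean_sd([rep["treated"] for rep in normalized])
print(f"control: {control_mean:.2f} ± {control_sd:.2f}")  # control: 1.00 ± 0.00
print(f"treated: {treated_mean:.2f} ± {treated_sd:.2f}")  # treated: 0.50 ± 0.10
```

      Because the control collapses to a single value by construction, plotting the raw per-replicate values alongside the normalized bars is one way to show the variability that normalization hides.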

      For the qPCR data in Figure 1C, the authors responded that the data were plotted using 2^(-ΔCT) instead of 2^(-ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with saline. This statement does not align with the Methods section. Please revise.

      Appropriate revisions will be made to the Methods section of the revised manuscript.

      (7) For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.

      Recommendations for the authors:

      After careful review, the editors advise you to carefully address the following concerns.

      (1) There were concerns that, in the revised manuscript, the DMSO and Doxo experiments depicted in Figure 1H appeared quite homogeneous despite the authors' description to the contrary. This raises concerns about the type of statistics employed and the possibly low number of replicates in the experiments shown in Fig. 1.

      (2) Experiments in Figures 1F, 1I, and 1J had as few as n = 2 replicates. In Figures 1C, 1D, 1F, 1G, and 1J, the statistics used a two-tailed Student's t-test; for all other experiments, N/A was marked for statistics. Using a t-test for multi-group comparisons (as indicated in the figure legend) and relying on only two replicates for many experiments are not appropriate.

      Additional replicates for the experiments shown in figures 1F, 1I, and 1J have been done and the data will be revised along with updated statistical tests during the revision of the manuscript.

      (3) In several experiments, the difference between technical replicates is too high.

      Reviewer #1 (Recommendations For The Authors):

      Most of my concerns were addressed in the revised manuscript.

      We thank the reviewer for their time in reviewing the manuscript and for considering the authors' responses to their comments during the previous round of review.

      Reviewer #2 (Recommendations For The Authors):

      Validating the findings in a primary myoblast is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Validate the finding in a different senescent model to exclude the possibility of DNA damage-response.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For Fig 2A, add another control with a conditioned medium from normal, proliferative C2C12 cells.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Please have a statistician evaluate the replicates and statistical analyses used.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For the bar plots (Figures 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, and 4E), please include each data point, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form: the main conclusions related to senescence lack robust support from the data, and there are technical concerns about the senescence models used in this study.

      We are grateful to the editors and the reviewers for their time and comments in sharpening the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence regarding the claims, which could have been missed during the review process. We have provided a better context for these points now.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned medium or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modifies Cys184 of HRas, which is required for its activation, as indicated by the FRET analysis with the RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through its electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutant). They also showed that expressing WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and that this defect depends on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarifications and additional experiments would help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers of inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). Staining tissue sections for SA-β-gal and other senescence markers would help to distinguish them.

      As pointed out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNFα, IL6, PTGS1/2, PTGDS), we also measured canonical markers of DNA damage-mediated senescence. We observed an increase in the mRNA levels of the cell-cycle inhibitors and senescence markers p16 and p21 (Fig. 1C), increased nuclear accumulation of p21 (Fig. 1A), and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that viability decreased after treatment with 10 µM 15d-PGJ2 but not at 1, 2, 4, or 5 µM (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 shows an acute experiment lasting only 1 hour, during which no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt after 1 hour of treatment with 15d-PGJ2 (10 µM).

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibition of C181 palmitoylation, either by mutation (C181S) or by treatment with the palmitoyl acyltransferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, because modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as occurs when cysteine palmitoylation is prevented by the C181S mutation.

      To test whether the inhibition of myoblast differentiation depends on HRas, they overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, because the HRas constructs are expressed from the strong CMV promoter without other regulatory elements, the overexpressed constructs show a dominant effect over endogenous HRas, revealing a cysteine-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs and treated with 15d-PGJ2 were normalized to those in C2C12 cells expressing the corresponding construct and treated with vehicle (DMSO). The vehicle-treated values are not shown because they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Because the aim was to study the effect of HRas cysteine mutations on myoblast differentiation after 15d-PGJ2 treatment, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2. We show that this inhibition depends on modification of HRas at C184: when 15d-PGJ2 fails to modify HRas at C184 (Fig. 3A) and therefore fails to activate it (Fig. 3B), the inhibition of C2C12 differentiation is rescued (Fig. 4D and E). This dissects the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024); on the other hand, 15d-PGJ2 exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported. 

      Literature does not support this statement, and it is important to clarify this misimpression for the field as a whole. 

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscle has not been shown before, even though both HRas and 15d-PGJ2 are known to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (2003). However, that study does not comment on the functional implications of activating HRas signaling.

      Recently, our lab contributed to a study showing that activation of HRas signaling through covalent modification by 15d-PGJ2 has a functional role in maintaining the senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit myoblast differentiation by Hunter et al. (2001). Although that study hypothesized that the inhibition occurs through 15d-PGJ2-mediated activation of PPARγ signaling, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting that other mechanisms exist.

      This is the first study to show a molecular mechanism in which activation of HRas signaling in skeletal myoblasts, due to covalent modification of HRas at C184 by 15d-PGJ2, inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 is a good model for the DNA damage-induced senescence used in this manuscript. Several reports in the literature have shown the induction of senescence in C2C12 cells: Moiseeva et al. (2023) showed induction of senescence in C2C12 cells after etoposide-mediated DNA damage, and Moustogiannis et al. (2021) showed induction of replicative senescence in C2C12 cells. In this study, we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We measured the induction of senescence in C2C12 cells upon DNA damage using several physiological markers (nuclear size, cell size, and SA β-gal) and molecular markers (mRNA levels of p21 and the SASP factors IL6 and TGFβ, and protein levels of p21) of senescence (see Fig. 1 of the updated manuscript). The results and the figures in the manuscript have been updated accordingly.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells, and Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells into the conditioned medium. These data show that senescent C2C12 cells are the source of PGD2 and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study; subsequent work will determine the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using a published myotube analyzer software (Noë et al., 2022). Appropriate information has been added to the materials and methods section of the manuscript.
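      For readers unfamiliar with the metric, the fusion index is commonly defined as the percentage of all nuclei that lie within myotubes, usually taken as MHC-positive cells containing at least two nuclei. The sketch below assumes that common definition and hypothetical per-object nucleus counts; it is not the implementation of the cited myotube analyzer software, whose exact rules may differ.

```python
def fusion_index(nuclei_per_mhc_object, total_nuclei, min_nuclei=2):
    """Percentage of all nuclei located in MHC+ objects with >= min_nuclei nuclei.

    nuclei_per_mhc_object: nucleus count for each MHC-positive object in a field.
    total_nuclei: all nuclei (e.g. DAPI/Hoechst) counted in the same field.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    fused = sum(n for n in nuclei_per_mhc_object if n >= min_nuclei)
    return 100.0 * fused / total_nuclei


# Hypothetical field of view: mononucleated MHC+ cells (count 1) are not fused.
mhc_objects = [1, 3, 5, 2, 1, 8]
print(fusion_index(mhc_objects, total_nuclei=72))  # 25.0
```

      The `min_nuclei` threshold is exposed as a parameter because definitions vary between studies (some count nuclei in any MHC-positive cell).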

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 3: Expand SA in "SA β-gal".

      The manuscript has been updated accordingly (See line 3).

      Line 68: HRas is highly regulated by lipid modifications.

      The manuscript has been updated accordingly (See line 67).

      Figures

      Figure S1A seemed incomplete (maybe some processing issue).

      The Figure has been updated in the revised manuscript (See Fig. S1A).

      Figure S1B-H are mislabeled.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      Figures S1E-H are not mentioned in the manuscript.

      The manuscript has been updated accordingly (See line 120).

      Many supplementary figures are not cited in the article.

      The manuscript has been updated accordingly. (See lines 85, 120, 123, 166, 225, 356, 364, 412, and 413)

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify the injection method for Doxorubicin in B6J mice on line 83 (IP or IM).

      Mice were injected intraperitoneally with Doxorubicin (as mentioned in the materials and methods, see lines 83 and 794)

      (2) Address missing information in figures or figure legends.

      There is a missing piece in Sup Fig 1A.

      The figure has been updated in the revised manuscript (See Fig. S1A).

      Correct labels in Sup Fig 1C and 1D.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      How would the authors explain the dramatic differences in the morphology of C2C12 cells treated with DOXO between the bright-field and SA-beta-gal staining images in Sup Fig 1B and 1C?

      The SA β-gal image after treatment with Doxo does show a flattened cell morphology. Another field of view from the same experiment has been added in the figure to show the difference in the cell morphology more prominently in the revised manuscript (See Fig. 1H).

      Provide explanations for Sup Fig 1E-1G, including the meaning of the y-axis and the blue dots and red lines.

      We have provided an explanation for the multiple reaction monitoring mass spectrometry used to measure the concentration of 15d-PGJ2 in the conditioned medium in the revised manuscript (see lines 119-130 and the legends of Fig. S1C, D, and E)

      (3) Please review the calculation of qPCR data in Figure 1C for correctness, ensuring reference samples with an average expression level of 1.

      The data in Fig. 1C were plotted using 2^(-ΔCT) instead of 2^(-ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with saline.
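      The difference between the two quantities can be made concrete with a short sketch using hypothetical Ct values. Here 2^(-ΔCT) expresses each animal's target level relative to its reference gene, so saline values keep their spread but do not average to 1. The Livak-style 2^(-ΔΔCT) subtracts the mean saline ΔCT, which multiplies every value by the same constant. That makes the saline group's geometric mean exactly 1 (its arithmetic mean is close to, but not exactly, 1) and leaves all ratios between animals unchanged.

```python
import math

# Hypothetical Ct values (one per animal) for a target gene and a reference gene.
saline_target, saline_ref = [24.1, 25.0, 24.4], [18.0, 18.2, 17.9]
doxo_target, doxo_ref = [22.0, 22.5, 21.8], [18.1, 18.0, 17.8]


def delta_ct(target, reference):
    """ΔCT = CT(target) - CT(reference gene), per sample."""
    return [t - r for t, r in zip(target, reference)]


def two_pow_neg_dct(dct):
    """2^(-ΔCT): expression relative to the reference gene, with no calibrator group."""
    return [2.0 ** (-d) for d in dct]


def two_pow_neg_ddct(dct, calibrator_dct):
    """2^(-ΔΔCT), with ΔΔCT = ΔCT(sample) - mean ΔCT of the calibrator (saline) group."""
    baseline = sum(calibrator_dct) / len(calibrator_dct)
    return [2.0 ** (-(d - baseline)) for d in dct]


saline_dct = delta_ct(saline_target, saline_ref)
doxo_dct = delta_ct(doxo_target, doxo_ref)

saline_rel = two_pow_neg_dct(saline_dct)          # spread visible, mean not 1
saline_fold = two_pow_neg_ddct(saline_dct, saline_dct)
doxo_fold = two_pow_neg_ddct(doxo_dct, saline_dct)

# Subtracting the arithmetic mean of ΔCT makes the saline geometric mean 1.
saline_geomean = math.prod(saline_fold) ** (1 / len(saline_fold))
print(saline_rel)
print(saline_fold, doxo_fold, saline_geomean)
```

      Whichever form is plotted, the Methods section should state it, because the two differ only by a per-gene scaling factor.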

      (4) Please explain the calculation of 15d-PGJ2/cell concentration in Figure 1F and provide raw data for review, considering the substantial changes and small error bars. The method or result section lacks an explanation of how this calculation was performed. Additionally, there is no mention of the cell number count.

      All the raw values (the concentration of 15d-PGJ2 measured by mass spectrometry and the cell numbers counted at the time of collection of the conditioned medium) are provided in Supplementary Table 1. The standard curve used to calculate the concentration of 15d-PGJ2 in the conditioned medium is shown in Fig. S1F. The cell number was counted with a hemocytometer after trypsinization on the day the conditioned medium was collected.
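      One plausible way to turn these raw values into a per-cell amount is sketched below. It assumes a linear standard curve (MRM response vs concentration), a known volume of conditioned medium, and a standard Neubauer hemocytometer count (mean count per large square × dilution × 10^4 cells/mL). All numbers and volumes are hypothetical, and the authors' exact calculation may differ.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def hemocytometer_count(square_counts, dilution, resuspension_ml):
    """Total cells = mean count per large square x dilution x 1e4 cells/mL x volume."""
    cells_per_ml = sum(square_counts) / len(square_counts) * dilution * 1e4
    return cells_per_ml * resuspension_ml


# Hypothetical calibration standards: concentration (nM) vs MRM response.
std_conc_nM = [0.0, 1.0, 2.0, 5.0, 10.0]
std_response = [0.0, 0.5, 1.0, 2.5, 5.0]
slope, intercept = fit_line(std_conc_nM, std_response)

sample_response = 1.5   # hypothetical response for one conditioned medium
medium_ml = 2.0         # hypothetical volume of conditioned medium
conc_nM = (sample_response - intercept) / slope
amount_pmol = conc_nM * medium_ml  # nM x mL = 1e-9 mol/L x 1e-3 L = 1e-12 mol (pmol)

cells = hemocytometer_count([50, 46, 54, 50], dilution=1, resuspension_ml=1.0)
amol_per_cell = amount_pmol * 1e6 / cells  # 1 pmol = 1e6 amol
print(f"{conc_nM:.2f} nM, {cells:.0f} cells, {amol_per_cell:.1f} amol/cell")
```

      Stating the medium volume and the resuspension volume used for counting would let readers reproduce the per-cell values from Supplementary Table 1.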

      (5) Please clarify how cell number normalization and doubling time calculation were done in Fig 2B. Consider replacing the figure with a growth curve showing confluence on the y-axis for easier interpretation.

      Cells were counted every 24 hours, and counts were normalized to the number of cells counted on day 0 of the treatment (to account for attachment efficiency and other cell culture parameters). Doubling time was calculated as the reciprocal of the slope of the plot of log2(normalized cell number) vs time.
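      The doubling-time calculation described above can be sketched as follows, assuming an ordinary least-squares line through log2(N/N0) vs time. The slope is then in doublings per hour, and its reciprocal is hours per doubling. The counts are hypothetical.

```python
import math


def doubling_time(times_h, counts):
    """Doubling time (h) = 1 / slope of log2(N / N0) vs time, by least squares."""
    n0 = counts[0]  # day-0 count used for normalization
    y = [math.log2(c / n0) for c in counts]
    n = len(times_h)
    mt, my = sum(times_h) / n, sum(y) / n
    sxy = sum((t - mt) * (v - my) for t, v in zip(times_h, y))
    stt = sum((t - mt) ** 2 for t in times_h)
    return 1.0 / (sxy / stt)


times = [0, 24, 48, 72]                 # hours after treatment start
counts = [1.0e4, 2.1e4, 3.9e4, 8.2e4]   # hypothetical cell counts
print(f"{doubling_time(times, counts):.1f} h")
```

      Fitting a line to all time points, rather than using only the first and last counts, reduces the influence of a single noisy count. It does, however, assume exponential growth over the whole window.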

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer 1

      The paper is overall convincing. However, a little more attention to data presentation and possibly the addition of at least another technique (see below) would greatly strengthen the findings.

      As we hope to demonstrate below, we have taken steps to improve our manuscript on both fronts (data presentation and experimental evidence).

      The absence of statistics immediately catches the eye. I am sure that the differences shown are statistically significant (thanks to the number of analyzed cells), but reporting the result of a statistical test would help the reader identify the relevant data in a plot. This is somewhat necessary, considering that the text sometimes describes something as "significant" or "not significant", and I felt that I really needed it when looking at the plot in Fig. 3D.

      To facilitate the interpretation of figures that contain data from multiple strains (such as the one mentioned by the reviewer), we have carried out a single-step multiple comparison test that does not assume equal variances (Games-Howell) to identify mutants whose means differ significantly from each other. To avoid overcrowding the figures, we have graphically summarized the p-values of all pairwise comparisons in a small matrix within the corresponding panel, and provided 99% confidence intervals and p-values of all differences in the Supplement.
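      For orientation, the Games-Howell procedure compares every pair of groups using a Welch-type standard error and Welch-Satterthwaite degrees of freedom. The resulting statistic is referred to the studentized range distribution. The sketch below computes only the pairwise statistic q and the degrees of freedom from hypothetical N/C ratios. The p-value would then come from the studentized range survival function with k groups and those degrees of freedom (for example `scipy.stats.studentized_range.sf(q, k, df)` in SciPy 1.7 or later). This is an illustration, not the authors' analysis code.

```python
import itertools
import math
import statistics


def games_howell_stats(groups):
    """Return {(a, b): (mean_diff, q, df)} for every pair of named groups."""
    summary = {
        name: (statistics.mean(v), statistics.variance(v), len(v))
        for name, v in groups.items()
    }
    results = {}
    for a, b in itertools.combinations(summary, 2):
        ma, va, na = summary[a]
        mb, vb, nb = summary[b]
        wa, wb = va / na, vb / nb                   # squared standard errors
        se = math.sqrt((wa + wb) / 2)               # Games-Howell scaling for q
        q = abs(ma - mb) / se
        df = (wa + wb) ** 2 / (wa ** 2 / (na - 1) + wb ** 2 / (nb - 1))
        results[(a, b)] = (ma - mb, q, df)
    return results


# Hypothetical steady-state N/C ratios for three strains (tiny n for readability).
nc_ratios = {
    "WT": [2.0, 2.2, 2.4],
    "mutA": [1.6, 1.7, 1.8],
    "mutB": [2.1, 2.3, 2.5],
}
for pair, (diff, q, df) in games_howell_stats(nc_ratios).items():
    print(pair, f"diff={diff:.2f} q={q:.3f} df={df:.2f}")
```

      With equal variances and equal group sizes, the degrees of freedom reduce to 2(n - 1), which is a quick sanity check on the formula.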

      Related to the previous point: for every N/C distribution analysis, the number of analyzed cells is reported. As written, it seems that replication relies solely on the cells in that specific population, i.e. each cell is treated as a replicate. At least, I could not find a statement to the contrary in the legends or in the methods. I wonder what the results (and their significance) would be if each replicate were a new assay on another population.

      Cell populations exhibit significant variability in their phenotypic characteristics. Consequently, the quantification of a specific feature (e.g., the Sfp1 nuclear/cytoplasmic ratio) across a sample of cells from a given population results in a distribution rather than a single fixed value. For each quantification, we report the number of cells that were used to construct the corresponding distribution, i.e. the sample size. To compare samples from different populations (e.g., different Sfp1 mutant strains), we run them in parallel during microscopy experiments and compare their means, as described above. Throughout our study, we have tried to ensure that we quantify a sufficiently large number of cells to overcome cell-to-cell variability and enhance the reliability of our results.

      In this context, the question of the reviewer is not entirely clear to us, as individual measurements of a sample are not replicates. However, one can replicate the entire experiment on a different day by re-growing the different strains, running microscopy, quantifying the new movies, etc. In this sense, the experiments shown in the manuscript consist of single replicates, i.e. experiments that were carried out on the same day, with all the relevant mutants and controls quantified together. However, we have monitored many of our mutants multiple times over the course of our work. For example, Fig. 1 below shows replicates of the Sfp1 N/C ratio distributions at steady state in the analog-sensitive (A) and wild-type (B) backgrounds, which were quantified several times across various experiments. Day-to-day variability in the empirical distributions of the same mutant exists but is quite small.

      The scale of x axes in N/C ratio plots. Besides not being consistent throughout the figures, it originates from 1, visually enhancing the differences.

      We believe the reviewer was referring to the y-axes, as the x-axes represent time. Summarizing the N/C ratio dynamics of different Sfp1 mutants has been challenging. First, the average N/C ratios at steady-state vary considerably across different mutants, as shown in the panels that summarize steady-state N/C ratios. To compare the magnitude and features of their responses, normalization is necessary. We chose to normalize the time series of each mutant to have a mean of 1 prior to the onset of a perturbation. This allows the normalized time series to represent the percentage-wise changes in the Sfp1 N/C ratio upon perturbation.
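      The baseline normalization described above can be sketched as follows, with hypothetical time points and N/C values. Each time series is divided by its own mean before perturbation onset, so the pre-onset mean becomes 1 and later values read directly as fractional changes (0.71 means a 29% decrease).

```python
def normalize_to_baseline(times, values, onset):
    """Divide a time series by its mean over time points strictly before onset."""
    baseline_values = [v for t, v in zip(times, values) if t < onset]
    if not baseline_values:
        raise ValueError("no time points before the perturbation onset")
    baseline = sum(baseline_values) / len(baseline_values)
    return [v / baseline for v in values]


times = [-10, -5, 0, 5, 10, 15]        # minutes; perturbation applied at t = 0
nc = [3.0, 3.2, 3.1, 2.5, 2.2, 2.4]    # hypothetical mean N/C ratios
normalized = normalize_to_baseline(times, nc, onset=0)
percent_change = [100 * (v - 1) for v in normalized]
print([round(v, 3) for v in normalized])
print([round(p, 1) for p in percent_change])
```

      Because the baseline differs between mutants, this puts responses on a common relative scale. Absolute steady-state levels therefore have to be shown separately, as the manuscript does in its steady-state panels.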

      Using a common y-axis scale for all plots of N/C ratio dynamics is not ideal, as some responses are subtler than others. Additionally, we do not believe that N/C dynamics across different figures need to (or should) be compared to each other. However, within a figure, panels that require comparison are placed in the same row and share the same y-axis scale. We believe that this approach optimizes data visualization and facilitates important visual comparisons.

      Related to the previous point: it is evident from the plots that the N/C ratio is always positive, even in the most deficient of the analyzed mutants. This implies that a relevant fraction of Sfp1 is still nuclear. I thus wonder what the impact of these mutations would be on the actual function of Sfp1. For this reason, I feel that qPCR evaluation of transcripts of Sfp1 target genes is particularly needed. Since lack of Sfp1 is known to yield some of the smallest cells possible, it would also be cool to have an estimate of the size of mutants where Sfp1 is less nuclear. These analyses could confer phenotypical relevance to the data, but would also help in assessing a currently unexplored possibility, that phosphorylation events by PKA influence Sfp1 function besides its localization, i.e.: the still somehow nuclear fraction is not as functional as wt Sfp1 in promoting transcription.

      It is indeed the case that the recorded N/C ratios are larger than 1 in all strains that we have monitored. We have never observed an N/C ratio smaller than 1 using widefield microscopy, for two main reasons. First, out-of-focus light from the cytosol above and below the nucleus is added to the nuclear signal, so the nuclear signal is never zero, even for predominantly cytosolic proteins. Second, both in-focus and out-of-focus vacuoles are devoid of the fluorescent protein fusions that we quantify, which reduces the average brightness of the cytosol. For these reasons, even when a protein is largely cytosolic, the average N/C ratio over a cell population is no lower than around 1.5. Keeping these points in mind, our most delocalized Sfp1 mutants have an N/C ratio of around 1.6-1.7, which is very close to this lower limit. This means that these Sfp1 mutants are largely cytosolic, and the nuclear fraction (if non-zero) is quite small.

      We agree that assessing the phenotypic relevance of Sfp1 mutations is of interest. However, this was impossible with our original strains, as we introduced each Sfp1 mutant as an extra copy in the HO locus while leaving the endogenous Sfp1 locus intact. This was done in order to avoid any phenotypic changes that might result from changes in Sfp1 activity.

      To address the suggestion of the reviewer, we therefore deleted the endogenous Sfp1 copy in strains carrying sfp1^PKA2A, sfp1^PKA2D and sfp1^13A, leaving only the mutated Sfp1 copy at the HO locus. Surprisingly, the growth rate and drug sensitivity (determined by halo assays) of these single-copy mutants did not differ much from those of the mutants carrying the functional Sfp1 copy or from the wild type (Supp. Figs. 4J and 7). This observation aligns with findings for the single-copy sfp1-1 mutant in [Lempiäinen et al. 2009], which corresponds to sfp1^TOR7A in our work. [Lempiäinen et al. 2009] had suggested that Sch9 compensates for the loss of Sfp1 activity via a feedback mechanism, which could explain our results as well. If this is the case, acute depletion of wild-type Sfp1 could unveil transient changes in cell growth before the compensatory effect of Sch9 is established. Unfortunately, we were unable to efficiently degrade wild-type Sfp1 carrying a C-terminal auxin-inducible degron. Instead, we followed the same approach as [Lempiäinen et al. 2009] and deleted SCH9.

      As we describe in the last section of Results, the difference was dramatic for sfp113A mutants, which were extremely slow-growing in the absence of Sch9 (the doubling time was around 4 hours, but it was hard to estimate because we could not grow the cells consistently). Interestingly, SCH9 deletion had a negative impact on sfp1PKA2D but not sfp1PKA2A cells (Supp. Fig. 7). Overall, these results demonstrate that Sch9 can compensate for loss of Sfp1 activity, which makes it challenging to study the impact of Sfp1 mutations on cellular phenotypes.

      To further understand to what extent Sch9 compensates for loss of Sfp1 phosphorylation, we carried out RNA-seq on WT cells and on cells carrying a single copy of sfp113A (with the endogenous SFP1 copy removed). Although sfp113A cells grow as well as WT, RNA-seq picked up several differentially expressed genes related to amino acid biosynthesis. This surprising finding is presented in the last section of Results and in Supplementary Figures 8, 9 and 10. We explore the relevance of these results and their connection with past literature on Sfp1 and Sch9 in the Discussion section.

      I found some typos here and there, and it would greatly help in reporting them if line numbers were included in the manuscript.

      We apologize for the typos. We have tried to eliminate them, and we have also added line numbers to the manuscript.

      Reviewer 2

      There is no biochemical evidence presented that the putative PKA sites (S105 and S136) are genuinely phosphorylated by PKA. The fact that they match the PKA consensus motif, alone, does not guarantee this. In order to claim that they are looking at the effect of PKA by mutagenizing these residues, the authors have to demonstrate the PKA-dependency of S105 and S136 phosphorylation by, for example, mass spec experiments or western blotting with phospho-specific antibodies (Cell Signaling Technology #9624 for example). Also, is the band-shift caused by PKA inhibition (Fig 3C) canceled by the S105A/S136A mutation?

      We took several actions to demonstrate that the putative PKA sites are indeed phosphorylated by PKA. We first tried to detect Sfp1 phosphorylation using the antibody mentioned by the reviewer, but failed, as the sensitivity of this antibody appears to be quite low. Mass spectrometry, on the other hand, did not produce the right fragments to detect the sites of interest. We therefore resorted to an in vitro kinase assay using [γ-32P]ATP together with purified PKA and Sfp1. Unfortunately, bacterial overexpression of MBP-tagged Tpk1, Tpk2 and Tpk3 (the catalytic subunits of PKA) was quite challenging, and we were unable to produce soluble protein. We therefore used commercially available bovine PKA (bPKA, PKA catalytic subunit, Sigma-Aldrich 539576), which shows high homology to the yeast Tpk kinases [Toda et al. 1987]. Moreover, 87% of bPKA substrates have been shown to also be Tpk1 substrates [Ptacek et al. 2005], and bPKA has been used to identify new Tpk substrates in budding yeast [Budovskaya et al. 2005]. As we show in the revised manuscript, bovine PKA does phosphorylate Sfp1. Moreover, phosphorylation is reduced by 50% in the double S105A, S136A mutant (Fig. 1F) and becomes undetectable in the 13A mutant (Supp Fig. 6). Together with the rapid response of Sfp1 localization to acute PKA inhibition, which we had already reported, we believe that these results provide strong evidence that Sfp1 is a direct PKA substrate and that the two phosphosites we identified are functional.

      As the above in vivo experiments do not exclude S105/S136 phosphorylation by other kinases downstream of PKA, the authors need an in vitro PKA kinase assay in order to claim direct phosphorylation. These biochemical experiments are not trivial, but I think they are absolutely necessary for this story.

      One cannot exclude that S105/S136 are also phosphorylated by other kinases of the AGC family (note that [Lempiäinen et al. 2009] has already excluded Sch9). However, as we hope to have shown, PKA does phosphorylate Sfp1. Whether kinases other than PKA and TORC1 also target Sfp1 is a very interesting question that should be addressed in future work.

      The authors only look at the localization of Sfp1. To assess its functionality, and thus its physiological impact, it would be informative to measure the mRNA levels of target ribosomal genes in the various Sfp1 mutants they created.

      As we described in our response to Reviewer 1 above, we did perform RNA-seq on WT and cells carrying a single copy of sfp113A. We observed a notable absence of differentially expressed ribosomal genes and ribosome-related categories in the GO analysis (Supp. Figs. 8, 9 and 10). Together with our observations on SCH9 deletion (Supp. Fig. 7), these results suggest that Sch9 can largely compensate for the loss of Sfp1 activity. On the other hand, the emergence of differentially expressed amino acid biosynthesis genes is a finding that merits further investigation, as it connects with previous observations made with Sch9 deletion mutants and the [ISP+] prion form of Sfp1 (cf. Discussion).

      In the experiments using analog-sensitive PKA (Fig 1D and E, for example), they directly compare wild-type PKA versus analog-sensitive PKA, or with 1-NM-PP1 versus without 1-NM-PP1. This makes interpretation difficult, particularly because 1-NM-PP1 itself has a significant impact even in the wild-type PKA strain. The real question is the difference between wild-type Sfp1 and mutant Sfp1. In the current form, they compare Fig 1D with 1E, but these two do not look like a single, side-by-side experiment. They should compare wild-type Sfp1 and mutant Sfp1 side-by-side.

      Figure 1D shows that 1-NM-PP1 has a transient off-target effect on Sfp1 localization in WT cells, which could also affect Sfp1 mutants. This observation prompted us to use wild-type PKA as a control when testing the effect of 1-NM-PP1 on sfp1PKA2D in cells carrying PKAas (Figure 1E). As Fig. 1E shows, the effect of 1-NM-PP1 on sfp1PKA2D localization in PKAas cells is quite similar to the off-target effect in cells carrying sfp1PKA2D and wild-type PKA. This behavior of sfp1PKA2D is clearly different from the response of wild-type Sfp1 to PKAas inhibition, which results in sustained delocalization. We have made the latter observation repeatedly, both in this study and in our previously published work [Guerra et al. 2021].

      In Figure 3, the argument around the additive effects of PKA and TORC1 is confusing. The authors say they are additive referring to Figure 3E, but say they are not additive referring to Figure 3B. Which is true? In fact, Figure 3B appears to show an additive effect as well.

      We did not use the word "additive" in the text, because we find it difficult to interpret. Instead, we state that PKA and TORC1 appear to control Sfp1 phosphorylation independently of each other. PKA and TORC1 phosphorylation converges to the same response, affecting Sfp1 localization. It appears that loss of either kinase delocalizes Sfp1, while loss of both kinases may only have a small additional effect.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies new types of interactions between Drosophila gustatory receptor neurons (GRNs) and shows that these interactions influence sensory responses and behavior. The authors find that HCN, a hyperpolarization-activated cation channel, suppresses the activity of GRNs in which it is expressed, preventing those GRNs from depleting the sensillum potential, and thereby promoting the activity of neighboring GRNs in the same sensilla. HCN is expressed in sugar GRNs, so HCN dampens the excitation of sugar GRNs and promotes the excitation of bitter GRNs. Impairing HCN expression in sugar GRNs depletes the sensillum potential and decreases bitter responses, especially when flies are fed on a sugar-rich diet, and this leads to decreased bitter aversion in a feeding assay. The authors' conclusions are supported by genetic manipulations, electrophysiological recordings, and behavioral assays.

      Strengths:

      (1) Non-synaptic interactions between neurons that share an extracellular environment (sometimes called "ephaptic" interactions) have not been well-studied, and certainly not in the insect taste system. A major strength of this study is the new insight it provides into how these interactions can impact sensory coding and behavior.

      We appreciate the reviewer’s view that our findings may allow researchers to better understand sensory coding and behavior. However, we respectfully disagree that the SP homeostasis in Drosophila gustation we describe here pertains to ephaptic interaction. Although SP reduction was proposed as the basis of post-ephaptic hyperpolarization in Drosophila olfaction, we find that SP changes are too slow to mediate the fast action of ephaptic inhibition in gustation reported in ref#17. We observed a slow, sweet-dependent SP depletion (Fig. 5B, revised) that takes more than one hour. The real-time change of SP was also slow, even upon contact with 200-mM sucrose; this result was set aside for another manuscript in preparation. Therefore, we believe the main findings in this paper concern the homeostatic preservation of SP for the maintenance of gustatory function, not ephaptic interaction.

      (2) The authors use many different types of genetic manipulations to dissect the role of HCN in GRN function, including mutants, RNAi, overexpression, ectopic expression, and neuronal silencing. Their results convincingly show that HCN impacts the sensillum potential and has both cell-autonomous and nonautonomous effects that go in opposite directions. There are a couple of conflicting or counterintuitive results, but the authors discuss potential explanations.

      (3) Experiments comparing flies raised on different food sources suggest an explanation for why the system may have evolved the way that it did: when flies live in a sugar-rich environment, their bitter sensitivity decreases, and HCN expression in sugar GRNs helps to counteract this decrease.

      Weaknesses/Limitations:

      (1) The genetic manipulations were constitutive (e.g. Ih mutations, RNAi, or misexpression), and depleting Ih from birth could lead to compensatory effects that change the function of the neurons or sensillum. Using tools to temporally control Ih expression could help to confirm the results of this study.

      We attempted to address this point by using the tub-Gal80ts system. The result is now included as Fig. 1-figure supplement 2. At 29°C, a non-permissive temperature for GAL80ts that allows GAL4-dependent expression of Ih-RNAi, we observed that bGRN responses were decreased and sGRN responses were increased compared to the control maintained at 18°C, consistent with the result in Fig. 1C,D. For this experiment, we inserted “To exclude the possibility that Ih is required for normal gustatory development, we temporally controlled Ih RNAi knockdown to occur only in adulthood, which produced similar results (Fig. 1-figure supplement 2).” (~line 113).

      (2) The behavioral experiment shows a striking loss of bitter sensitivity, but it was only conducted for one bitter compound at one concentration. It is not clear how general this effect is. The same is true for some of the bitter GRN electrophysiological experiments that only tested one compound and concentration.

      We conducted additional behavioral experiments with other bitters such as lobeline and theophylline (Fig. 5-figure supplement 1), which showed sensitivity losses in Ih mutants similar to caffeine. For these results, the following is inserted at ~line 274: “These results were recapitulated with other bitters, lobeline and theophylline (Fig. 5-figure supplement 1).”

      We also added single sensillum recording data with the bitters berberine, lobeline, theophylline and umbelliferone, which yielded results similar to those obtained with caffeine (Fig. 1-figure supplement 1). This is described with the sentence at ~line 105 “Other bitter chemical compounds, berberine, lobeline, theophylline, and umbelliferone, also required Ih for normal bGRN responses (Fig. 1-figure supplement 1).”

      (3) Several experiments using the Gal4/UAS system only show the Gal4/+ control and not the UAS/+ control (or occasionally neither control). Since some of the measurements in control flies seem to vary (e.g., spiking rate), it is important to compare the experimental flies to both controls to ensure that any observed effects are in fact due to the transgene expression.

      We thank the reviewers for raising this point. Indeed, there was a small logical flaw with the controls. We have now included all the necessary controls for Fig. 1C-F, Fig. 2I,J, Fig. 4E, and Fig. 5D, as the reviewers suggested. The differences in these experiments remained statistically significant after including the new control groups.

      (4) I was surprised that manipulations of sugar GRNs (e.g. Ih knockdown, Gr64a-f deletion, or Kir silencing) can impact the sensillum potential and bitter GRN responses even in experiments where no sugar was presented.

      We are afraid there is a misunderstanding regarding the early part of the paper. We suspected that the manipulations impacted bGRNs and SP because of the sweetness of the regular cornmeal food, as stated in lines 214-220: “Typically, we performed extracellular recordings on flies 4-5 days after eclosion, during which they were kept in a vial with fresh regular cornmeal food containing ~400 mM D-glucose. The presence of sweetness in the food would impose long-term stimulation of sGRNs, potentially requiring the delimitation of sGRN excitability for the homeostatic maintenance of gustatory functions. To investigate this possibility, we fed WT and Ihf03355 flies overnight with either non-sweet sorbitol alone (200 mM) or a sweet mixture of sorbitol (200 mM) + sucrose (100 mM).”

      I believe the authors are suggesting that the effects of sugar GRN activity (e.g., from consuming sugar in the fly food prior to the experiment) can have long-lasting effects, but it wasn't entirely clear if this is their primary explanation or on what timescale those long-lasting effects would occur. How much / how long of a sugar exposure do the flies need for these effects to be triggered, and how long do those effects last once sugar is removed?

      We attempted to address this point with additional experiments (Fig. 5A,B). The reduction of SP could be observed in WT and HCN-deficient mutants to similar degrees 1 hr after the flies were transferred from nonsweet sorbitol-containing vials to sweet sucrose-containing ones. Moreover, the mutants, but not WT, showed further depression of SP when the sweetness persisted in the media for 4 hrs and overnight. Such exposure to sweetness for longer than 1 hr may simulate feeding on the regular sweet cornmeal food. We also tested the recovery of SP by removing flies from the sweet media after overnight sweet exposure and placing them on sorbitol food. SPs of WT and the mutants recovered to similar levels 1 hr after the animals were separated from sweetness, although the HCN-lacking mutants showed much lower SP right after overnight sweetness exposure. The unimpaired recovery of the mutants suggests that HCN is not required for generating the transepithelial potential itself. Therefore, regardless of HCN, SP changes are slow even in the presence of strong sweetness, and SP is much better guarded in a sweet environment when sGRNs express HCN.

      We inserted the following at ~line 260 to describe the newly added recovery experiment: “Following overnight sweet exposure, SPs of WT and Ihf03355 recovered to similar levels after 1-hr incubation with sorbitol-only food. However, only after 4 hrs on the sorbitol food did the two lines exhibit SP levels similar to those achieved by overnight incubation with sorbitol-only food (Fig. 5B). These results indicate that SP depletion by sweetness is a slow process, and that the dysregulated reduction and recovery of SPs in Ihf03355 manifest only after long-term conditioning with and without sweetness, respectively.”

      (5) The authors mention that HCN may impact the resting potential in addition to changing the excitability of the cell through various mechanisms. It would be informative to record the resting potential and other neuronal properties, but this is very difficult for GRNs, so the current study is not able to determine exactly how HCN affects GRN activity.

      On this point, we can only rely on previous biophysical and electrophysiological characterizations of mammalian HCN channels and on a heterologous expression study that revealed a robust hyperpolarization-activated cation current from Drosophila HCN channels (PMID: 15804582).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors start by showing that an HCN loss-of-function mutation causes a decrease in spiking in bitter GRNs (bGRN) while leaving sweet GRN (sGRN) responses in the same sensillum intact. They show that a perturbation of HCN channels in sweet-sensing neurons causes a similar decrease while increasing the response of sugar neurons. They were also able to rescue the response by exogenous expression. Ectopic expression of HCN in bitter neurons had no effect. Next, they measure the sensillum potential and find that sensillum potential is also affected by HCN channel perturbation. These findings lead them to speculate that HCN in sGRN increases sGRN spiking which in turn affects bGRNs. To test this idea, they carried out multiple perturbations aimed at decreasing sGRN activity. They found that decreasing sGRN activity by either using receptor mutant or by expressing Kir (a K+ channel) in sGRN increased bGRN responses. These responses also increase the sensillum potential. Finally, they show that these changes are behaviorally relevant as conditions that increase sGRN activity decrease avoidance of bitter substances.

      Strengths:

      There is solid evidence that perturbation of sweet GRNs affects bitter GRN in the same sensillum. The measurement of the transepithelial potential and how it changes is also interesting and supports the authors' conclusion.

      Weaknesses:

      The ionic basis of how perturbation in GRN affects the transepithelial potential which in turn affects the second neuron is not clear.

      We speculate that HCN-dependent membrane potential regulation, rather than a change in ionic composition, is responsible for the observed SP preservation, as discussed further in our response in the “Recommendations for the authors” section. The transepithelial potential (TEP) can be dissipated by increased conductance through receptor-linked ion channels following gustatory receptor activation in GRNs. The volume of the sensillum lymph is very small according to electron micrographs of horizontally sliced bristles (PMID: 11456419). Therefore, robust excitation of a gustatory neuron may easily deplete the extracellular potential, which is built as polarized ion concentrations across the tight junction. When this depletion is too strong and prolonged, the neighboring neuron, which shares the TEP with the activated GRN, can be negatively affected. We propose that HCN suppresses overexcitation of sGRNs by stabilizing the membrane potential. This stabilization prevents sGRNs from excessively reducing the TEP, thereby protecting the activity of neighboring bGRNs.

      Reviewer #3 (Public Review):

      Ephaptic inhibition between neurons housed in the same sensilla has been long discovered in flies, but the molecular basis underlying this inhibition is underexplored. Specifically, it remains poorly understood which receptors or channels are important for maintaining the transepithelial potential between the sensillum lymph and the hemolymph (known as the sensillum potential), and how this affects the excitability of neurons housed in the same sensilla.

      Although a reduction of sensillum potential was proposed to underlie membrane hyperpolarization of post-ephaptic olfactory neurons in Drosophila, our preliminary data (not shown, as they are part of a manuscript in preparation) and the results included in the paper (Fig. 5B) strongly suggest that SP reduction is not a requisite for ephaptic inhibition, at least in GRNs. Ephaptic inhibition is expected to be instantaneous, whereas we find that SP reduction in gustation is very slow. Therefore, we would like to point out that the findings we report in this manuscript are not directly related to ephaptic inhibition.

      Lee et al. used single-sensillum recordings (SSR) of the labellar taste sensilla to demonstrate that the HCN channel, Ih, is critical for maintaining sensillum potential in flies. Ih is expressed in sugar-sensing GRNs (sGRNs) but affects the excitability of both the sGRNs and the bitter-sensing GRNs (bGRNs) in the same sensilla. Ih mutant flies have decreased sensillum potential, and bGRNs of Ih mutant flies have a decreased response to the bitter compound caffeine. Interestingly, ectopic expression of Ih in bGRNs also increases sGRN response to sucrose, suggesting that Ih-dependent increase in sensillum potential is not specific to Ih expressed in sGRNs. The authors further demonstrated, using both SSR and behavior assays, that exposure to sugars in the food substrate is important for the Ih-dependent sensitization of bGRNs. The experiments conducted in this paper are of interest to the chemosensory field. The observation that Ih is important for the activity in bGRNs albeit expressed in sGRNs is especially fascinating and highlights the importance of non-synaptic interactions in the taste system.

      Despite the interesting results, this paper is not written in a clear and easily understandable manner. It uses poorly defined terms without much elaboration, contains sentences that are borderline unreadable even for those in the narrower chemosensory field, and many figures can clearly benefit from more labeling and explanation. It certainly needs a bit of work.

      We would like to revise the language aspect of the manuscript after finalizing the scientific revision.

      Below are the major points:

      (1) Throughout the paper, it is assumed that Ih channels are expressed in sugar-sensing GRNs but not bitter-sensing GRNs. However, both this paper and citation #17, another paper from the same lab, contain only circumstantial evidence for the expression of Ih channels in sGRNs. A simple co-expression analysis, using the Ih-T2A-GAL4 line and Gr5a-LexA/Gr66a-LexA line, all of which are available, could easily demonstrate the co-expression. Including such a figure would significantly strengthen the conclusion of this paper.

      We did conduct confocal imaging with Ih-T2A-Gal4 in combination with GRN Gal4s (ref#17 version2). The expression is very broad, including both neurons and non-neuronal cells. We observed much stronger sGRN expression than bGRN expression, but the promiscuous expression of the reporter in many cells prevented us from clearly demonstrating the absence of the reporter in bGRNs. However, the functional and physiological examination of Ih-T2A-Gal4 with neuronal modifiers such as TRPA1 and Kir2.1 in ref#17 indicates strong expression of Ih in sGRNs and little expression in bGRNs. Furthermore, the RNAi knockdown results provide another line of evidence that HCN expressed in sGRNs regulates SP and bGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2). Ih-RNAi expression in bGRNs did not result in any statistically significant changes in the activities of sGRNs and bGRNs compared to controls (Fig. 1C,D, revised), supporting our claim that Ih acts in sGRNs for the functional homeostasis of SP and GRNs.

      (2) Throughout this paper, it is often unclear which class of labellar taste sensilla is being recorded. S-a, S-b, I-a, and I-b sensilla all have different sensitivities to bitters and sugars. Each figure should clearly indicate which sensilla is being recorded. Justification should be provided if recordings from different classes of sensilla are being pooled together for statistics.

      We mainly performed SSR (single sensillum recording) on i-type bristles, as they have the simplest GRN composition compared to s- and L-type bristles. As s-type bristles also contain an sGRN and a bGRN, we also measured SP in s-type bristles (Figs. 2, 3F and 4D). In Fig. 3-figure supplement 1, L-type bristles were tested for the relationship between water cell activity and SP. All panels are now labelled with the tested bristle types.

      (3) In many figures, there is a lack of critical control experiments. Examples include Figures 1C-F (lacking UAS control), Figure 2I-J (lacking UAS control), Figure 4E (lacking the UAS and GAL4 control, and it is also strange to compare Gr64f > RNAi with Gr66a > RNAi, instead of with parental GAL4 and UAS controls.), and Figure 5D (lacking UAS control). Without these critical control experiments, it is difficult to evaluate the quality of the work.

      Thank you for pointing this out. We appreciate the feedback and have addressed these concerns by including all the requested controls in the figures. Specifically, we have added the UAS controls for Figs 1C-F and 2I-J, as well as the UAS and GAL4 controls for Fig. 4E. We have also included the UAS control for Fig. 5D.

      (4) Figure 2A could benefit from more clarification about what exactly is being recorded here. The text is confusing: a considerable amount of text is spent on explaining the technical details of how SP is recorded, but very little text about what SP represents, which is critical for the readers. The authors should clarify in the text that SP is measuring the potential between the sensillar lymph, where the dendrites of GRNs are immersed, and the hemolymph. Adding a schematic figure to show that SP represents the potential between the sensillar lymph and hemolymph would be beneficial.

      SP was defined at lines 55-56 in the first paragraph of the introduction, which also contains the background information on SP as a transepithelial potential. As the reviewer suggested, we now also include a sentence describing SP (“SP is known as a transepithelial potential between the sensillum lymph and the hemolymph, generated by active ion transport through support cells”, line 126) and a drawing to illustrate the concept of SP (Fig. 2A), and we revised the legend.

      (5) The sGRN spiking rate in Figure 4B deviates significantly from previous literature (Wang, Carlson, eLife 2022; Jiao, Montell PNAS 2007, as examples), and the response to sucrose in the control flies is not dosage-dependent, which raises questions about the quality of the data. Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      Our spiking frequencies differ from those reported in other work because they are averaged over 5-sec bins rather than only the first 0.5 sec. This lowers the frequencies, as spikes are relatively more frequent at the beginning of the recording (Fig. 4-figure supplement 1).
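      The effect of the averaging window can be illustrated with a hypothetical adapting response whose rate decays exponentially from an initial rate r0 to a steady-state rate r_ss with time constant tau. This is a minimal sketch: the exponential form and all parameter values are assumptions chosen for illustration, not fits to our recordings. The mean of such a rate over a window of length T has the closed form r_ss + (r0 - r_ss)(tau/T)(1 - exp(-T/tau)).

```python
import math


def mean_rate(r0, r_ss, tau, window):
    """Mean rate over [0, window] for r(t) = r_ss + (r0 - r_ss) * exp(-t / tau).

    r0: initial rate (spikes/s), r_ss: steady-state rate (spikes/s),
    tau: adaptation time constant (s), window: averaging window (s).
    """
    return r_ss + (r0 - r_ss) * (tau / window) * (1.0 - math.exp(-window / tau))


# Hypothetical adapting response: 60 spikes/s at contact, 10 spikes/s at steady state.
early = mean_rate(60.0, 10.0, 1.0, 0.5)  # first 0.5 s only
full = mean_rate(60.0, 10.0, 1.0, 5.0)   # whole 5-s bin
print(f"{early:.1f} {full:.1f}")  # 49.3 19.9
```

      With these invented parameters, the same response reads as roughly 49 spikes/s over the first 0.5 s but only about 20 spikes/s over 5 s, so the choice of window alone produces a roughly 2.5-fold difference in the reported frequency.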

      Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      We were also puzzled by the flat dose dependence for sucrose. This result may suggest the existence of another mechanism moderating the sucrose responses of sGRNs. This flat curve reappeared with other genotypes over the same concentration range (5-50 mM) in Fig. 4E. However, 1-mM sucrose produced much lower spiking frequencies (Fig. 4E), suggesting that sGRN responses are saturated at 5 mM sucrose under our recording and analysis conditions.

      (6) In Figure 4C, instead of showing the average spike rate of the first five seconds and the next 5 seconds, why not show a peristimulus time histogram? It would help the readers tremendously, and it would also show how quickly the spike rate adapts to overexpression and control flies. Also, since taste responses adapt rather quickly, a 500 ms or 1 s bin would be more appropriate than a 5-second bin.

      Taste single sensillum recording begins at the moment the stimulant contacts the sensillum, which prevents us from recording pre-stimulus responses of GRNs. Therefore, as the reviewer suggested, we show post-stimulus graphs with 1-sec bins (Fig. 4-figure supplement 1).
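      A post-stimulus histogram of this kind amounts to counting spikes in consecutive fixed-width bins starting from stimulus contact (t = 0) and dividing each count by the bin width. The sketch below is illustrative only: it assumes spike times in seconds relative to contact, and the function name and example spike train are hypothetical.

```python
def post_stimulus_histogram(spike_times, bin_width=1.0, duration=5.0):
    """Firing rate (spikes/s) in consecutive bins starting at stimulus contact (t = 0)."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        # Spikes outside the analysis window are ignored.
        if 0.0 <= t < duration:
            # min() guards against floating-point edge cases at the last bin boundary.
            counts[min(int(t // bin_width), n_bins - 1)] += 1
    return [count / bin_width for count in counts]


# Hypothetical spike train (s after contact) with a strong initial burst.
spikes = [0.05, 0.1, 0.2, 0.3, 0.6, 0.9, 1.4, 2.2, 3.5, 4.8]
print(post_stimulus_histogram(spikes))  # [6.0, 1.0, 1.0, 1.0, 1.0]
```

      Because there is no pre-stimulus period in tip recordings, the first bin starts at contact; shrinking bin_width resolves the early burst more finely at the cost of noisier per-bin estimates.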

      (7) Lines 215 - 220. The authors state that the presence of sugars in the culture media would expose the GRNs to sugar constantly, without providing much evidence. What is the evidence that the GRNs are being activated constantly in flies raised with culture media containing sugars? The sensilla are not always in contact with the food.

      We agree with the reviewer. We replaced “long-term stimulation of sGRNs” with “strong and frequent stimulation of sGRNs for extended period”, because “long-term” may be read as implying constant stimulation.

      (8) Line 223. To show that bGRN spike rates in Ih mutant flies "decreased even more than WT", you need to compare the difference in spike rates between the sorbitol group and the sorbitol + sucrose group, which is not what is currently shown.

      The data were examined by ANOVA and a multiple comparison test (Dunn’s) between all the groups in the panel (all groups sharing the y axis), regardless of genotype and condition. Therefore, the differences were statistically examined. However, the expression the reviewer cited read as if it referred to the slope or extent of the decrease. We intended to indicate the difference between genotypes in the absolute spiking frequencies after overnight sweet exposure, whereas bGRN activities were not statistically different between WT and Ih mutants kept only on sorbitol food. We revised it to “decreased to the level significantly lower than WT”. We also changed the graph style to present more effectively the trend of changes in bGRN sensitivity, with comparison between genotypes. Again, the groups were statistically examined together regardless of genotype and condition.

      (9) To help readers better understand the proposed mechanisms here, including a schematic figure would be helpful. This should show where Ih is expressed, how Ih in sGRNs impacts the sensillum potential, how elevated sensillum potential increases the electrical driving force for the receptor current, and affects the excitability of the bGRNs in the same sensilla, and how exposure to sugar is proposed to affect ion homeostasis in the sensillum lymph.

      As the reviewer suggested, we included two panels showing a working model for gustatory homeostasis via SP maintenance by HCN (Fig. 5E,F).

      Reviewer #1 (Recommendations For The Authors):

      (1) The relationship between this paper and the authors' bioRxiv preprint posted last year is not clear. In the introduction they made it seem like this paper is a follow-up that builds on the preprint, but most or all of the experiments in this paper were already performed in the preprint. I guess the authors are planning to divide the original paper into two papers. I would suggest updating the preprint to avoid confusion.

      Thank you for the comment. We updated the preprint by removing part of Fig. 6 and all of Fig. 7, along with the associated text. As the reviewer pointed out, our eLife paper was spun off from part of the preprint, because we felt that the two stories could confuse readers when presented together.

      (2) Have the authors considered testing responses of water GRNs? They reside in the same sensilla as sugar neurons, so are they also affected by Ih mutation or RNAi in sugar neurons? This would strengthen the evidence that the indirect (non-cell autonomous) effects of Ih are due to the sensillum potential and not some specific interaction between sweet and bitter cells.

      As the reviewer proposed, we assessed water GRN activity in the L-type bristles of WT, Ihf03355 and a genomic rescue line for Ihf03355. Spiking responses in water GRNs were evoked by a hypo-osmotic electrolyte solution (0.1 mM tricholine citrate, TCC). Interestingly, the Ih mutant showed lower spiking frequencies than WT in response to 0.1 mM TCC. This impairment was rescued by the genomic fragment containing an intact Ih locus (Figure 3-figure supplement 1A).

      Additionally, SPs in L-type bristles were reduced by Ih deficiency but increased in Gr64af mutants, suggesting that HCN regulates sGRNs in L-type bristles as well (Figure 3-figure supplement 1B). Again, bristles of animals carrying both mutations exhibited SPs similar to those of WT.

      Furthermore, in cDNA rescue experiments in L-type bristles, introducing Ih-RF cDNA into sGRNs restored SPs, whereas expressing it in bGRNs did not, unlike the results from i- and s-bristles (Fig. 2K,L); this is likely because L-type bristles lack bGRNs. These cDNA rescue and genetic interaction experiments were conducted using flies fed on fresh cornmeal food with strong sweetness, suggesting that the sweetness of the medium is the likely key factor producing the genetic interaction and necessitating HCN, consistent with other results in the manuscript. Therefore, SP regulation by HCN is also observed in L-type bristles.

      Minor comments:

      Line 52: typo, "Many of"

      Thank you. Corrected

      Line 95: typo, "sensilla do an sGRN"

      Corrected

      Line 98: typo, "we observed reduced the spiking responses"

      Corrected

      Line 206: typo, "a relatively low sucrose concentrations"

      Corrected

      Line 260: "inverse relationship between the two GRNs in excitability" - I am not exactly sure what data you are referring to.

      Although the Ih alleles did not show increased sGRN activity, Ih knockdown decreased bGRN activity but increased sGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2B), whereas suppression of sGRNs increased bGRN activity (Fig. 3). To clarify this point, we revised the phrase to “the inverse relationship between the two GRNs in excitability observed in Fig. 1C,D, Fig. 1-figure supplement 2B, and Fig. 3”.

      Methods: typo, "twenty of 3-5 days with 10 males and 10 females"

      Corrected to “Twenty flies, aged 3-5 days and consisting of 10 males and 10 females,”

      Methods: typo, "Kim's wipes" should be "Kimwipes"

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) More clarification is necessary on Transepithelial potential (TEP). TEP is typically created by having pumps and tight junctions between the sensillar lymph and the hemolymph.

      The introduction discusses TEP, or SP, in the context of sensory function (lines 40-57), with relevant references. The involvement of pumps and tight junctions is mentioned in the same paragraph: “Glia-like support cells exhibit close physical association with sensory receptor neurons, and conduct active transcellular ion transport, which is important for the operation of sensory systems” (line 40) and “Tight junctions between support cells separate the externally facing sensillar lymph from the internal body fluid known as hemolymph” (line 53).

      It is not clear how HCN channels in one of the neurons might change the composition of the sensillum lymph. An explanation of their model of how TEP depends on HCN is necessary.

      Although the ionic composition of the sensillum lymph contributes to the sensillum potential, it is conceptually more relevant to describe our findings from the perspective of membrane potential regulation, given the role of HCN in stabilizing the membrane potential, as discussed in our manuscript.

      We speculate that HCN controls the membrane potential at rest and/or during activity, modulating sGRN activity so as to conserve SP despite the sweetness of the feeding niche. We positioned our results in relation to SP in the Discussion: “Our results provide multiple lines of evidence that HCN suppresses HCN-expressing GRNs, thereby sustaining the activity of neighboring GRNs within the same sensilla. We propose that this modulation occurs by restricting SP consumption through HCN-dependent neuronal suppression rather than via chemical and electrical synaptic transmission.” (lines 252-255). Moreover, it is unclear whether HCN is localized to the dendrite bathed in the sensillum lymph, where it could influence the ionic composition of the lymph. It would be very interesting to study in future whether ionic flow through HCN channels itself is critical for the function of HCN in this context, and whether HCN is present exclusively in the dendrite, as this hypothesis would require. However, we note that Kir2.1 and HCN channels expressed in sGRNs had similar effects on SP and bGRNs, even though they differ in Na+ conductance.

      In the initially submitted manuscript (lines 325-343), we discussed a potential mechanism by which Kir2.1 and HCN channels both increase SP, namely how membrane potential regulation in the soma can control SP consumption in the sGRN dendrite.

      Another point about the TEP that needs some explanation is that these sensilla are open to the environment as tastants must flow in and are different from mechanical sensilla in that sense.

      This is a very important question regarding the general physiology of the taste sensilla, as the sensillum lymph is in contact with the external environment through the pore of the sensillum. It is indeed interesting to consider how the composition and potential of the lymph are maintained despite the relatively vast volume of food the sensilla encounter during gustation and the continuous evaporation to air between episodes of gustation. However, we believe that this question, while important, is distinct from the primary focus of our manuscript.

      Are the TEP measurements in Figure 2 under control conditions where there are no tastants?

      The SP-measuring glass electrode contains no tastant, only the electrolyte. We apologize for not specifying the recording electrode conditions, and have inserted the following clause in the Methods: “For SP recordings, the recording electrode contained 2 mM TCC as the electrolyte, and…”

      Does the TEP change dynamically as sGRN is activated?

      SP does shift in response to sweet compounds (Fig. 5B). We also showed that SP changes in response to mechanical stimuli, which depended on the mechanoreceptor NompC (Fig. 2D-F). Mechanoreceptor neurons share the sensillum lymph with GRNs.

      (2) More clarification on the potential transduction mechanism and how TEP affects one neuron differentially. Essentially, sGRN perturbation affects sGRN activity and it affects the TEP. More explanation is needed for the potential ionic mechanism of each.

      Our results strongly suggest that HCN lowers the activity of HCN-expressing GRNs, mitigating SP consumption. This modulation is crucial because the SP serves as a driving force for neuronal activation within the sensillum. HCN is particularly necessary in sGRNs because of the flies’ sweet feeding niche, which is expected to result in frequent and strong activation of sGRNs. The SP saved by HCN-dependent limitation of sGRN activity can then raise the responsiveness of bGRNs.

      (3) The authors refer to their own unreviewed paper (Reference 17). This paper is on a similar topic and there seems to be some overlap. Clarification on this point would be important.

      We revised the bioRxiv preprint so that version 2 no longer contains the parts overlapping with this eLife paper. This eLife paper was originally part of the preprint, but it was separated to clarify the messages of the two stories. As we explained in the Discussion (lines 276-297), HCN provides resistance to both hyperpolarization and depolarization of the membrane potential. Simply put, one paper focuses on the role of HCN in resisting hyperpolarization, while the other (this eLife paper) focuses on resisting depolarization.

      (4) Methods are sparse. Many details on the method are necessary. For example, Sensilla recordings are being done by the tip-dip method (I assume). What does "number of experiments" mean in Figure 1? Is it the number of animals or the number of sensilla? How many trials/sensilla?

      We now state that the extracellular recordings were performed by the tip-dip method: “In vivo extracellular recordings were performed by the tip-dip method as detailed previously”. We also added a statement on the number of experiments: “The number of experiments indicated in figures are the number of naïve bristles tested. The naïve bristles were from at least three different animals.”

      (5) Figure 1: I understand the author's interpretation. But if one compares WT in Figure 1A to Gr64a-IhRNAi in 1C, we can come to the conclusion that there is no change. In other words, the control in Figure 1C (grey) has a much higher response than WT. Similar conclusions can be made for other experiments. Is the WT response stable enough to make the conclusions made here?

      The genetic background of each genotype may influence GRN activity to some extent. RNAi knockdown is well known to be hypomorphic, and its effects should be evaluated by comparison with the parental controls, such as the Gal4 and UAS lines. As all reviewers pointed out, we added results from the UAS control. This confirms that bGRN spiking evoked by 2 mM caffeine in Gr89a>Ih RNAi flies is not statistically different from that in the UAS control or the Gr64f-Gal4 control, whereas Gr64f>Ih RNAi flies showed reduced bGRN responses to 2 mM caffeine compared with all controls.

      (6) Figure 3: Why is bGRN spiking not plotted against sensillum potential to observe the dependence more directly?

      This is a very interesting suggestion. We are not, however, equipped to measure spiking and sensillum potential simultaneously. Therefore, they are independent experiments, and we treated them accordingly.

      (7) Figure 4: Why bGRN response is only affected at high caffeine concentrations is not clear.

      We were also surprised by the differences in the dose dependence results of b- and sGRNs, genetically manipulated to mis-express and over-express HCN in Fig. 4A and 4E, respectively. Each gustatory neuron likely has distinct sets of players and parameters that set its own membrane potential and excitability.

      One possibility is that there is a range of membrane potentials within which HCN is not engaged. In bGRNs, the resting membrane potential may lie low within this range, so that the modest depolarization caused by low caffeine concentrations does not close HCN channels appreciably and therefore does not produce their hyperpolarizing effect. In sGRNs, by contrast, the membrane potential may lie high within this range, so that HCN is suppressive at all tested sucrose concentrations. However, we find this explanation too speculative to include in the main text; instead, we stated in the original manuscript, “implying a complex cell-specific regulation of GRN excitability.” (line 210).

      (8) Minor:

      L98 - there is a small typo

      Corrected

      L274: "funny" !?

      “Funny” currents, denoted If, were initially observed by electrophysiologists and later attributed to HCN channels, now indicated by Ih (thus the gene name Ih in Drosophila). These currents were termed "funny" due to their unusual properties compared to other currents. For more detailed information, please refer to the cited references.

      L257: Neuropeptide seemed to be abrupt

      We attempted to discuss possible mechanisms that mediate excitability changes across GRNs beyond the mechanism involving SP shifts. Neuropeptides, which act as chemical transmitters alongside small-molecule neurotransmitters, were mentioned after the discussion of synaptic transmission to suggest alternative pathways for excitability regulation. This inclusion is meant to provide a comprehensive overview of potential mechanisms influencing GRN activity.

      Reviewer #3 (Recommendations For The Authors):

      Congratulations on your fascinating research! The results are certainly of interest to the chemosensory field. However, I suggest using academic editing services to enhance the clarity of your text and ensure that the terminology and jargon align with standard usage in the field. The current choice of words may not be consistent with commonly used terms. As it is now, the writing might not fully showcase the compelling story and the effort behind your study, and is underselling your interesting results. Proper refinement could make sure your valuable findings are appropriately recognized.

      We appreciate your comments and apologize for any difficulties the reviewers faced during the review process. We are currently prioritizing the scientific content and plan to address language issues in a subsequent revision. It would be very helpful for future revisions if the problematic sentences or expressions could be indicated in detail after this revision. This will allow us to ensure that our terminology and expressions align with standard usage in the field, and that our findings are communicated clearly and effectively.

      Minor points:

      (1) Line 110: what is Ih-RF?

      We apologize for relying solely on a reference to describe the cDNA. The following clause, with an additional reference and the FlyBase ID, was inserted: “(Flybase id: FBtr0290109), which previously rescued Ih deficiency in other contexts17,26,”

      (2) Line 158: Gr64af mutant flies still have Gr5a and a residual response to fructose and sucrose (Slone, Amrein 2007).

      We revised the line to “is severely impaired in sucrose and glucose sensing”, since there is a substantial loss of sucrose and glucose sensing in both Gr64af from Kim et al 2018 and DGr64 from Slone et al 2007, when they were examined by the proboscis extension reflex assay. This was also confirmed in the study by Jiao et al 2009. We also deleted “sugar-ageusic” and instead describe the mutant “impaired in sucrose and glucose sensing” in Fig. 3 legend.

      (3) Lines 264-273 seem unnecessary. This paper is not about the function of HCN in mammals, and these discussions seem largely irrelevant.

      We feel that it is important to position our results within a broader context by discussing the potential implications of our findings for sensory systems of other animals. As we stated, HCN channels have been localized in mammalian sensory systems, but their roles are often not well understood. By including this discussion, we aim to highlight the relevance of our findings beyond the model organism used in our study and suggest possible areas for future research in mammalian systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Overall, the manuscript is very well written, the approaches used are clever, and the data were thoroughly analyzed. The study conveyed important information for understanding the circuit mechanism that shapes grid cell activity. It is important not only for the field of MEC and grid cells, but also for broader fields of continuous attractor networks and neural circuits.

      We appreciate the positive comments.

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. However, it is unclear what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. More detailed information/statistics about the asynchronization of SC activity is necessary for interpreting the results.

      The short answer here is that spiking responses from the pairs of SCs that we sampled appear asynchronous. We now show this in the form of cross-correlograms for all recorded pairs of SCs (Figure 2, Figure Supplement 1). The correlograms lack peaks that would indicate synchronous activation. Thus, while our dataset is not large enough to rule out occasional direct synchronisation of SCs, this appears unlikely to account for synchronised input to PV+INs.
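      The following Python sketch illustrates the kind of spike-train cross-correlogram described above. It is not the analysis code used for the figures: the function name, bin width and lag window are hypothetical choices, and spike times are assumed to be in milliseconds. A synchronous pair would produce a peak near zero lag, whereas independently firing cells give a roughly flat histogram.

```python
import numpy as np


def cross_correlogram(spikes_a, spikes_b, bin_ms=1.0, window_ms=50.0):
    """Histogram of spike-time lags (t_b - t_a) within +/- window_ms.

    spikes_a, spikes_b: 1-D sequences of spike times in ms (hypothetical inputs).
    Returns (bin_centres, counts). A peak near zero lag indicates synchronous
    firing; a flat histogram is expected for independent firing.
    """
    a = np.asarray(spikes_a, dtype=float)
    b = np.asarray(spikes_b, dtype=float)
    # All pairwise lags between every spike in b and every spike in a
    lags = (b[None, :] - a[:, None]).ravel()
    lags = lags[np.abs(lags) <= window_ms]
    # Evenly spaced bin edges spanning [-window_ms, +window_ms]
    n_bins = int(round(2 * window_ms / bin_ms))
    edges = np.linspace(-window_ms, window_ms, n_bins + 1)
    counts, _ = np.histogram(lags, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, counts


# Example: two hypothetical independent trains of 100 spikes in 10 s,
# with spike times scattered uniformly (no common drive)
rng = np.random.default_rng(1)
train_a = np.sort(rng.uniform(0.0, 10_000.0, size=100))
train_b = np.sort(rng.uniform(0.0, 10_000.0, size=100))
centres, counts = cross_correlogram(train_a, train_b)
```

      Comparing the counts near zero lag with correlograms computed from trial-shuffled or jittered spike trains is one common way to judge whether a peak exceeds what chance coincidences would produce.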

      This conclusion is consistent with consideration of mechanisms that could in principle synchronise SCs:

      First, if responses to ramping light inputs were fully deterministic, then this could lead to fixed relative timing of spikes fired by different SCs. This is unlikely given the influence of stochastic channel gating on SC spiking (Dudman and Nolan 2009) and is inconsistent with trial-to-trial variability in spike timing (Figure 2, Figure Supplement 2).

      Second, as SCs are glutamatergic they could excite one another. However, excitatory connections between stellate cells are rare (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016) and when detected they have low amplitude (mean < 0.25 mV; (Winterer et al. 2017)). Our finding that spiking by pairs of SCs is not correlated is consistent with this.

      Third, strong interactions between stellate cells mediated by local inhibitory pathways (Pastoll et al. 2013; Couey et al. 2013) could coordinate their activity. The lack of correlation between spiking of pairs of SCs suggests that such coordination is rarely recruited by our ramping protocols. Nevertheless, inhibition may be recruited to some extent, as experiments in Figure 4 show that correlated input from SCs to more distant, but not nearby, PV+INs is reduced by blocking inhibitory synapses. Given that we find no evidence for synchronised spiking of SCs, this additional common input to widely separated PV+INs is instead best explained by recruitment of interneurons that act directly on the target SCs. We have modified Figure 8 to make this clear.

      Thus, for experiments with ramping light stimuli, synchronous activation of SCs is unlikely to explain common input to PV+INs. Input from the same SC best explains correlated responses of nearby PV+IN inhibitory populations, while recruitment of an additional inhibitory pathway may contribute to correlated responses of more distant PV+INs.

      For experiments using focal stimulation, substantial trial-to-trial variation in SC spike timing argues strongly against deterministic coordination. Indirect coordination of presynaptic neurons is also extremely unlikely, given that focal activation is sparse and brief, whereas inputs from many presynaptic SCs are required to drive a postsynaptic interneuron to spike (e.g. Pastoll et al. 2013; Couey et al. 2013). Results from these experiments thus corroborate results from experiments using ramping light stimulation.

      In revising the manuscript we have tried to ensure these arguments are clear (e.g. p 5, para 3; p 6, para 2; p 10, para 1).

      (2) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. However, the evidence supporting this "direct interaction" between these two cell types is missing. Is it possible that pyramidal cells are also involved in this interaction? Some evidence or discussion is necessary to further support the "direct interaction".

      Indirect connections between stellate cells mediated via fast-spiking inhibitory interneurons are well established by previous studies (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), and so were not addressed here. Previous work also establishes that connections from stellate cells to pyramidal cells are extremely rare (Winterer et al. 2017). Because the Sim1:Cre mouse line is specific to stellate cells and does not drive transgene expression in pyramidal cells (Sürmeli et al. 2015), it is therefore unlikely that pyramidal cells play a role.

      To make these points clearer we have modified the text in the discussion (p 5, para 3; p 10, paras 1 & 2). We have also modified Figure 8 to highlight that the indirect interaction may be best accounted for by inhibitory pathways onto PV+INs rather than via SCs (which our new cross-correlation analyses indicate is unlikely).

      Reviewer #2 (Public Review):

      In this study, Huang et al. employed optogenetic stimulation alongside paired whole-cell recordings in genetically defined neuron populations of the medial entorhinal cortex to examine the spatial distribution of synaptic inputs and the functional-anatomical structure of the MEC. They specifically studied the spatial distribution of synaptic inputs from parvalbumin-expressing interneurons to pairs of excitatory stellate cells. Additionally, they explored the spatial distribution of synaptic inputs to pairs of PV INs. Their results indicate that both pairs of SCs and PV INs generally receive common input when their relative somata are within 200-300 ums of each other. The research is intriguing, with controlled and systematic methodologies. There are interesting takeaways based on the implications of this work to grid cell network organization in MEC.

      We appreciate the positive comments.

      (1) Results indicate that in brain slices, nearby cells typically share a higher degree of common input. However, some proximate cells lack this shared input. The authors interpret these findings as: "Many cells in close proximity don't seem to share common input, as illustrated in Figures 3, 5, and 7. This implies that these cells might belong to separate networks or exist in distinct regions of the connectivity space within the same network.". Every slice orientation could have potentially shared inputs from an orthogonal direction that are unavoidably eliminated. For instance, in a horizontal section, shared inputs to two SCs might be situated either dorsally or ventrally from the horizontal cut, and thus removed during slicing. Given the synaptic connection distributions observed within each intact orientation, and considering these distributions appear symmetrically in both horizontal and sagittal sections, the authors should be equipped to estimate the potential number of inputs absent due to sectioning in the orthogonal direction. How might this estimate influence the findings, especially those indicating that many close neurons don't have shared inputs?

      Given we find high probabilities of correlated inputs to nearby cells in both planes, our conclusion that nearby cells are likely to receive common inputs appears to be independent of the slice plane. For cells further apart, where the degree of correlated input becomes more variable, it is possible that cell pairs that have low input correlations measured in one slice plane would have high input correlations if measured in a different plane. An argument against this is that as the cell pairs are further apart, it is less likely that an orthogonal axon would intersect dendritic trees of both cells. Nevertheless, we can't rule this out given the data here. We have amended the discussion to highlight this possibility (p 10, para 1). We agree it would be interesting to address this point further with quantitative analyses but this will be difficult without detailed reconstructions of the circuit.

      (2) The study examines correlations during various light-intensity phases of the ramp stimuli. One wonders if the spatial distribution of shared (or correlated) versus independent inputs differs when juxtaposing the initial light stimulation phase, which begins to trigger spiking, against subsequent phases. This differentiation might be particularly pertinent to the PV to SC measurements. Here, the initial phase of stimulation, as depicted in Figure 7, reveals a relatively sparse temporal frequency of IPSCs. This might not represent the physiological conditions under which high-firing INs function. While the authors seem to have addressed parts of this concern in their focal stim experiments by examining correlations during both high and low light intensities, they could potentially extract this metric from data acquired in their ramp conditions. This would be especially valuable for PV to SC measurements, given the absence of corresponding focal stimulation experiments.

      We understand the question to be whether differences in correlation scores between the initial and later phases of responses to ramping light inputs can be used to infer spatial organisation. These differences are likely to reflect heterogeneity in the spiking of the input neurons, for example through differences in spike threshold, spike frequency adaptation and saturation of spiking (e.g. Figure 2, Figure Supplement 1A; see also Pastoll et al. 2020). We don't expect these differences to have any spatial organisation along the mediolateral axis, and while spike threshold follows a dorsoventral organisation, there is nevertheless substantial local variation between neurons (Pastoll et al. 2020). It is therefore unlikely that we can use differences between early and late correlations to make the inferences proposed by the reviewer.

      With respect to PV to SC measurements, similar heterogeneity is likely. We note that we were unable to carry out focal stimulation experiments for PV to SC connections as PV neurons did not spike in response to focal optogenetic stimulation.

      With respect to physiological conditions, our aim here is simply to assess connectivity under well-controlled conditions (e.g. voltage clamp, minimal spontaneous activity, known neuronal locations). It is not clear that physiological activation patterns would improve on these tests, and the data would quite likely be noisier and harder to interpret.

      (3) Re results from Figure 2: Please fully describe the model in the methods section. Generally, I like using a modeling approach to explore the impact of convergent synaptic input to PVs from SCs that could effectively validate the experimental approach and enhance the interpretability of the experimental stim/recording outcomes. However, as currently detailed in the manuscript, the model description is inadequate for assessing the robustness of the simulation outcomes. If the IN model is simply integrate-and-fire with minimal biophysical attributes, then the results shown in Fig 2F might be trivial. Conversely, if the model offers a more biophysically accurate representation (e.g., with conductance-based synaptic inputs, synapses appropriately dispersed across the model IN dendritic tree, and standard PV IN voltage-gated membrane conductances), then the model's results could serve as a meaningful method to both validate and interpret the experiments.

      We appreciate that the simulation descriptions were insufficient and have modified the manuscript to include additional details and clarification (p 14, paras 1-3).

      We're not sure we follow the logic here with respect to model types. The experiments were carried out in the voltage-clamp recording configuration with the goal of identifying correlated inputs independently of how they are integrated by the postsynaptic neuron. Because the membrane potential does not change (so the C·dVm/dt term of the membrane equation is zero), integrate-and-fire and point conductance-based models both reduce to a sum of input currents. We implement this by convolving spike times with experimentally measured synaptic current waveforms. An assumption of our approach is that we achieve reasonable space clamp. We believe this is justified given that stellate cells and PV interneurons are reasonably electrotonically compact, and that our analysis relies on consistent correlations rather than on absolute amplitudes or time constants of the postsynaptic response, and so should tolerate moderate space-clamp errors.
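      As an illustration of this reasoning, the Python sketch below sums convolved spike trains for two voltage-clamped cells that share a proportion of their presynaptic neurons. It is not the simulation code described in the Methods: presynaptic spiking is assumed to be Poisson, and a dual-exponential waveform with hypothetical time constants and amplitude stands in for the experimentally measured synaptic currents. All function names and parameter values are hypothetical.

```python
import numpy as np

DT_MS = 0.1  # simulation time step in ms (assumed)


def synaptic_kernel(tau_rise_ms=0.5, tau_decay_ms=3.0, peak_pa=-20.0, length_ms=30.0):
    """Hypothetical dual-exponential unitary current, standing in for a measured waveform."""
    t = np.arange(0.0, length_ms, DT_MS)
    k = np.exp(-t / tau_decay_ms) - np.exp(-t / tau_rise_ms)
    return peak_pa * k / k.max()


def clamp_current(spike_trains, kernel):
    """Voltage-clamp current: with Vm fixed (C*dVm/dt = 0) inputs add linearly,
    so the current is the summed spike count per step convolved with the kernel."""
    summed = spike_trains.sum(axis=0)
    return np.convolve(summed, kernel)[: summed.size]


def pair_correlation(n_inputs=20, n_shared=10, rate_hz=20.0, duration_ms=2000.0, seed=0):
    """Correlation between the currents of two cells that each receive n_inputs
    presynaptic neurons, n_shared of which are common to both cells."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration_ms / DT_MS))
    p_spike = rate_hz * DT_MS / 1000.0  # Poisson spike probability per time step
    n_private = n_inputs - n_shared
    # Rows: shared inputs, then cell A's private inputs, then cell B's private inputs
    pool = rng.random((n_shared + 2 * n_private, n_steps)) < p_spike
    inputs_a = pool[:n_inputs]
    inputs_b = np.concatenate([pool[:n_shared], pool[n_inputs:]])
    kernel = synaptic_kernel()
    i_a = clamp_current(inputs_a, kernel)
    i_b = clamp_current(inputs_b, kernel)
    return np.corrcoef(i_a, i_b)[0, 1]
```

      Under these assumptions the expected correlation is roughly n_shared / n_inputs, because only the shared inputs contribute covariance while every input contributes variance; pairs with no common input give correlations near zero.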

      Reviewer #3 (Public Review):

      This paper presents convincing data from technically demanding dual whole-cell patch recordings of stellate cells in medial entorhinal cortex slice preparations during optogenetic stimulation of PV+ interneurons. The authors show that the patterns of postsynaptic activation are consistent with dual recorded cells close to each other receiving shared inhibitory input and sending excitatory connections back to the same PV neurons, supporting a circuitry in which clusters of stellate cells and PV+IN interact with each other with much weaker interactions between clusters. These data are important to our understanding of the dynamics of functional cell responses in the entorhinal cortex. The experiments and analysis are quite complex and would benefit from some revisions to enhance clarity.

      These are technically demanding experiments, but the authors show quite convincing differences in the correlated response of cell pairs that are close to each other in contrast to an absence of correlation in other cell pairs at a range of relative distances. This supports their main point of demonstrating anatomical clusters of cells receiving shared inhibitory input.

      We appreciate the positive comments.

      The overall technique is complex and the presentation could be more clear about the techniques and analysis. In addition, due to this being a slice preparation they cannot directly relate the inhibitory interactions to the functional properties of grid cells which was possible in the 2-photon in vivo imaging experiment by Heys and Dombeck, 2014.

      We have modified the manuscript to try to improve the presentation (specific changes are detailed below). We agree that an important future challenge is to relate our findings to in vivo observations (p 11, para 2).

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. In Figure 2 and its supplementary figures, the authors also showed examples of asynchronized activity. However, it is unclear to me what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. Related to this concern, it would also be important to simulate what level of activity asynchronization in SCs could still lead to correlated PV+ IN activity above shuffle, and among the recorded SCs, what percentage of cells belong to this synchronized/less asynchronized category.

      We address this point in our response to the public review. In brief, we have added cross-correlograms showing that ramp activation of SC pairs does not cause detectable synchronous activation. We also clarify that the sensitivity of correlations in some widely separated pairs to GABA receptor blockers suggests that SCs activate common inhibitory inputs to cell pairs.

      (2) The above concern is more relevant to the focal stimulation experiments, in which the authors tried to claim that a pair of PV+ INs with correlated activity could receive inputs from the same SCs neurons. The authors also showed that the stimulation patterns leading to the activation of PV+ INs were more similar if PV+ INs had correlated activity (Figure 5D). However, if nearby SCs were more synchronized than distal SCs within this stimulation scale, even though a pair of PV+ INs showed correlated activity, they could still receive inputs from different but nearby SCs. In this case, it would be helpful to quantify the relationship between the level of activity synchronization of SCs and their distances. In Figure 5 Supplementary Figure 1, the data were only provided for 8 cells. If feasible, collecting data from more cells would be needed for the proposed analysis.

      We explain in our response to point 1 above and in our response to the public review that direct synchronisation of SCs is unlikely. This is particularly unlikely for focal stimulation experiments, as the timing of responses of individual SCs is extremely variable between trials. Thus, even if there were strong synaptic connections between SCs, which the evidence suggests there are not (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), this would be unlikely to result in reliably timed coordinated firing.

      (3) It is unclear what the definition of "common inputs" is. Do they refer to inputs from the same group of cells? If different groups of cells provide synchronized inputs, will the inputs be considered "common inputs" or "different inputs"?

      We used "common" in an attempt to be consistent with classic work by Yoshimura et al. and to be succinct. Thus, by common input we refer to cell pairs for which a proportion of their input comes from the same presynaptic neuron(s), as opposed to cell pairs whose inputs come from different neurons and which therefore have no common input. We have attempted to make sure this is clear in the revised manuscript (e.g. description of simulations on p 4, para 2).

      (4) In the introduction and abstract, it was mentioned that "dense, but specific, direct excitatory-inhibitory synaptic interactions may operate at the scale of grid cell clusters". It is unclear to me how "dense" was demonstrated in the data. Can the authors clarify?

      Thank you for flagging this; we were insufficiently clear. We have revised the text to refer to cell pairs for which a proportion of their input is from the same presynaptic neurons (e.g. p 3, para 1), and to refer separately to indirect coordination, by which we mean inputs to cell pairs that appear correlated because of coordination between upstream neurons.

      (5) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. Is there any evidence supporting this "direct interaction"?

      Direct interactions from SCs to PV+INs and from PV+INs to SCs were previously demonstrated by recordings from pairs of neurons (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016; Winterer et al. 2017). Our results in Figures 3-5, which show that exciting SCs by light activation of ChR2 leads to excitation of PV+INs, and in Figure 7, which show that light activation of PV+INs expressing ChR2 leads to inhibition of SCs, are consistent with these previous conclusions. We have modified the manuscript to make sure this is clear (p 2, para 3).

      Is it possible that pyramidal cells are also involved in this interaction? If this is unlikely, the authors may provide some evidence (e.g., timing of responses after optogenetic stimulation) or some discussion.

      This is unlikely given that previous studies indicate that connections from stellate to pyramidal cells are weak or absent (Winterer et al. 2017). We now clarify this in the Discussion (p 10, para 1).

      Minor points: (1) Page 4, last paragraph: the authors claimed that CCpeakmean was reduced and CClagvar increased with cell separation. Although the trends are visible in the figures, the authors may provide appropriate statistics to support this statement, such as the correlation between cell separation and CCpeakmean or CClagvar.

      We have inserted summaries of linear model fits into the legends for Figure 3E-F, Figure 5F-H and Figure 7D.
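      In its simplest form, a summary of this kind is an ordinary least-squares line relating a pairwise correlation measure to cell separation, together with a correlation coefficient. The Python sketch below shows one way to compute it with NumPy. The helper name and the synthetic data are hypothetical, and we do not know the exact model specification the authors used.

```python
import numpy as np

def fit_vs_separation(separation, metric):
    """Ordinary least-squares line metric = slope * separation + intercept, plus Pearson r."""
    x = np.asarray(separation, dtype=float)
    y = np.asarray(metric, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # coefficients, highest power first
    r = float(np.corrcoef(x, y)[0, 1])
    return float(slope), float(intercept), r

# Hypothetical data: a peak-correlation measure that declines with cell separation
rng = np.random.default_rng(0)
separation_um = rng.uniform(0, 300, size=60)
cc_peak = 0.3 - 0.0008 * separation_um + rng.normal(0, 0.03, size=60)
slope, intercept, r = fit_vs_separation(separation_um, cc_peak)
print(f"slope = {slope:.5f} per um, r = {r:.2f}")
```

A negative slope with a clearly negative r corresponds to the decline of CCpeakmean with separation; the same helper applied to CClagvar would be expected to return a positive slope if variability increases with distance.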

      (2) If I understood correctly, in the second last paragraph on page 6, "pairs of SCs" should be changed to "pairs of PV+ INs".

      Thanks. Corrected.

      (3) Page 9, seventh line from the end: where is Figure S4?

      Corrected to 'Figure 3, Figure Supplement 2'.

      (4) Page 27: at the end of the caption for panel B there are two full stops.

      Corrected.

      (5) Figures 3A and B: what are the red vertical rectangles?

      These are the regions shown on an expanded time base in C and D. This is now clarified in the legend.

      (6) Page 28, figure caption for D and E: (C) and (D) should be (D) and (E).

      Corrected.

      (7) The first sentence of the third paragraph of the Introduction: 'later' should be 'layer'.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      - Some related work has been done by Beed et al. 2013 to map the spatial distribution of inputs to neurons in MEC. Certainly, there are differences in the approaches and the key questions, but the contribution of this study would benefit from a more detailed comparison of the results from Beed vs the current study and should be included in the discussion.

      It is hard to include a detailed comparison of results without losing focus, as the two studies address different questions with different approaches. We already noted that 'Local optical activation of unidentified neurons has also been used to infer connectivity principles but with a focus on responses of single postsynaptic neurons (Beed et al., 2013, 2010)'. In addition, we now note that 'Our focal optogenetic stimulation approach also offers insight into the spatial organization of presynaptic neuronal populations, with the advantage, compared to focal glutamate uncaging previously used to investigate connectivity in the MEC (Beed et al., 2013, 2010), that the identity of the presynaptic cell population is genetically defined'.

      - There are a few places where the language is ambiguous or needs a more detailed description for clarity.

      • 3rd paragraph under "Focal activation of SCs generates common input to nearby PV+Ins". The correlation probability description in this paragraph and a similar sentence in the Methods are very hard to understand. I had to look up the analysis in Yoshimura et al. 2005 to understand what was done here. It's a nice analysis, but the manuscript could benefit from a more detailed description of this measure in the Methods.

      We agree, it is a somewhat complex metric and is challenging to explain. In the interests of keeping the main text succinct, we have left the bare bones explanation as it was in the Results, but have expanded the explanation in the Methods. We hope this is now clear.

      - "Alternatively, if there is no clear spatial organization of SC to PV+INs connections, then the similarity between stimulus locations for pairs of SCs should have a random distribution." This sentence is hard to understand. I think the phrase "similarity of stimulus location" is strange and is driving the confusion in this sentence.

      We have replaced this with 'correspondence between active stimulus locations'.

      - In the discussion under "Spatial extent and functional organization of L2 circuits" there is a grammatical mistake (seems to be 2x phrasing of "leads to common synaptic input").

      Corrected.

      - Citations in the introduction/discussion. Introduction: in addition to Gu et al. 2018, Heys et al. 2014 also showed that there are non-random correlations among putative grid cells as a function of their somatic distance. In the discussion section, in addition to Gu et al. 2018, Heys et al. 2014 showed that there is anatomical clustering of grid cells in MEC. This earlier work investigating functional correlations among neurons in the superficial aspect of MEC in vivo should be cited and is particularly relevant in these two sections of the manuscript.

      Thanks, we apologise for the oversight. We're well aware of this important study and have now cited it.

      -Typo - Paragraph 3 of the intro; "later" should be layer.

      Corrected.

      -Figure 5 (D-E): there is a typo; high correlation probability is D and low correlation probability is E (the text says C/D).

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      The paper is missing the bibliography section. This makes the review somewhat difficult as some cited papers are not immediately familiar based on the citation.

      Thank you, and our apologies for making extra work by omitting this. It is now included.

      Page 2 - "cell clusters" - they should also cite the paper by Heys and Dombeck, 2014 that shows a spatial scale of inhibitory interactions computed based on correlations of grid cells recorded using 2-photon calcium imaging.

      Added (see above).

      Page 2 - "later 2 of the MEC" - layer.

      Corrected.

      Page 2 - "synaptic interactions" - again they should mention the work by Heys and Dombeck, 2014 that indirectly measured the spatial scale of inhibition.

      Now cited in this paragraph.

      Page 4 "we simulated responses" and Figure 2E - in each simulation - did they fit the magnitude and time constant of the simulated EPSCs to individual EPSCs in the data? Or did they randomly vary these to find the best fit?

      The parameters for the simulations are given in the Methods and were chosen to correspond to the experimental values. We have rewritten this section to make the simulation methods clearer. Simulations using different time constants within a physiological range support similar conclusions.

      Page 4 - "we identified 35/71" - Are these the cells that appear in yellow as correlated in Figures 3E-F? If so, the text should indicate that these cells are shown in yellow.

      We have added this and have also updated the legends for additional clarification.

      Figure 2, Figure Supplement 1 - B,C - the following phrase is not clear: "when the 4 / 8 of each neurons inputs from SCs also project to the other neuron (B)," Should the "the" be removed? Also, by 4/8 do they mean 50%, or do they mean 4 to 8?

      Thanks, we've reworded to improve the clarity.

      E - "receiving presynaptic inputs consisted of 4 overlapping SCs" - should it say "consisting"?

      Corrected.

      Figure 3, Figure Supplement 1 part E - "the same data as (C)" - should this be the same data as (D)? I do not see how doing clustering on the shuffled data in (C) would give two groups, but it makes sense if it is from (D).

      That's right, now corrected.

      Page 5 - "used action potentials" - this is confusing. Is the word "used" supposed to be there?

      Corrected.

      Page 5 - "widefield activation experiments" - they should cite the experiments that they are referring to here.

      Added.

      Page 5 - "effect of blocking" - "Figure 4" - I find it very odd that the agent GABAzine in Figure 4 is not explicitly mentioned in the main text (though it is mentioned in the methods). The main text should indicate that blocking was performed using GABAzine.

      Added.

      Page and page 14 and Figure 5 - "shifted" - do they mean shuffled?

      We do. The classic papers by Yoshimura et al. used the term "shifted", so we keep it here to make clear that we have used their approach. We have added additional explanation to make sure the meaning is clear.
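      For readers unfamiliar with the shift procedure, the general idea is this: correlating one cell's responses with another cell's responses from a different (shifted) trial preserves structure that is time-locked to the stimulus but removes trial-specific shared input. The excess of the same-trial correlation over the shifted-trial correlation therefore reflects common input. The Python sketch below illustrates this generic shift-predictor idea on synthetic event counts. It is not the authors' code, and the bin structure, rates and function names are assumptions.

```python
import numpy as np

def binned_corr(a, b):
    """Pearson correlation between two (trials x bins) event-count arrays, pooled over trials."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def same_vs_shifted(a, b, shift=1):
    """Correlation on matching trials versus with cell b's trials shifted by `shift`."""
    same = binned_corr(a, b)
    shifted = binned_corr(a, np.roll(b, shift, axis=0))
    return same, shifted

rng = np.random.default_rng(2)
n_trials, n_bins = 50, 200
# Hypothetical stimulus-locked rate: identical on every trial, so it survives shifting
locked_rate = np.where(np.arange(n_bins) >= 100, 0.4, 0.0)
# Hypothetical shared presynaptic events that differ from trial to trial
common = rng.poisson(0.2, size=(n_trials, n_bins))
cell_a = common + rng.poisson(0.2, size=(n_trials, n_bins)) + rng.poisson(locked_rate, size=(n_trials, n_bins))
cell_b = common + rng.poisson(0.2, size=(n_trials, n_bins)) + rng.poisson(locked_rate, size=(n_trials, n_bins))
same, shifted = same_vs_shifted(cell_a, cell_b)
print(f"same-trial r = {same:.2f}, shifted-trial r = {shifted:.2f}, excess = {same - shifted:.2f}")
```

Here the shifted correlation captures only the stimulus-locked component, while the excess is attributable to the trial-specific common input.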

      Figure 5 A, B, D, and E would benefit from a more detailed description. They should state whether the labels "1a" and "1b" and "2a" and "2b" refer to different recorded neurons in each pair. Should they indicate that 2a and 2b are a different pair? Do the x, y axes of the images correspond to anatomical position? Does "B" indicate the location of recordings shown in Figure 5B? The authors probably think this is all obvious, but it is not immediately obvious to the reader.

      We have added additional clarification.

      Page 8 - "Beed et al." - These papers by Beed ought to be cited in the introduction as well as they are highly relevant.

      We now cite Beed et al. 2013 in the Introduction when we discuss local inhibitory input to SCs. While the Beed et al. 2010 paper is an important contribution to understanding about pathways from deep to superficial layers, the introduction focuses on communication between identified pre- and postsynaptic populations within layer 2 and therefore we haven't found a way to cite it without losing focus. We do cite this paper multiple times elsewhere.

      Page 10 - "Excitatory-inhibitory interactions" - this summary of attractor models ought to cite the paper by Burak and Fiete as well.

      The discussion focuses on models with excitatory-inhibitory connectivity and cites an important paper from the Fiete group. The model by Burak and Fiete, while also important, is purely inhibitory and so is not well constrained by the known circuitry; we therefore do not think it would be appropriate to cite it here.

      Page 10 - "be consistent with models…or that focus on pyramidal neurons have also been proposed" - this seems ungrammatical as if two different sentences were merged.

      Corrected.

      References

      Couey, Jonathan J, Aree Witoelar, Sheng-Jia Zhang, Kang Zheng, Jing Ye, Benjamin Dunn, Rafal Czajkowski, et al. 2013. “Recurrent Inhibitory Circuitry as a Mechanism for Grid Formation.” Nat. Neurosci. 16 (3): 318–24. https://doi.org/10.1038/nn.3310.

      Dudman, Joshua T, and Matthew F Nolan. 2009. “Stochastically Gating Ion Channels Enable Patterned Spike Firing through Activity-Dependent Modulation of Spike Probability.” Plos Comput. Biol. 5 (2): e1000290. https://doi.org/10.1371/journal.pcbi.1000290.

      Fuchs, Elke C, Angela Neitz, Roberta Pinna, Sarah Melzer, Antonio Caputi, and Hannah Monyer. 2016. “Local and Distant Input Controlling Excitation in Layer II of the Medial Entorhinal Cortex.” Neuron 89 (1): 194–208. https://doi.org/10.1016/j.neuron.2015.11.029.

      Pastoll, Hugh, Derek L Garden, Ioannis Papastathopoulos, Gülşen Sürmeli, and Matthew F Nolan. 2020. “Inter- and Intra-Animal Variation in the Integrative Properties of Stellate Cells in the Medial Entorhinal Cortex.” Elife 9 (February). https://doi.org/10.7554/eLife.52258.

      Pastoll, Hugh, Lukas Solanka, Mark C W van Rossum, and Matthew F Nolan. 2013. “Feedback Inhibition Enables Theta-Nested Gamma Oscillations and Grid Firing Fields.” Neuron 77 (1): 141–54. https://doi.org/10.1016/j.neuron.2012.11.032.

      Sürmeli, Gülşen, Daniel Cosmin Marcu, Christina McClure, Derek L F Garden, Hugh Pastoll, and Matthew F Nolan. 2015. “Molecularly Defined Circuitry Reveals Input-Output Segregation in Deep Layers of the Medial Entorhinal Cortex.” Neuron 88 (5): 1040–53. https://doi.org/10.1016/j.neuron.2015.10.041.

      Winterer, Jochen, Nikolaus Maier, Christian Wozny, Prateep Beed, Jörg Breustedt, Roberta Evangelista, Yangfan Peng, Tiziano D’Albis, Richard Kempter, and Dietmar Schmitz. 2017. “Excitatory Microcircuits within Superficial Layers of the Medial Entorhinal Cortex.” Cell Rep. 19 (6): 1110–16. https://doi.org/10.1016/j.celrep.2017.04.041.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      While there are many models for sequence retrieval, it has been difficult to find models that vary the speed of sequence retrieval dynamically via simple external inputs. While recent works [1,2] have proposed some mechanisms, the authors here propose a different one based on heterogeneous plasticity rules. Temporally symmetric plasticity kernels (that do not distinguish between the order of pre and post spikes, but only their time difference) are expected to give rise to attractor states, asymmetric ones to sequence transitions. The authors incorporate a rate-based, discrete-time analog of these spike-based plasticity rules to learn the connections between neurons (leading to connections similar to Hopfield networks for attractors and sequences). They use either a parametric combination of symmetric and asymmetric learning rules for connections into each neuron, or separate subpopulations having only symmetric or asymmetric learning rules on incoming connections. They find that the latter is conducive to enabling external inputs to control the speed of sequence retrieval.
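      The distinction between temporally symmetric and asymmetric rules can be illustrated with Hopfield-style weights built from a stored sequence of patterns. The Python sketch below is a deliberate simplification and not the authors' model: it assumes binary ±1 patterns, linear rules, and no transfer function, sparseness or sign constraints. The symmetric rule pairs each pattern with itself, so the recurrent input in pattern μ points back to μ (an attractor). The asymmetric rule pairs pattern μ+1 with pattern μ, so the input points to the next pattern (a transition). Function and variable names are hypothetical.

```python
import numpy as np

def hebbian_weights(xi, rule):
    """Hopfield-style weights from a sequence of patterns xi (shape p x n).

    symmetric:  W = (1/n) * sum_mu outer(xi[mu], xi[mu])      -> attractors
    asymmetric: W = (1/n) * sum_mu outer(xi[mu + 1], xi[mu])  -> transitions
    Rows index the post-synaptic neuron, columns the pre-synaptic neuron.
    """
    _, n = xi.shape
    if rule == "symmetric":
        w = xi.T @ xi
    elif rule == "asymmetric":
        w = xi[1:].T @ xi[:-1]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return w / n

rng = np.random.default_rng(1)
p, n = 10, 500
xi = rng.choice([-1.0, 1.0], size=(p, n))   # hypothetical binary patterns
w_sym = hebbian_weights(xi, "symmetric")
w_asym = hebbian_weights(xi, "asymmetric")

def overlaps(w, mu):
    """Overlap of the recurrent input, with the network in pattern mu, with each stored pattern."""
    h = w @ xi[mu]
    return xi @ h / n

print(np.argmax(overlaps(w_sym, 3)), np.argmax(overlaps(w_asym, 3)))  # 3 4
```

In the model under review, neurons with symmetric plasticity contribute the first kind of connectivity and neurons with asymmetric plasticity the second. Weighting external input towards one or the other subpopulation is what shifts the balance between holding a pattern and advancing to the next one.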

      Strengths:

      The authors have expertly characterised the system dynamics using both simulations and theory. How the speed and quality of retrieval vary across phase space has been well studied. The authors are also able to vary the external inputs to reproduce a preparatory followed by an execution phase of sequence retrieval as seen experimentally in motor control. They also propose a simple reinforcement learning scheme for learning to map the two external inputs to the desired retrieval speed.

      Weaknesses:

      (1) The authors translate spike-based synaptic plasticity rules into a way to learn/set connections for rate units operating in discrete time, similar to their earlier work in [5]. The bio-plausibility issues of learning in [5] carry over here; e.g., the authors ignore any input due to the recurrent connectivity during learning and effectively fix the pre- and post-synaptic rates to the desired ones. While the learning itself is not fully bio-plausible, it does lend itself to writing the final connectivity matrix in a manner that is easier to analyze theoretically.

      We agree with the reviewer that learning is not 'fully bio-plausible'. However, we believe that extending the results to a model in which synaptic plasticity depends on recurrent inputs is beyond the scope of this work. We have added a mention of this issue in the Discussion in the revised manuscript.

      (2) While the authors learn to map the set of two external input strengths to speed of retrieval, they still hand-wire one external input to the subpopulation of neurons with temporally symmetric plasticity and the other external input to the other subpopulation with temporally asymmetric plasticity. The authors suggest that these subpopulations might arise due to differences in the parameters of Ca dynamics as in their earlier work [29]. How these two external inputs would connect to neurons differentially based on the plasticity kernel / Ca dynamics parameters of the recurrent connections is still an open question which the authors have not touched upon.

      The issue of how external inputs could self-organize to drive the network to retrieve sequences at appropriate speeds is addressed in the Results section, paragraph 'Reward-driven learning'. These inputs are not 'hand-wired'; they are initially random and then acquire the necessary strengths to allow the network to retrieve the sequences at different speeds thanks to a simple reinforcement learning scheme. We have rewritten this section to clarify this issue.

      (3) The authors require that temporally symmetric and asymmetric learning rules be present in the recurrent connections between subpopulations of neurons in the same brain region, i.e. some neurons in the same brain region should have temporally symmetric kernels, while others should have temporally asymmetric ones. The evidence for this seems thin. Though, in the discussion, the authors clarify 'While this heterogeneity has been found so far across structures or across different regions in the same structure, this heterogeneity could also be present within local networks, as current experimental methods for probing plasticity only have access to a single delay between pre and post-synaptic spikes in each recorded neuron, and would therefore miss this heterogeneity'.

      We agree with the reviewer that this is currently an open question. We describe this issue in more detail in the Discussion of the revised manuscript.

      (4) One aspect the authors have not connected to is earlier work by one of the authors:

      Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749-755. https://doi.org/10.1038/nn.4286. That work suggests that the experimentally observed over-representation of symmetric synapses indicates that cortical networks are optimized for attractors rather than sequences.

      We thank the reviewer for this suggestion. We have added a paragraph in the discussion that discusses work on statistics of synaptic connectivity in optimal networks. We expect that in networks that contain two subpopulations of neurons, the degree of symmetry should be intermediate between a network storing fixed point attractors exclusively, and a network storing sequences exclusively.

      Despite the above weaknesses, the work is a solid advance in proposing an alternate model for modulating speed of sequence retrieval and extends the use of well-established theoretical tools. This work is expected to spawn further works like extending to a spiking neural network with Dale's law, more realistic learning taking into account recurrent connections during learning, and experimental follow-ups. Thus, I expect this to be an important contribution to the field.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Public Review):

      Sequences of neural activity underlie most of our behavior. As experience suggests, we are (in most cases) able to flexibly change the speed of our learned behavior, which essentially means that brains are able to change the speed at which a sequence is retrieved from memory. The authors here propose a mechanism by which networks in the brain can learn a sequence of spike patterns and retrieve them at variable speed. At a conceptual level I think the authors have a very nice idea: the use of symmetric and asymmetric learning rules to learn the sequences, and then the use of different inputs to neurons with symmetric or asymmetric plasticity to control the retrieval speed. The authors have demonstrated the feasibility of the idea in a rather idealized network model. I think it is important that the idea is demonstrated in more biologically plausible settings (e.g. spiking neurons, a network with exc. and inh. neurons with ongoing activity).

      Summary

      In this manuscript the authors address the problem of learning and retrieving sequential activity in neuronal networks. In particular, they focus on how sequence retrieval speed can be controlled.

      They consider a model with excitatory rate-based neurons. The authors show that when sequences are learned with both temporally symmetric and asymmetric Hebbian plasticity, the sequence retrieval speed can be modulated by changing the external inputs to the network. With the two types of Hebbian plasticity in the network, sequence learning essentially means that the network has both feedforward and recurrent connections related to the sequence. By giving different amounts of input to the feedforward and recurrent components of the sequence, the authors are able to adjust the speed.

      Strengths

      - The authors solve the problem of sequence retrieval speed control by learning the sequence in both feedforward and recurrent connectivity within a network. It is a very interesting idea for two main reasons: (1) it does not rely on delays or short-term dynamics in neurons/synapses; (2) it does not require that the animal be presented with the same sequences multiple times at different speeds. Different inputs to the feedforward and recurrent populations are sufficient to alter the speed. However, the work leaves several issues unaddressed, as explained below.

      Weaknesses

      - The main weakness of the paper is that it is mostly driven by a motivation to find a computational solution to the problem of sequence retrieval speed. In most cases they have not provided any arguments about the biological plausibility of the solution they have proposed e.g.:

      - Is there any experimental evidence that some neurons in the network have temporally symmetric Hebbian plasticity and others temporally asymmetric plasticity? The authors cite some references to support this. But usually the switch between temporally symmetric and asymmetric rules depends on the spike patterns used for pairing (e.g. bursts vs single spikes). In the context of this manuscript, it would mean that within the same pattern some neurons burst and some don't, and that this is the same for all the patterns in the sequence. As far as I can see, the authors have assumed a binary pattern of activity which is the same for all neurons that participate in the pattern.

      There is currently only weak evidence for heterogeneity of synaptic plasticity rules within a single network, though there is plenty of evidence for such heterogeneity across networks or across locations within a particular structure (see references in our Discussion). The reviewer suggests another interesting possibility, that the temporal asymmetry could depend on the firing pattern of the post-synaptic neuron. An example of such behavior can be found in a paper by Wittenberg and Wang (2006), who showed that pairing single spikes of pre- and post-synaptic neurons leads to LTD at all time differences in a symmetric fashion, while pairing a pre-synaptic spike with a burst of post-synaptic spikes leads to temporally asymmetric plasticity, with an LTP window at short positive time differences. We now mention this possibility in the Discussion, but we believe that fully exploring this scenario is beyond the scope of the paper.

      - How would external inputs know that they are impinging on a symmetric or asymmetric neuron? The authors have proposed a mechanism to learn these inputs. But that makes the sequence learning problem a two-stage problem: first an animal has to learn the sequence, and then it has to learn to modulate the speed of retrieval. Is there experimental evidence to support this?

      Our model does not assume that the two processes necessarily occur one after the other. Importantly, once the correct external inputs that can modulate sequence retrieval are learned, sequence retrieval modulation will automatically generalize to arbitrary new sequences that are learned by the network.

      - The authors have only considered homogeneous DC input for sequence retrieval. This kind of input is highly unnatural. It would be more plausible if the authors considered fluctuating input that differs across neurons.

      We have modified Figure 1e and Figure 2c to show the effects of fluctuating inputs on pattern correlations and single unit activity. We find that these inputs do not qualitatively affect our results.

      - All the work is demonstrated using a firing rate based model of only excitatory neurons. I think it is important that some of the key results are demonstrated in a network of both excitatory and inhibitory spiking neurons. As the authors very well know it is not always trivial to extend rate-based models to spiking neurons.

      I think at a conceptual level authors have a very nice idea but it needs to be demonstrated in a more biologically plausible setting (and by that I do not mean biophysical neurons etc.).

      We have included a new section in the discussion with an associated figure (Figure 7) demonstrating that flexible speed control can be achieved in an excitatory-inhibitory (E-I) spiking network containing two excitatory populations with distinct plasticity mechanisms.

      Reviewer #1 (Recommendations For The Authors):

      In the introduction, the authors state: 'symmetric kernels, in which coincident activity leads to strengthening regardless of the order of pre and post-synaptic spikes, have also been observed in multiple contexts with high frequency plasticity induction protocols in cortex [21]'. To my understanding, the final model in [21] (model 3) ignores LTD if the post-synaptic spike also participates in LTP, and only considers nearest-neighbour interactions. Thus, the kernel would not be symmetric. Can the authors clarify what they mean and how their conclusion follows, given that [21] does not show any kernels either?

      In this statement, we were not referring to the model in [21], but rather the experimentally observed plasticity kernels at different frequencies. In particular, we were referring to the symmetric kernel that appears in the bottom panel of Figure 7c in that paper.

      The authors should also address the weaknesses mentioned above. They don't need to solve the issues but expand (and maybe indicate resolutions) on these issues in the Discussion.

      For ease of reproducibility, the authors should make their code available as well.

      We intend to publish the code required to reproduce all figures on Github.

      Reviewer #2 (Recommendations For The Authors):

      -  Show the ground state of the network before and after learning.

      We have decided not to include such a figure, as we have not analyzed the learning process, but instead a network with a fixed connectivity matrix which is assumed to be the end result of a learning process.

      - The authors have only considered a network of excitatory neurons. This does not make sense. I think they should demonstrate a network of both exc. and inh. neurons (spiking neurons) exhibiting ongoing activity.

      See our comment to Reviewer #2 in the previous section.

      -  Show how the sequence dynamics unfolds when we assume a non-zero ongoing activity.

      We are not sure what the reviewer means by 'non-zero ongoing activity'. We now show the dynamics of the network in the presence of noisy inputs, which can represent ongoing activity from other structures (see Figs. 1e and 2c).

      -  From the correlation (==quality) alone it is difficult to judge how well the sequence has been recovered. Authors should consider showing some examples so that the reader can get a visual estimate of what 0.6 quality may mean. High speed is not really associated with high quality (Fig 2b). So it is important to show how the sequence retrieval quality is for non-linear and heterogeneous learning rules.

      We believe that the correlation plots for chosen input configurations (see Figs. 3a and 5b) provide some insight into the relationship between speed and quality for non-linear and heterogeneous learning rules. We leave a full characterization for future work.

      -  Authors should show how the retrieval and quality of sequences change when they are recovered with positive input, or positive input to one population and negative to another. In the current version sequence retrieval is shown only with negative inputs. This is a somewhat non-biological setting. The inhibitory gating argument (L367-389) is really weak.

      We would like to clarify that with the parameters chosen in this paper, the transfer function has half its maximal rate at zero input. This is because we chose the threshold to be zero, using the fact that any threshold can be absorbed into the external inputs. Thus, negative inputs really mean sub-threshold inputs, and they are consistent with sub-threshold external excitatory inputs. We have clarified this issue in the revised manuscript.
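      The point about absorbing the threshold into the external input can be checked numerically. The Python sketch below assumes a logistic transfer function with a hypothetical maximal rate and gain; the manuscript's exact functional form is not given here. With threshold zero the function returns half the maximal rate at zero input, and a threshold θ is equivalent to subtracting θ from the input. A "negative" input of -0.5 with zero threshold is therefore the same as a positive, but sub-threshold, input of 0.5 with threshold 1.

```python
import numpy as np

R_MAX = 1.0   # hypothetical maximal firing rate (arbitrary units)
BETA = 2.0    # hypothetical gain

def phi(h, theta=0.0):
    """Logistic transfer function; the rate equals R_MAX / 2 when h == theta."""
    return R_MAX / (1.0 + np.exp(-BETA * (np.asarray(h, dtype=float) - theta)))

print(phi(0.0))                          # 0.5, half-maximal at zero input
print(phi(-0.5), phi(0.5, theta=1.0))    # identical: sub-threshold, yet non-zero, rate
```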

      -  Authors should demonstrate how the sequence retrieval dynamics is altered when they assume a fluctuating input current for sequence retrieval instead of a homogeneous DC input.

      See our comment to Reviewer #2 in the previous section.

      -  Authors should show what are the differences in synaptic weight distribution for the two types of learning (bi-linear and non-linear). I am curious to know if the difference in the speed in the two cases is related to the weight distribution. In general I think it is a good idea to show the synaptic weight distribution before and after learning.

      As mentioned above, we do not study any learning process, but rather a network with a fixed connectivity matrix, assumed to represent the end result of learning. In this network, the distribution of synaptic weights converges to a Gaussian in the large p and cN limits, independently of the functions f and g, because of the central limit theorem, if there are no sign constraints on weights. In the presence of sign constraints, the distribution is a truncated Gaussian.
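      The central-limit argument can be illustrated numerically. The Python sketch below assumes weights of the form W_ij ∝ Σ_μ f(ξ_i^{μ+1}) g(ξ_j^μ), with hypothetical Gaussian patterns, a nonlinear f = tanh and a linear g, no sign constraint and full connectivity. It standardizes the off-diagonal weights and reports their skewness and excess kurtosis, both of which are zero for a Gaussian distribution. The choices of f, g, network size and number of patterns are illustrative assumptions only.

```python
import numpy as np

def standardized_sequence_weights(n, p, rng):
    """Off-diagonal entries of W_ij = sum_mu f(xi_i^{mu+1}) g(xi_j^mu), standardized."""
    xi = rng.standard_normal((p + 1, n))   # hypothetical i.i.d. Gaussian patterns
    f = np.tanh(xi)                        # hypothetical nonlinear post-synaptic function
    g = xi                                 # hypothetical linear pre-synaptic function
    w = f[1:].T @ g[:-1]                   # w[i, j] sums over the p pattern transitions
    off_diag = w[~np.eye(n, dtype=bool)]
    return (off_diag - off_diag.mean()) / off_diag.std()

def skew_and_excess_kurtosis(z):
    """Sample skewness and excess kurtosis of standardized data; both are zero for a Gaussian."""
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(3)
z = standardized_sequence_weights(n=300, p=200, rng=rng)
skew, kurt = skew_and_excess_kurtosis(z)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

Because each weight is a sum of p independent terms, both statistics shrink towards zero as p grows, whatever the (finite-variance) choice of f and g.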

      -  I suggest the use of a monochromatic color scale for Figures 2b and 3b.

      Figure 3: The sentence describing panel 2 seems incomplete.

      Also explain why there is a non-monotonic relationship between I_s and speed for some values of I_a in 3b.

      There is a non-monotonic relationship for retrieval quality, not speed. We have clarified this in the manuscript text, but we do not currently have an explanation for why this phenomenon occurs for these specific values of I_a.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Additional Discussion Points

      (1) There is not much exploration of potential mechanisms, i.e., the impact of PV neuron activity on the broader circuit. Additionally, the study exclusively focuses on PV cells and does not explore the role of other prefrontal populations, particularly those known to respond to cue-evoked fear states. The discussion should consider how PV activity might impact the broader circuit and whether the present findings are specific to PV cells or applicable to other interneuron subtypes.

      We have added an extensive discussion of potential mechanisms and the potential contributions of other interneuron subtypes:

      “For example, PV neurons aid in improving visual discrimination through sharpening response selectivity in visual cortex (Lee et al., 2012). In prefrontal cortex, PV neurons are critical for task performance, particularly during performance of tasks that require flexible behavior such as rule shift learning (Cho et al., 2020) and reward extinction (Sparta et al., 2014). Further, PV neurons play an essential role in the generation of cortical gamma rhythms, which contribute to synchronization of selective populations of pyramidal neurons (Sohal et al., 2009; Cardin et al., 2009). Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004). These and other studies support the idea that PV neural activity supports the execution of a behavior by shaping rather than suppressing cortical activity, potentially by selecting among conflicting behaviors by the synchronization of different pyramidal populations (Warden et al., 2012; Lee et al., 2014).

      The roles of other inhibitory neural subtypes (such as somatostatin (SOM)-expressing and vasoactive intestinal peptide (VIP)-expressing IL GABA neurons) in avoidance behavior are currently unknown, but are likely important given the role of SOM neurons in gamma-band synchronization (Veit et al., 2017), and the role of VIP neurons in regulating PV and SOM neural activity (Cardin, 2018).” 

      (2) There is some discordance between changes in neural activity and behavior. For example, in Figure 4C, the relationship between PV neuron activity and movement emerges almost immediately during learning, but successful active avoidance emerges much more gradually. Why is this?

      We have added extensive text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (3) vmPFC was defined here as including the infralimbic (IL) and dorsal peduncular (DP) regions. While the role of IL has been frequently characterized for motivated behavior, relatively few studies have examined DP. Perhaps the authors are just being cautious, given the challenges involved in the viral targeting of the IL region without leakage to nearby regions such as DP. But since the optical fibers were positioned above the IL region, it is possible that DP did not contribute much to either the fiber photometry signals or the effects of the optogenetic manipulations. Perhaps DP should be completely omitted, which is more consistent with the definitions of vmPFC in the field.

      Yes, we included DP to be cautious as our viral expression sometimes leaks into DP, though the optic fiber targets IL. We have replaced vmPFC with IL throughout the manuscript. 

      (4) In the Discussion, the authors should consider why PV cells exhibit increased activity during both movement initiation and successful chamber crossing during avoidance. While the functional contribution of the PV signal during movement initiation was tested with optogenetic inhibition, some discussion of the possible role of the additional PV signal during chamber crossing would be of interest to readers who are intrigued by the signaling of two events. Is the chamber crossing signal related to successful avoidance or learned safety (e.g., see Sangha, Diehl, Bergstrom, Drew 2020)?

      IL PV neural activity starts to increase at movement initiation, peaks at chamber crossing (when movement speed is highest), and decreases after chamber crossing (Figure 1E). Thus, the increase in PV neural activity at movement initiation and at chamber crossing are different phases of the same event. 

      We think this signal is unlikely to be a safety signal, and have added text to the discussion to clarify this issue:

      “We think the IL PV signal is unlikely to be a safety signal (Sangha et al., 2020). First, the PV signal rises during movement not only in the avoidance context, but during any movement in a “threatening” context (i.e. a context where the animal has been shocked). For example, PV neural activity rises during movement during the intertrial interval in the avoidance task. Further, the emergence of the PV signal during movement happens quickly – after the first shock – and significantly before the animal has learned to move to the safe zone. This suggests a close association with enabling movement in a threatening environment, when animals must suppress a freezing response in order to move. Additionally, the rise in PV activity was specifically associated with movement and not with tone offset, the indicator of safety in this task. Finally, if IL PV neural activity reflects safety signals one would expect the response to be enhanced by learning, but the amplitude of the IL PV response was unaffected by learning after the first shock.”

      (5) The primary conclusion here that PV cells control the fear response should be considered within the context of prior findings by the Herry laboratory. Courtin et al (2014) demonstrated a select role of prefrontal PV cells in the regulation of fear states, accomplished through their control over prefrontal output to the basolateral amygdala. The observations in this paper, which used both ChR2 and Arch-T to address the impact of vmPFC PV activity on reactive behavior, are highly relevant to issues raised both in the Introduction and Discussion.

      The finding of Courtin et al. (2014) is very important. We did not discuss this paper originally because it concerns dmPFC, which has a different role in fear processing than IL/vmPFC. We have added text about this finding to the discussion:

      “Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004).”

      Additional analyses

      (1) As avoidance trials progress (particularly on days 2 and 3), do PFC PV responses attenuate? That is, do continued unreinforced tone presentations lead to reduced reliance on PV cell-mediated suppression for successful avoidance to occur?

      We added Figure 1—Figure supplement 1M and 1N and a sentence on page 5: “IL PV neural activity during the avoidance movement was not attenuated by learning or repeated reinforcement (Figure 1—Figure supplement 1M and N, N = 8 mice, p = 0.8886, 1-way ANOVA).” We included only data from days 1 and 2, because short and long tone trials, which might interfere, were introduced on day 3.

      (2) In Figure 3D, it would be very informative and further support the claim of "no role for movement during reward" if the response of these cells during the "initiation of movement during reward-approach" was shown (similar to Figure 1F for threat avoidance).

      Thank you for the question. We added Figure 3—Figure supplement 1B and C to show IL PV neural activity aligned to initiation of movement during reward approach. IL PV activity decreased after movement initiation for reward approach (N = 6 mice, p = 0.0382, paired t-test). This further supports our claim that IL PV neuron activity increases only for threat avoidance.

      Reviewer 1 (Recommendations For The Authors):

      (1) Fig. 1G shows the average response of PV cells during chamber crossing on an animal-to-animal basis. It would be informative to also see a similar plot for movement initiation.

      We have added the suggested figure in Figure 1—Figure supplement 1B.  

      (2) In the Results section (Page 5), there is a small issue with the logic. It says: "As vmPFC inactivation impairs avoidance behavior, the activity of inhibitory vmPFC PV neurons might be predicted to be low during successful avoidance trials." As opposed to "low", it should say "high", right? If inhibition impairs avoidance, then high responding by these cells would be presumed to drive the avoidance response, as supported by your findings.

      We have re-worded the text in this section. Based on prior findings that IL inactivation impairs avoidance (Moscarello et al., 2013), we predicted that inhibitory PV neurons would be less active during avoidance, because activating these neurons could suppress IL. However, we found that they were selectively active during avoidance.

      (3) In the caption/legend for Fig. 1E, it says that the "black ticks" indicate "tone onset". But it should say "movement initiation".

      We thank the reviewer for pointing out this error. The ticks do indicate tone onset, and we have corrected the figure to reflect this. 

      Reviewer 2 (Recommendations For The Authors):

      (4) Perhaps replace the term 'good outcomes' with 'reinforcing outcomes' or simply 'reinforcement'.

      Thank you for the suggestion. We have replaced ‘good outcomes’ with ‘reinforcing outcomes’.

      Reviewer 3 (Recommendations For The Authors):

      (5) It would be useful to provide some (perhaps speculative) explanation for the discordance between the PV activity-movement relationship and success of active avoidance in Fig. 4C.

      We have added text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (6) I don't really understand what is shown in Figure 4D -- exactly what time points does this represent? Was habituation performed everyday?

      Figure 4D shows data from the approach task, not the avoidance task. These data are from well-trained mice, not from the first day of training on this task. There was a pre-task recording period every day.

      (7) Why was optogenetic inhibition only delivered from 0.5-2.5 sec after the tone cue?

      We wanted to avoid any possibility that perception of the tone would be disrupted, so we delayed the onset of optogenetic inhibition. We chose 0.5 sec onset because animals typically begin to move ~1 second after tone onset.

      (8) The regression analysis with shuffled time points is not well explained -- some additional methodological details are needed (Fig. 2H).

      We added the following to the methods section to provide a clearer explanation: 

      “ΔF/F(t) was modeled as the linear combination of all event kernels. Given the occurrence times of all event types, we used linear regression to decompose characteristic kernels for each event type. Kernel coefficients of the model were solved by minimizing the mean squared error between the model and the actual recorded signals. To test whether kernel k_i is an essential component of the raw calcium dynamics, we compared the explanatory power of the full model to that of a reduced model in which the occurrence times of the event associated with k_i were randomly reassigned. Thus, in the reduced model, the kernel coefficients should not reflect the response to that event.”
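      The kernel regression and shuffle control described above can be sketched in Python with NumPy as follows. This is an illustration on synthetic data, not the authors' code: the kernel length (n_lags), the exponential "true" kernels, the event counts, the noise level, and the use of an intercept with ordinary least squares and no regularization are all assumptions. Each event type contributes a block of time-shifted indicator columns, so the fitted coefficients in that block form the event's kernel; the reduced model refits after randomly reassigning one event type's occurrence times and compares the fraction of variance explained (R^2).

      ```python
      import numpy as np


      def design_matrix(event_times, n_samples, n_lags):
          """Stack n_lags time-shifted indicator columns per event type.

          event_times: list of integer arrays of sample indices, one array per event type.
          Column (type, lag) is 1 at every sample that lies `lag` samples after an event.
          """
          cols = []
          for times in event_times:
              for lag in range(n_lags):
                  col = np.zeros(n_samples)
                  idx = times + lag
                  np.add.at(col, idx[idx < n_samples], 1.0)  # overlapping events add up
                  cols.append(col)
          return np.column_stack(cols)


      def fit(signal, X):
          """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
          X1 = np.column_stack([X, np.ones(len(signal))])
          coef, *_ = np.linalg.lstsq(X1, signal, rcond=None)
          resid = signal - X1 @ coef
          return coef, 1.0 - resid.var() / signal.var()


      # Synthetic dF/F trace built from two hypothetical event kernels plus noise.
      rng = np.random.default_rng(1)
      n_samples, n_lags, n_events = 5000, 20, 60
      lags = np.arange(n_lags)
      true_kernels = [np.exp(-lags / 5.0), -0.5 * np.exp(-lags / 8.0)]
      event_times = [rng.choice(n_samples - n_lags, n_events, replace=False)
                     for _ in true_kernels]
      signal = design_matrix(event_times, n_samples, n_lags) @ np.concatenate(true_kernels)
      signal = signal + 0.1 * rng.standard_normal(n_samples)

      # Full model: true event times for every event type.
      coef_full, r2_full = fit(signal, design_matrix(event_times, n_samples, n_lags))

      # Reduced model: the occurrence times of event type 0 are randomly reassigned.
      shuffled = list(event_times)
      shuffled[0] = rng.choice(n_samples - n_lags, n_events, replace=False)
      _, r2_reduced = fit(signal, design_matrix(shuffled, n_samples, n_lags))

      print(f"R^2 full = {r2_full:.3f}, reduced = {r2_reduced:.3f}")
      ```

      Because the synthetic signal was generated from both kernels, destroying the timing of event type 0 should lower the variance explained; with real recordings, the size of this drop is what indicates whether the kernel is needed. Repeating the shuffle many times would give a null distribution rather than a single comparison.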

      Editor's notes:

      -  Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for pointing this out. We have included all the test statistics and exact p values as suggested.

      -  Please note the sex of the mice and distribution of sexes in each group for each experiment.

      We have added the sex of mice for all experiments in the methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work successfully identified and validated TRLs in hepatic metastatic uveal melanoma, providing new horizons for enhanced immunotherapy. Uveal melanoma is a highly metastatic cancer that, unlike cutaneous melanoma, responds poorly to immune checkpoint inhibitors, and thus there is a lack of formal clinical treatment for metastatic UM. In this manuscript, the authors described the immune microenvironmental profile of hepatic metastatic uveal melanoma by sc-RNAseq, TCR-seq, and PDX models. Firstly, they identified and defined the phenotypes of tumor-reactive T lymphocytes (TRLs). Moreover, they validated the activity of TILs by in vivo PDX modelling as well as in vitro coculture of 3D tumorsphere cultures and autologous TILs. Additionally, the authors found that TRLs are mainly derived from depleted and late-activated T cells, which recognize melanoma antigens and tumor-specific antigens. Most importantly, they identified TRL-associated phenotypes, which provide new avenues for targeting expanded T cells to improve cellular and immune checkpoint immunotherapy.

      Strengths:

      Jonas A. Nilsson et al. have been working on new therapies for melanoma. The team has also previously performed the most comprehensive genome-wide analysis of uveal melanoma available, presenting the latest insights into metastatic disease. In this work, the authors performed paired sc-RNAseq and TCR-seq on 14 patients with metastatic UM, which is the largest single-cell map of metastatic UM available. This provides a substantial data resource for other studies of metastatic UM.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      Although the paper has strengths in principle, its weakness is that these strengths are not directly demonstrated. That is, the analyses performed are insufficient to fully support the key claims of the manuscript. In particular:

      The authors' description of the overall results should follow a logical argument rather than merely describing the observed phenomena. For example, the presentation of the TRL results lacked logic. In addition, the title of the article emphasizes the three subtypes of hepatic metastatic UM TRLs, but these three subtypes are not specifically discussed in the Results or Discussion sections. The title is therefore not a comprehensive generalization of the work and should be carefully reconsidered by the authors.

      We thank the reviewer for the critical reading of our work. We have added more data and more discussion.

      The authors' claim that they are the first to use autologous TILs and sc-RNAseq to study immunotherapy needs to be supported by the corresponding literature to be more convincing. This can help the reader to understand the innovation and importance of the methodology.

      We have gone through the manuscript and found that we only refer to being first in using PDX models and autologous TILs to study immunotherapy responses by single-cell sequencing. While there are data to be deduced from other studies, we still believe this to be an accurate statement.

      In addition, the authors argue that TILs from metastatic UM can kill tumor cells. This is the key point bridging the data and the main conclusion of the article, so the credibility of this conclusion should be considered. Metastatic UM1 and UM9 remain responsive to their autologous TILs under in vitro conditions.

      UM1 also responds in vivo in the subcutaneous model in the paper. We have also completed an experiment showing that this model responds in a liver metastasis model as well. These data have been added to this revised version of the paper: two main figures and one supplementary figure in which we characterize the response in vivo and by single-cell sequencing of TILs.

      In contrast, UM22, also a metastatic UM, did not respond to TIL treatment, despite the presence of MART1-responsive TILs. The reliability of results obtained from only one UM22 liver metastasis model should be considered. The authors should likewise consider whether such a specific cellular taxon might also exist in other patients with metastatic UM, producing an immune response to tumor cells. The results would be more comprehensive if supported by relevant data.

      The reviewer has interpreted the results correctly: the allogeneic and autologous MART1-specific TILs, while reactive against UM22 in vitro, cannot kill this tumor in either a subcutaneous or a liver metastasis model. We hypothesize that this is due to an immune-exclusion phenotype and show weak immunohistochemistry data suggesting this. We hope the additional UM1 data will be viewed as supporting tumor reactivity also in vivo.

      In addition, the authors used previously frozen biopsy samples for TCR-seq, which may be associated with low-quality sequencing data, a high risk of unreliable outcome indicators, and poor recovery of immune cell information. These problems and the reliability of the results should be considered. If special processing of TCR-seq data from frozen samples was performed, this should also be described.

      We agree with the reviewers and acknowledge that we never anticipated the development of single-cell sequencing techniques when we started the biobank in 2013. We performed dead cell removal before the 10x Genomics experiment. We have also performed extensive quality controls and believe that the data from the biopsies should be viewed as a whole and that quantitative intra-patient comparisons cannot be made.

      Reviewer #2 (Public Review):  

      Summary:  

      The study's goal is to characterize and validate tumor-reactive T cells in liver metastases of uveal melanoma (UM), which could contribute to enhancing immunotherapy for these patients. The authors used single-cell RNA and TCR sequencing to find potential tumor-reactive T cells and then used patient-derived xenograft (PDX) models and tumor sphere cultures for functional analysis. They discovered that tumor-reactive T cells exist in activated/exhausted T cell subsets and in cytotoxic effector cells. Functional experiments with isolated TILs show that they are capable of killing UM cells in vivo and ex vivo.

      Strengths:  

      The study highlights the potential of using single-cell sequencing and functional analysis to identify T cells that can be useful for cell therapy and marker selection in UM treatment. This is important and novel as conventional immune checkpoint therapies are not highly effective in treating UM. Additionally, the study's strength lies in its validation of findings through functional assays, which underscores the clinical relevance of the research. 

      We thank the reviewer for these kind words about our work.

      Weaknesses:  

      The manuscript may pose challenges for individuals with limited knowledge of single-cell analysis and immunology markers, making it less accessible to a broader audience.

      The first draft of the manuscript (excluding methods) was written by a person (J.A.N.) who is not a bioinformatician. It has been corrected to use the correct nomenclature where applicable, but overall it is written with the aim of being understandable. We have made an additional effort in this version.

      Reviewer #1 (Recommendations For The Authors):  

      (1) Firstly, the authors should provide high-resolution pictures to ensure readability for readers. 

      We generated the PDF ourselves, which improved the resolution. We are happy to provide high-resolution figures to the editorial office if needed for printing.

      (2) Furthermore, some parts of the article are more colloquial, and the authors should consider the logic and academic nature of the overall writing of the article. For example, authors should double-check whether the relevant expressions in the results are correct. For example, 'TCR' in the fourth part of the results should be 'TRLs'.

      We thank the reviewer for the recommendations and have gone through the manuscript.

      (3) Moreover, UM22 is described several times in the results as a metastatic UM and should be clearly defined in the methodology.

      The UM22 and UM1 samples are described in-depth in Karlsson et al., Nature Communications, 2020, a paper that is cited in the beginning of Results as part of the narrative. The current work can be viewed as an extension of that work.

      (4) Finally, it is recommended that authors describe a part of the results in full before citing the corresponding picture, otherwise, it will lead to confusion among readers.

      We have made an effort in the revised version to describe the new data in more detail.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript is very interesting and important to understanding key aspects of uveal melanoma immune profile and functionality. However, in my opinion, there are a few aspects that could be addressed.  

      - The manuscript lacks comprehensive details about the samples used, such as their disease progression, response to treatment, or any relevant information that could shed light on potential differences between samples. It would be valuable to know whether these samples were collected before any systemic treatment or if any of the patients underwent immunotherapy post-sample collection, along with the outcomes of such treatments. Providing this information would enrich the manuscript and provide a more holistic view of the research.

      We thank the reviewer for the recommendation and have included a new Supplementary Table 7 with information about the samples. We have also added each sample's contribution to the UMAP to provide a more holistic view.

      - The results presented and discussed in the manuscript seem to indicate that there were no significant differences across the various samples, including comparisons between lymph-node and liver metastases. However, this lack of variation or the reasons for not discussing any observed differences should be clarified. If there are distinctions between the samples, it would be beneficial to discuss these findings in the manuscript.

      We thank the reviewer for the recommendation. Although 14 samples is a large number for a uveal melanoma study, the study is not powered for intra-patient comparisons.

      - The manuscript may pose difficulties for individuals with limited knowledge of single-cell analysis and immunology markers, potentially limiting its accessibility. To make the research more inclusive, the authors might consider presenting the technical aspects of their work in a less descriptive manner and providing explanations for those less familiar with the technology. This would help a broader audience grasp the significance of the study's findings. 

      The manuscript is from a multidisciplinary team where all have read and commented. The draft was written by a tumor biologist and edited by a bioinformatician for accuracy. We honestly think it is more understandable than most studies in this bioinformatics era. But we have tried to describe the new data in an easier way.

    1. if we fail to control our numbers and our appetites well then yes our society will start to crash in a similar way to that of 00:35:32 easter island only on a worldwide scale and that means the whole industrial civilization will break down and 00:35:45 our descendants will essentially be uh savages to use that term very advisedly and savages in the sense that they will have lost 00:35:58 the fruits of civilization and hate us

      for - progress trap - dark futures scenario - like Easter Island but on a global scale

      comment - The potential breakdown of global industrialized society, rupturing supply chains so that our highly interdependent world becomes the very Achilles' heel that hastens its demise, is chilling - It could mean a huge disruption to the most important aspect of civilization - the continuing accrual and inter-generational transmission of knowledge - It would be catastrophic to lose that, but it is entirely possible - As Wright himself famously said, to use a computer metaphor, we humans are like 50,000-year-old hardware running modern software - By that, he meant that our cognitive physiology (brain and sensory processing system) has not changed for tens of thousands of years, yet cultural evolution happens at exponentially faster rates, so much so that our biological systems are not adapted to keep up with the pace, and that spells disaster - When we no longer have the sensory or cognitive apparatus to sense danger, and we are offloading that to AI, we are in an extremely vulnerable situation

      progress trap - Gedanken - Think of our ancestors from 50,000 years ago. - What Wright is saying with his metaphor is that if a child from 50,000 years ago were transported by a time machine to modernity, (s)he would have little problem integrating into modern society - LIKEWISE, if we lose all the fruits of knowledge accumulated over so many thousands of years, it would be like being born into a human tribe 50,000 years ago. - We would likely still have language, but all our technology may have to start from scratch!

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors provide solid data on a functional investigation of potential nucleoid-associated proteins and the modulation of chromosomal conformation in a model cyanobacterium. While the experiments presented are convincing, the manuscript could benefit from restructuring towards the precise findings; alternatively, additional data buttressing the claims made would significantly enhance the study. These valuable findings will be of interest to the chromosome and microbiology fields.

      We thank the editors for their assessment and the reviewers for their critical suggestions. Both reviewers were concerned about our interpretation of the 3C data, and Reviewer #2 suggested biochemical analysis of cyAbrB2 to reinforce our claim. We agree with this concern and suggest that the editors add the following sentence to the eLife assessment: “How cyAbrB2 affects chromosome structure remains elusive in this study, and biochemical assays will be needed in future experiments.”

      The major revision points are the following:

      Reconstruction of Figures

      Previous Figure 5E has been omitted

      Additional 3C data on the nifJ region

      Rephrasing the conclusion of 3C data

      Additional discussion on cyAbrB2 and NAPs

      Reviewer #1 (Public Review): 

      Strength: 

      At first glance, I had a very positive impression of the overall manuscript. The experiments were well done, the data presentation looks very structured, and the text reads well in principle.

      Weakness: 

      On closer inspection, the common thread of the manuscript is somewhat blurry. Reading the abstract, the introduction, and parts of the discussion, it is not really clear what the authors exactly aim to target. Is it the regulation of fermentation in cyanobacteria because it is under-investigated? Is it to bring light to the transcriptional regulation of hydrogenase genes? The regulation by SigE? Or is it to get insight into the real function of cyAbrB2 in cyanobacteria? All of this would be good of course. But it appears that the authors try to integrate all these aspects, which in the end is a little bit counterintuitive and in some places even confusing. From my point of view, the major story is a functional investigation of the putative transcriptional regulator cyAbrB2, which turned out to be a potential NAP. To demonstrate/prove this, the hox genes have been chosen as an example due to the fact that a regulatory role of cyAbrB2 has already been described. In my eyes, it would be good to restructure or streamline the introduction according to this major outcome. 

      As you pointed out, the major focus of this study is cyAbrB2 as a potential NAP. To focus on NAPs, we simplified the first paragraph of the discussion (ll.246-263) and added a section comparing cyAbrB2 with other known NAPs (ll.269-299). To emphasize the description of cyAbrB2, we also rearranged the figures and divided the analysis of cyAbrB2 ChIP into two figures. We shortened the first paragraph of the introduction but largely preserved its structure to keep the general-to-specific pattern, even though this may leave the manuscript somewhat blurry.

      Points to consider: 

      The authors suggest that the microoxic condition is the reason for the downregulation of e.g. photosynthesis (l.112-114). But of course, they also switched off the light to achieve a microoxic environment, which presumably is the trigger signal for photosynthesis-related genes. I suggest avoiding making causal conclusions exclusively related to oxygen and recommend rephrasing (for example, "were downregulated under the conditions applied").

      We agree with this point. We rephrased l.114 to “by the transition to dark microoxic conditions from light aerobic conditions” (ll.108-109).

      The authors hypothesized that cyAbrB2 modulates chromosomal conformation and conducted a 3C analysis. But if I read the data in Figure 5B & C correctly, there is a lot of interaction between 1650 and 1700 kb, not only at marked positions c and j. Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant? In the case of position j the variation between the replicates seems quite high, in the case of position c the mean difference is not that high. Moreover, does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A? If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT. That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit the data shown. But I have to mention that I am not an expert in these kinds of assays. Nevertheless, if there is a biological function that shall be revealed by an experiment, the data must be crystal clear on that. At least the descriptions of the 3C data and the corresponding conclusions need to be improved. For me, it is hard to follow the authors' thoughts in this context. 

      Following your suggestion, we re-examined the 3C data carefully. Furthermore, we conducted an additional 3C experiment on the nifJ region (Figures 7F-J). We acknowledge that we had overinterpreted the 3C data. Therefore, we rewrote the results and discussion of the 3C assay in line with the data (ll.220-245) and removed the previous Figure 5E. Our individual responses follow.

      Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant?

      We could not find statistically significant differences at loci c and j. We therefore added the following to the Results section: “Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.231-232)

      does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A?

      As you suspected, interaction frequency and cyAbrB2 binding do not correlate. We therefore withdrew the previous claim and stated the following: “Moreover, our 3C data did not support bridging at least in hox region and nifJ region, as the high interaction locus and cyAbrB2 binding region did not seem to correlate (Figure 7).” (ll.280-282)

      If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT.

      We rewrote it as follows: “Then we compared the chromatin conformation of wildtype and cyabrb2∆. Although overall shapes of graphs did not differ, some differences were observed in wildtype and cyabrb2∆ (Figures 7B and 7G); interaction of locus (c) with hox region were slightly lower in cyabrb2∆ and interaction of loci (f’) and (g’) with nifJ region were different in wildtype and cyabrb2∆. Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.228-232)

      That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown.

      We rewrote the sentence as follows: “While the interaction scores exhibit considerable variability, the individual data over time demonstrate declining trends of the wildtype at locus (c) and (j) (Figure S8). In ∆cyabrb2, by contrast, the interaction frequency of loci (c) and (j) was unchanged in the aerobic and microoxic conditions (Figure 7E). The interaction frequency of locus (c) in ∆cyabrb2 was as low as that in the microoxic condition of wildtype, while that of locus (j) in ∆cyabrb2 was as high as that in the aerobic condition of wildtype (Figures 7B and 7C).” (ll.238-243)

      The figures are nicely prepared, albeit quite complex and in some cases not really supportive of the understanding of the results description. Moreover, they show a rather loose organization that sometimes does not fit the common thread of the results section. For example, Figure 1D is not mentioned in the paragraph that refers to several other panels of the same figure (see lines 110-128). Panel 1D is mentioned later in the discussion. Does 1D really fit into Figure 1 then? Are all the panels indeed required to be shown in the main document? As some elements are only briefly mentioned, the authors might also consider moving some into the supplement (e.g. left part of Figure 1C, Figure 2A, Figure 3B ...) or at least try to distribute some panels into more figures. This would reduce complexity and increase comprehensibility for future readers. Also, Figure 3 is way too complex. Panel G could be a standalone figure. The latter would also allow for an increase in font sizes or to show ChIP data of both conditions (L+O2 and D-O2) separately. Moreover, a figure legend typically introduces the content as a whole by one phrase but here only the different panels are described, which fits the impression that all the different panels are not well connected. Of course, it is the decision of the authors what to present and how but may they consider restructuring and simplifying.

      Following this advice, we have rearranged the figures.

      The left side of Figure 1C has been moved to the supplement. Instead, representative expression fold changes of “Transient”, “Plateau”, “Continuous”, and “Late” genes are shown for comprehensibility. We kept Figure 1D in Figure 1, as this diagram shows our motivation for focusing on hox and nifJ. We moved Figure 2A to the supplement. We did not move Figure 3B, as it shows the distribution of cyAbrB2 (“long tract of AT-rich DNA”) comprehensively and simply. We agree that Figure 3 was too complex. Therefore, we moved Figures 3F and 3G to a new independent figure (Figure 4). In Figure 4C (former 3G), we show the ChIP data of the L+O2 condition only, and the change of ChIP data under the D-O2 condition is shown in Figure 5. The schematic image showing the cyanobacterial chromosome and NAPs (previous Figure 5E) was omitted because it overinterpreted the data.

      The authors assume a physiological significance of transient upregulation of e.g. hox genes under microoxic conditions. But does the hydrogenase indeed produce hydrogen under the conditions investigated and is this even required? Moreover, the authors use the term "fermentative gene". But is hydrogen indeed a fermentation product, i.e. are protons the terminal electron acceptor to achieve catabolic electron balance? Then huge amounts of hydrogen should be released. Comment should be made on this.

      This is a very important point. Yes, the hydrogenase does produce hydrogen under the conditions we investigated, and protons accept the majority of the reducing power under the dark microoxic condition. We wrote in the introduction as follows: “Hydrogen is generated in quantities comparable to lactate and dicarboxylic acids as the result of electron acceptance in the dark microoxic condition (Akiyama and Osanai 2023; Iijima et al. 2016)” (ll.54-55). A detailed explanation follows, although it is not included in the manuscript.

      A recent study (Akiyama and Osanai 2023) quantified the glycogen consumed and the fermentative products secreted (hydrogen, lactate, dicarboxylic acids, and acetate) by Synechocystis under dark microoxic conditions, the same conditions we investigated. The experimental system consisted of a 10 mL liquid layer and a 10 mL gas layer, cultivated for 3 days under dark microoxic conditions. After this period, the amounts of lactic acid, dicarboxylic acids, and hydrogen were approximately 2 µmol, 3.5 µmol, and 11 µmol, respectively (assuming the gas layer was at 1 atm and ignoring dissolved hydrogen). Meanwhile, glycogen equivalent to 15 µmol of glucose was consumed in the system. This estimate supports the view that hydrogen accounts for a substantial portion of the fermentative products under dark microoxic conditions.
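      The arithmetic behind this estimate can be made explicit with a short Python sketch. It uses only the approximate amounts quoted above; acetate is left out because no amount is given for it here, and the shares are on a simple molar basis, which (as an assumption of this sketch) ignores differences in the electron content of each product.

```python
# Approximate amounts after 3 days of dark microoxic incubation
# (Akiyama and Osanai 2023), as quoted in the response above.
# Units: micromoles. Hydrogen assumes the 10 mL gas layer at 1 atm and
# ignores dissolved H2. Acetate is omitted because no amount is quoted.
products_umol = {
    "lactate": 2.0,
    "dicarboxylic acids": 3.5,
    "hydrogen": 11.0,
}
glucose_consumed_umol = 15.0  # glycogen consumption expressed as glucose equivalents


def molar_shares(products):
    """Return each product's fraction of the summed molar amount."""
    total = sum(products.values())
    return {name: amount / total for name, amount in products.items()}


shares = molar_shares(products_umol)
h2_per_glucose = products_umol["hydrogen"] / glucose_consumed_umol

for name, share in shares.items():
    print(f"{name}: {share:.1%} of quantified products")
print(f"H2 per glucose equivalent consumed: {h2_per_glucose:.2f} mol/mol")
```

      With these numbers, hydrogen makes up about two thirds of the quantified products on a molar basis (11 of 16.5 µmol), and roughly 0.73 mol of H2 is released per mol of glucose equivalent consumed.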

      Gutekunst et al. (2014) demonstrated the necessity of hydrogen production under dark microoxic conditions, showing that hydrogenase activity is required for mixotrophic growth under light-dark and microoxic cycles with arginine. Whether hydrogen production is necessary under our conditions remains unclear, because we only examined continuous dark microoxic conditions without glucose.

      The authors also mention a reverse TCA cycle. But is its existence an assumption or indeed active in cyanobacteria, i.e. is it experimentally proven? The authors are a little bit vague in this regard (see lines 241-246).

      We misused the terminology; we meant the “reductive branch of the TCA cycle”. Under microoxic conditions, cyanobacteria operate a branched TCA cycle, one branch of which is the reductive branch that reduces oxaloacetate to malate. We corrected “reverse TCA cycle” to “reductive branch of TCA” (Figure 1D and ll.260-262).

      Reviewer #2 (Public Review): 

      This work probes the control of the hox operon in the cyanobacterium Synechocystis, where this operon directs the synthesis of a bidirectional hydrogenase that functions to produce hydrogen. In assessing the control of the hox system, the authors focused on the relative contributions of cyAbrB2, alongside SigE (and to a lesser extent, SigA and cyAbrB1) under both aerobic and microoxic conditions. In mapping the binding sites of these different proteins, they discovered that cyAbrB2 bound many sites throughout the chromosome, repressed many of its target genes, and preferentially bound regions that were (relatively) rich in AT-residues. These characteristics led the authors to consider that cyAbrB2 may function as a nucleoid-associated protein (NAP) in Synechocystis, given its functional similarities with other NAPs like H-NS. They assessed the local chromosome conformation in both wild-type and cyabrB2 mutant strains at multiple sites within a 40 kb window on either side of the hox locus, using a region within the hox operon as bait. They concluded that cyAbrB2 functions as a nucleoid-associated protein that influences the activity of SigE through its modulation of chromosome architecture.

      The authors approached their experiments carefully, and the data were generally very clearly presented and described.

      Based on the data presented, the authors make a strong case for cyAbrB2 as a nucleoid-associated protein, given the multiple ways in which it seems to function similarly to the well-studied Escherichia coli H-NS protein. It would be helpful to provide some additional commentary within the discussion around the similarities and differences of cyAbrB2 to other nucleoid-associated proteins, and possible mechanisms of cyAbrB2 control (post-translational modification; protein-protein interactions; etc.). The manuscript would also be strengthened with the inclusion of biochemical experiments probing the binding of cyAbrB2, particularly focusing on its oligomerization and DNA polymerization/bridging potential.

      We agree that biochemical experiments would deepen our insight into cyAbrB2 and chromatin conformation. As the reviewer pointed out, biochemical assays would provide valuable information on the mechanisms of cyAbrB2 control, such as post-translational modification, cooperation with cyAbrB1, oligomerization, and the structure of cyAbrB2-bound DNA. However, we think these potential findings are worthy of a new, independent research paper rather than being part of this one. Therefore, we added a discussion mentioning biochemistry as future work (ll.275-290; the section “The biochemistry of cyAbrB2 will shed light on the regulation of chromatin conformation in the future”).

      Previous work had revealed a role for SigE in the control of hox cluster expression, which nicely justified its inclusion (and focus) in this study. However, the results of the SigA studies here suggested that SigA both strongly associated with the hox promoter, and its binding sites were shared more frequently than SigE with cyAbrB2. The focus on cyAbrB2 is also well-justified, given previous reports of its control of hox expression; however, it shares binding sites with an essential homologue cyAbrB1. Interestingly, while the B1 protein appears to bind similar sites, instead of repressing hox expression, it is known as an activator of this operon. It seems important to consider how cyAbrB1 activity might influence the results described here.

      We infer that the minor side of the bimodal SigE peak is the genuine population that contributes to hox transcription, as hox genes are expressed in a SigE-dependent manner (Figure S2). We consider that the strong SigA peak upstream of the hox operon corresponds to the promoter of TU1715, which is transcribed in the opposite direction to the hox operon. We added a description of the single SigA peak and the bimodal SigE peak near the TSS of the hox operon as follows:

      “A bimodal peak of SigE was observed at the TSS of the hox operon in a microoxic-specific manner (Figure 6C bottom panel). The downstream side of the bimodal SigE peak coincides with SigA peak and the TSS of TU1715. Another side of the bimodal peak lacked SigA binding and was located at the TSS of the hox operon (marked with an arrow in Figure 6C), although the peak caller failed to recognize it as a peak.” (ll.206-209)

      The point that cyAbrB1 binds similar sites to cyAbrB2, despite regulating hox expression in the opposite direction, is very interesting. We therefore examined published transcriptome data from a cyAbrB1 knockdown strain and compared the effects of cyAbrB1 knockdown and cyAbrB2 deletion. We added the following to the Results and Discussion:

      “we referred to the recent study performing transcriptome of cyAbrB1 knockdown strain, whose cyAbrB1 protein amount drops by half (Hishida et al. 2024). Among 24 genes induced by cyAbrB1 knockdown, 12 genes are differentially downregulated genes in cyabrb2∆ in our study (Figure S5D).” (ll.162-165)

      “CyAbrB1, the homolog of cyAbrB2, may cooperatively work, as cyAbrB1 directly interacts with cyAbrB2 (Yamauchi et al. 2011), their distribution is similar, and they partially share their target genes for suppression (Figures 3A S5C and S5D). The possibility of cooperation would be examined by the electrophoretic mobility shift assay of cyAbrB1 and cyAbrB2 as a complex. Despite their similar repressive function, cyAbrB1 and cyAbrB2 regulate hox expression in the opposite directions, and their mechanism remains elusive.” (ll.292-296)

      The hox operon differs from this general tendency. To test whether cyAbrB1 behaves differently from cyAbrB2 at the hox operon, we performed an additional ChIP-qPCR experiment on cyAbrB1 under aerobic and dark microoxic conditions (Figure 5C). However, we did not detect a difference.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1B: I recommend changing the header in the grey bar to terms like "upregulated" and "downregulated", which are also used in the legend description. Upregulation of genes can also be a result of de-repression, which is why the term "activated" is somewhat misleading.

      Corrected.

      Lines 114-116: It is unclear what the authors exactly mean here. Please clarify. 

      We rephrased the sentence as follows: “The enrichment in the butanoate metabolism pathway indicates the upregulation of genes involved in carbohydrate metabolism. We further classified genes according to their expression dynamics.” (ll.110-111)

      Reviewer #3 (Recommendations For The Authors): 

      Major/experimental comments: 

      (1) For the chromosome conformation capture experiments, it is indicated that these were conducted at aerobic (1hr) and microoxic (4 hr) conditions. But the data presented in Figure 1 suggest that 1 hr corresponds to the beginning of microoxic growth, and that time 0 is aerobic. The composite 3C data in Figure 5 show some interesting but specific differences. It is appreciated that the authors presented the profiles for individual samples in Figure S7, and the differences here do not seem to be as compelling. Are the major differences being highlighted significantly (statistically) different (e.g. at the (c) and (j) loci)? Might the differences be starker if an earlier aerobic condition (e.g. time 0) had been used instead of the 1 hr - microoxic - timepoint?

      The previous Figure 5 showed three time points (solid line: aerobic condition; dashed line: 1 h of microoxic condition; dotted line: 4 h of microoxic condition). We omitted the 4 h data from the main figure (Figure 7) because they made the figure complicated. All three time points are shown in the profiles of the individual loci (Figure S8).

      A t-test found no statistically significant differences at loci (c) and (j). Therefore, we have toned down the interpretation of the 3C data as follows: “Our 3C result demonstrated that cyAbrB2 influences the chromosomal conformation of hox and nifJ region to some extent (Figure 7).” (ll.325-326)

      (2) This is a complicated system that involves multiple regulatory proteins, each of which is differentially affected by the growth conditions (aerobic/microoxic). It is obviously beyond the scope of this work to probe deeply into all of these proteins. The focus here was on cyAbrB2, and to a slightly lesser extent SigE; however, based on the data presented, it seems that SigA and cyAbrB1 may be equally important contributors to hox control/expression, and in the case of cyAbrB1, possibly also to chromosome conformation. cyAbrB1 appears to have the same binding sites as cyAbrB2, and has been reported to interact with cyAbrB2. Given this association, it is possible that the two proteins may affect the binding of each other, and that loss of one might lead to enhanced binding by the other (or binding may require heterooligomerization?). Probing the regulatory interplay between these two proteins (or at least discussing it) feels important. Conducting e.g. mobility shift assays with each protein, both individually and together, could possibly allow for some understanding of how they function together. 

      We agree that the biochemistry of cyAbrB2 and cyAbrB1 may explain why cyAbrB1 and cyAbrB2 bind long tracts of AT-rich genome regions in vitro. However, we would like to leave the biochemistry for future work, as we think biochemical data are beyond the scope of the present study.

      The idea that cyAbrB1 and cyAbrB2 cooperate to form heterooligomers and bind broadly to the genome is a very rational and interesting prediction. We added this idea to the Discussion: “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290). We also compared our ∆cyabrb2 transcriptome with that of the recent cyabrb1 knockdown study (ll. 162-165) and concluded that “they partially share their target genes for suppression (Figures 3A S5C and S5D)” (l. 293).

      (3) Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means. It appears that when cyAbrB2 binds, any given protected region can be quite extensive, which can be suggestive of polymerization along the chromosome. Are the boundaries for binding sites typically clearly delineated, and this changes when the cultures are growing under microoxic conditions? There is also no mention made anywhere about oligomerization potential for cyAbrB2, which would be important for the polymerization, and bridging suggested for cyAbrB2 in the model presented in Figure 5. Previous publications (Song et al., 2022; Ishii et al., 2008) have suggested that it can exist as a dimer in vivo, but that in vitro it is largely monomeric. The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means.

      To describe more clearly what we meant by “cyAbrB2 binding becomes blurry”, we rearranged the figure composition and made a dedicated figure (Figure 5). We also rephrased the description by adopting the reviewer’s phrase “boundaries for binding sites”, as it describes the change well: “When cells entered microoxic conditions, the boundaries of the cyAbrB2 binding region and cyAbrB2-free region became obscure (Figure 5).” (ll.319-320)

      There is also no mention made anywhere about oligomerization potential for cyAbrB2,

      We added discussion of oligomerization: “DNA-bound cyAbrB2 is expected to oligomerize, based on the long tract of cyAbrB2 binding region in our ChIP-seq data. However, no biochemical data mentioned the DNA deforming function or oligomerization of cyAbrB2 in the previous studies and preference for AT-rich DNA is not fully demonstrated in vitro (Dutheil et al. 2012; Ishii and Hihara 2008; Song et al. 2022)” (ll. 277-280) and “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290)

      The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      We added a discussion that considers together the known features of cyAbrB2, our novel findings on cyAbrB2, and a comparison with known NAPs (ll.269-290).

      (4) Given that the major take-away for the authors (based on the title) seems to be the nucleoid-associated protein potential for cyAbrB2, the Discussion would benefit from some additional focus in this area. How similar is cyAbrB2 to other nucleoid-associated proteins? (e.g. H-NS, Lsr2) How does counter-silencing work for other nucleoid-associated proteins? Can the authors definitively exclude the possibility of binding site competition/occlusion, given that cyAbrB2 covers the promoter region of hox? What other nucleoid-associated proteins have been characterized in cyanobacteria? 

      We agree with this point, so we added a discussion comparing cyAbrB2 with H-NS and Lsr2, the canonical NAPs (ll. 269-290).

      We did not rule out the possibility that cyAbrB2 excludes RNAP, but the previous manuscript did not discuss it sufficiently. To emphasize that cyAbrB2 excludes RNA polymerase, we simplified Figure 6 and used mosaic plots showing the anti-co-occurrence of cyAbrB2 binding regions and SigE peaks. Furthermore, we added discussion of SigE exclusion by cyAbrB2 (ll. 355-359).

      We mention the possibility of other nucleoid-associated proteins in cyanobacteria in the discussion. “Furthermore, the conformational changes by deletion of cyAbrB2 were limited, suggesting there are potential NAPs in cyanobacteria yet to be characterized.” (ll.336-339)

      (5) Previous work (Song et al., 2022) showed that changing the AT content of cyAbrB2 binding sites did not affect its ability to bind DNA. There are also previous papers suggesting that cyAbrB2 may be subject to diverse post-translational modifications (e.g. phosphorylation - Spat et al., 2023; glutationylation - Sakr et al., 2013), as well as association with cyAbrB1. These collectively suggest there may be other factors that contribute to cyAbrB2 binding specificity/activity. These seem like relevant points to discuss, particularly given the transient nature of the cyAbrB2 effects on some genes.

      We have added discussion of AT content, post-translational modifications and transient regulation, and association with cyAbrB1 (ll. 284-295).

      (6) Given the major binding site for SigA upstream of the hox operon, it seems that it likely also contributes to hox cluster expression, together with SigE. Is there a sense for the relative contribution of each sigma factor to hox cluster expression? And whether both are subject to the same inhibitory effect of cyAbrB2? 

      As described above in our response to the public review, the SigA binding site upstream of the hox operon should be assigned to the TSS of TU1715 (Figure 6C). Transcription of the hox operon is highly dependent on SigE, as shown in Figure S2, and the residual transcription in the sigE∆ strain is derived from other sigma factors (SigABCD). Estimating the relative contribution of sigma factors other than SigE is difficult at present because SigABCDE can partially compensate for each other.

      Because H-NS affects the primary and alternative sigma factors differently (Shin et al. 2005), whether cyAbrB2 inhibits the primary sigma factor (SigA) and the alternative sigma factor (SigE) to the same extent is a very interesting question.

      We calculated the odds ratios of SigE and SigA being in the cyAbrB2-free region and wrote in the Results: “SigE preferred the cyAbrB2-free region in the aerobic condition more than SigA did (Odds ratios of SigE and SigA being in the cyAbrB2-free region were 4.88 and 2.74, respectively).” (ll.193-195). We also added to the Discussion: “The higher exclusion pressure of cyAbrB2 on SigE may contribute to sharpening the transcriptional response of hox and nifJ on entry to microoxic conditions.” (ll.357-359)
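      For readers unfamiliar with the statistic, a minimal Python sketch of an odds ratio from a 2x2 contingency table follows. The table layout (peak present or absent, inside cyAbrB2-free or cyAbrB2-bound regions, counted per hypothetical genomic unit) and all counts are illustrative assumptions, not the authors' data or their exact procedure. An odds ratio above 1 means peaks are enriched in cyAbrB2-free regions, so a larger value for SigE than for SigA indicates a stronger preference of SigE for those regions.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Hypothetical 2x2 table of sigma factor peaks (illustrative layout only).

    Each count is a number of genomic units (e.g. bins or TSSs; an assumption)
    classified by cyAbrB2 occupancy and by presence of a sigma factor peak.
    """
    free_with_peak: int       # a: cyAbrB2-free, peak present
    free_without_peak: int    # b: cyAbrB2-free, peak absent
    bound_with_peak: int      # c: cyAbrB2-bound, peak present
    bound_without_peak: int   # d: cyAbrB2-bound, peak absent


def odds_ratio(t):
    """Odds of a peak in cyAbrB2-free units divided by the odds in bound units.

    Uses the cross-product form (a*d)/(b*c), which equals (a/b)/(c/d).
    """
    denominator = t.free_without_peak * t.bound_with_peak
    if denominator == 0:
        raise ValueError("zero cell in denominator; apply a continuity correction")
    return (t.free_with_peak * t.bound_without_peak) / denominator


# Made-up counts for illustration: (40 * 75) / (60 * 10) = 5.0
example = Table2x2(free_with_peak=40, free_without_peak=60,
                   bound_with_peak=10, bound_without_peak=75)
print(odds_ratio(example))
```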

      (7) The 3C experiments suggest there are indeed changes in chromosome architecture in the hox region as growth conditions change and when different regulators are present. Across the chromosome, analogous changes are expected; however, it may be premature to draw this conclusion based on changes at one locus. Is there a reason that the authors did not take full advantage of their 3C samples and sequence them, to capture the full chromosome interactome at the two time-points? This would allow broader conclusions to be drawn regarding changes in chromosome structure and the impact of cyAbrB2.

      In response to this suggestion, we performed an additional 3C assay on the nifJ region using the remaining 3C samples. Extending the analysis genome-wide (Hi-C) requires enrichment of ligated fragments by biotinylation, a step that was omitted when preparing our 3C samples.

      We rewrote the Results according to the 3C data obtained for the hox and nifJ regions (ll.220-245) and omitted the schematic image of the entire cyanobacterial chromosome (previous Figure 5E).

      Editorial comments: 

      (1) The data presentation in Figure 1 is very effective. 

      (2) Line 87: please rephrase - you can have 'high similarity' or 'high levels of identity', but not high levels of homology - genes/proteins are either homologous or not.

      (3) Line 118: classified into four 'groups'? 

      (4) Line 590: remove 'the'. 

      (5) Figure 2S, panel B: please define acronyms in the legend (GT, IP) and write out 'FLAG' in full for AbrB1.

      (2) to (5) have been corrected.

      (6) Please provide information on or a reference for the tagging of SigA for use in the ChIP-seq experiments within the Materials and Methods.

      Added (l.365)

      (7) Line 648: space between 'binding' and 'regions'. 

      Corrected.

      (8) Fig 4E: please make the solid lines thicker - they are currently difficult to see.

      We have made Figure 6C (former 4E) larger and the line thicker.

      (9) Line 666: location. 

      (10) Line 673: Individual. 

      (11) Figure S5, panel C graph title: should this be 'Relative'? 

      (12) Figure S7: What is 'GT'? Should this be 'WT'? 

      (9) to (12) have been corrected.

      (13) In addition to the data presented in Figure 3G, it would be nice to have a small table or Venn diagram summarizing the number of cyAbrB2 binding sites that fall into the different categories (full gene/operon; downstream of a gene; within a gene; promoter region). 

      In considering this comment, we realized that the categories we had applied (full gene/operon; downstream of a gene; within a gene; promoter region) were arbitrary. We therefore categorized transcriptional units (TUs) according to the extent of their occupancy by cyAbrB2 (Figures 4B and 4C).

      (14) Line 280-281: suggest replacing 'mediates' with 'influences'. 'Mediates' sounds like a direct interaction (for which the evidence is not currently strong without some additional biochemical data), but 'influences' could better accommodate both direct and indirect possibilities. 

      (15) Line 410: it is not clear what this means. 

      We have omitted “As a result, DNA ~600-fold condensed DNA than 3C samples were ligated.”, as it does not give any information about the experimental procedure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript builds upon the authors' previous work on the cross-talk between transcription initiation and post-transcriptional events in yeast gene expression. These prior studies identified an mRNA 'imprinting' phenomenon linked to genes activated by the Rap1 transcription factor (TF), a surprising role for the Sfp1 TF in promoting RNA polymerase II (RNAPII) backtracking, and a role for the non-essential RNAPII subunits Rpb4/7 in the regulation of mRNA decay and translation. Here the authors aimed to extend these observations to provide a more coherent picture of the role of Sfp1 in transcription initiation and subsequent steps in gene expression. They provide evidence for (1) a physical interaction between Sfp1 and Rpb4, (2) Sfp1 binding and stabilization of mRNAs derived from genes whose promoters are bound by both Rap1 and Sfp1 and (3) an effect of Sfp1 on Rpb4 binding or conformation during transcription elongation. 

      Strengths: 

      This study provides evidence that a TF (yeast Sfp1), in addition to stimulating transcription initiation, can at some target genes interact with their mRNA transcripts and promote their stability. Sfp1 thus has a positive effect on two distinct regulatory steps. Furthermore, evidence is presented indicating that strong Sfp1 mRNA association requires both Rap1 and Sfp1 promoter binding and is increased at a sequence motif near the poly(A) tract of many target mRNAs. Finally, they provide compelling evidence that Sfp1-bound mRNAs have higher levels of RNAPII backtracking and altered Rpb4 association or conformation compared to those not bound by Sfp1. 

      Weaknesses: 

      The Sfp1-Rpb4 association is supported only by a two-hybrid assay that is poorly described and lacks an important control. Furthermore, there is no evidence that this interaction is direct, nor are the interaction domains on either protein identified (or mutated to address function). 

      Indeed, our two-hybrid, immunoprecipitation and imaging results do not allow us to determine conclusively whether the interaction between Rpb4 and Sfp1 is direct or indirect. Although the interaction is significant, we consider the distinction between a direct and an indirect interaction to be of secondary importance in the context of this paper. In the revised text, we state explicitly that our two-hybrid, immunoprecipitation and imaging results do not differentiate between direct and indirect interactions (see page 6, sentences highlighted in blue).

      The contention that Sfp1 nuclear export to the cytoplasm is transcription-dependent is not well supported by the experiments shown, which are not properly described in the text and are not accompanied by any primary data. 

      This section has been rewritten for clarity (see page 7). We note that this assay was originally developed and published by Lee, M. S., M. Henry, and P. A. Silver (G&D, 1996) and has since been used in numerous subsequent studies. Reassuringly, our conclusion is supported by the observation that Sfp1 binds to Pol II transcripts co-transcriptionally, suggesting that Sfp1 is exported in the context of the mRNA.

      The presence of Sfp1 in P-bodies is of unclear relevance and the authors do not ask whether Sfp1-bound mRNAs are also present in these condensates. 

      P-bodies consist of both RNA and proteins (reviewed in doi: 10.1021/acs.biochem.7b01162). The value of this experiment is that it provides further evidence for the co-localization of Sfp1 with mRNAs and Rpb4. This observation may also provide useful insights for future investigations into the role of Sfp1.

      Further analysis of Sfp1-bound mRNAs would be of interest, particularly to address the question of whether those from ribosomal protein genes and other growth-related genes that are known to display Sfp1 binding in their promoters are regulated (either stabilized or destabilized) by Sfp1. 

      Fig. 4A, C and D show that RP mRNAs become destabilized in sfp1Δ cells.

      The authors need to discuss, and ideally address, the apparent paradox that their previous findings showed that Rap1 acts to destabilize its downstream transcripts, i.e. that it has the opposite effect of Sfp1 shown here. 

      We thank Reviewer 1 for this valuable comment. In the revised paper, we discuss our hypothesis that Rap1 is likely responsible for regulating the imprinting of other proteins, such as Rpb4, which in turn lead to mRNA destabilization (see the blue paragraph on page 20).

      Finally, recent studies indicate that the drugs used here to measure mRNA stability induce a strong stress response accompanied by rapid and complex effects on transcription. Their relevance to mRNA stability in unstressed cells is questionable. 

      Half-lives were determined mainly by GRO analysis of optimally proliferating cells. This method does not require any drug or stressful treatment. The results obtained by this method were consistent with those obtained after thiolutin addition. Using both methods, we found that disruption of Sfp1 results in substantial mRNA destabilization. Nevertheless, in our revised manuscript, we also show results obtained by subjecting cells to a temperature shift to 42°C, a natural method to inhibit transcription. This approach to determine half-lives has been reported previously in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). It may rule out drug-specific effects on half-lives. This assay measures half-lives under heat stress; it therefore demonstrates that, at least during heat shock, Sfp1 stabilizes mRNAs. Because these results are similar to those obtained by the GRO method at 30°C, we conclude that Sfp1 stabilizes mRNAs under both optimal and heat-stress conditions.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive, but the methods used to demonstrate the half-life effects and the association of Sfp1 with cytoplasmic transcripts remain to be fully validated, as explained in my comments on the results below: 

      Comments on methodology and results: 

      (1) A two-hybrid-based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. 

      Indeed, our two-hybrid, immunoprecipitation and imaging results do not allow us to determine conclusively whether the interaction between Rpb4 and Sfp1 is direct or indirect. Although the interaction is significant, we consider the distinction between a direct and an indirect interaction to be of secondary importance in the context of this paper. In the revised text, we state explicitly that our two-hybrid, immunoprecipitation and imaging results do not differentiate between direct and indirect interactions (see page 6).

      (2) Inactivation of nup49, a component of the nuclear pore complex, resulted in the redistribution of GFP-Sfp1 into the cytoplasm at the temperature non-permissive for the nup49-313 strain, suggesting that GFP-Sfp1 is a nucleo-cytoplasmic shuttling protein. This observation confirmed the dynamic nature of the nucleo-cytoplasmic distribution of Sfp1. For example, a similar redistribution to the cytoplasm was previously reported following rapamycin treatment and under starvation (Marion et al., PNAS 2004). In conjunction with the observation of an interaction with Rpb4, the authors observed slower nuclear import kinetics for GFP-Sfp1 in the absence of Rpb4 when cells were transferred to a glucose-containing medium after a period of starvation. Since the redistribution of GFP-Sfp1 was abolished in an rpb1-1/nup49-313 double mutant, the authors concluded that Sfp1 localisation to the cytoplasm depends on transcription. The double mutant yeast cells may show a variety of non-specific effects at the restrictive temperature, and whether transcription is required for Sfp1 cytoplasmic localisation remains incompletely demonstrated. 

      We agree with Reviewer 2 that any heat inactivation of a temperature-sensitive (ts) protein can lead to non-specific effects. It is evident that nup49-313 does not prevent Sfp1 export to the cytoplasm. In the case of rpb1-1, these non-specific effects are expected due to transcriptional arrest, which can eventually result in a reduction in protein content. However, this process takes some time, while the impact on export is more rapid. It is worth noting that this assay was developed and previously published by Pam Silver (Henry and Silver G&D 1996) and has been reported in many subsequent papers. Importantly, our conclusion is supported by the observation that Sfp1 binds both nascent RNA (co-transcriptionally) and mature mRNA (cytoplasmic). These observations, along with the reduced mRNA export upon transcription blocking, are consistent with our proposal that Sfp1 is exported in association with mRNA.

      (3) Under starvation conditions, which led to the presence of Sfp1 in the cytoplasm and have previously been correlated with a decrease in the transcription of Sfp1 target genes, the authors observed that a plasmid-based expressed GFP-Sfp1 accumulated in cytoplasmic foci. These foci were also labelled by P-body markers such as Dcp2 and Lsm1. The quality of the microscopic images provided does not allow one to determine whether Rpb4-RFP colocalises with GFP-Sfp1. 

      The figure in the submitted PDF was of low quality. We believe that the high-quality figure provided with the final submission is convincing. 

      (4) To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular, what would be the background of a similar experiment performed without UV cross-linking. In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assessing the specificity of the observed protein-RNA interactions. The NON-CRAC+ selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation.

      We thank Reviewer 2 for raising this issue, which helped us clarify this point in the revised paper.

      First, we emphasized in the Discussion that many CRAC+ genes do not fall into the category of highly transcribed genes. Please see more detailed discussion below.

      Second, we examined various features of the 264 genes classified as CRAC+ to estimate their specificity and biological significance. Our experiments revealed that the CRAC+ genes form a distinct group with many unique features.

      The biological significance of the 264 CRAC+ mRNAs was demonstrated by several experiments, all of which argue against a technical artifact. In fact, all the experiments and analyses that we have performed point to the unique nature of the CRAC+ genes. Some examples are:

      (1) Figs. 2A and 2B show that most CRAC+ mRNA reads map to a specific location, close to the pA sites.

      (2) Fig. 2C shows that most CRAC+ mRNA reads map to a specific RNA motif located near the 3’ ends of the mRNAs.

      (3) Most RiBi CRAC+ promoters contain Rap1 binding sites (p = 1.9 × 10⁻²²), whereas the vast majority of RiBi non-CRAC+ promoters do not (Fig. 3C).

      (4) Fig. 4A shows that RiBi CRAC+ mRNAs become destabilized due to Sfp1 deletion, whereas RiBi non-CRAC+ mRNAs do not. Fig. 4B shows similar results due to Sfp1 depletion.

      (5) Fig. 6B shows that the impact of Sfp1 on backtracking is substantially higher for CRAC+ than for non-CRAC+ genes. This is most clearly visible in RiBi genes.

      (6) Fig. 7A shows that the Sfp1-dependent changes along the transcription units are substantially more pronounced for CRAC+ than for non-CRAC+ genes.

      (7) In Fig. S4B, the chromatin binding profile of Sfp1 is shown to be different for CRAC+ and non-CRAC+ genes.

      Taken together, the many unique features of this group (indeed, every feature that we examined) indicate its specificity and significance, demonstrating that our CRAC results are biologically meaningful.

      Most importantly, these genes are not all highly transcribed. On the contrary, as depicted in Figure 6A (green dots), CRAC+ genes exhibit a diverse range of Rpb3 ChIP and GRO signals. Furthermore, as illustrated in Figure 7A, the Rpb4/Rpb3 profile of CRAC+ genes behaves differently from that of Q1 (the most highly transcribed genes): despite the heterogeneous transcription of CRAC+ genes, their Rpb4/Rpb3 profile decreases more substantially than that of the Q1 genes. Moreover, although all RiBi mRNAs are expressed at similar levels, only a portion of them bind Sfp1.

      Thus, all our results indicate that CRAC+ genes represent a biologically significant group, irrespective of the expression levels of its members. In response to this comment, we have included a new paragraph discussing the validity of our conclusions (see the blue paragraph on page 18).

      (5) To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. However, removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Whether the fraction of co-purified RNA is nuclear and co-transcriptional or not cannot be inferred from these results. 

      The proposed co-transcriptional binding of Sfp1 is based on the findings presented in Figure 5C and Figure S2D, as well as on the observed binding of Sfp1 to intron-containing transcripts, as shown in Figures 2D and 3B. The results of Fig. 3 led us to the assertion that the "RNA-binding capacity of Sfp1 is regulated by Rap1-binding sites located at the promoter." We maintain this conclusion. Indeed, the Rap1 binding site does affect mRNA levels, as highlighted by Reviewer 2. However, "construct E," which possesses a promoter with a Rap1 binding site, yields lower transcript levels than "construct F," which lacks such a site in its promoter. Despite this difference, Sfp1 pulled down the former transcript but not the latter, even though the former gene is expressed at a relatively low level. Thus, the results appear to depend more on the specific capacity of Sfp1 to interact with the transcript than on the transcript's expression level.

      (6) To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance, and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate but also an increased degradation rate. This important observation needs careful validation, as genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). Similarly, the use of thiolutin to block transcription as a method of assessing mRNA half-life has been reported to be problematic, as thiolutin can specifically inhibit the degradation of ribosomal protein mRNA (Pelechano & Perez-Ortin, 2008). Specific repressible reporters, such as those used by Baudrimont et al. (2017), would need to be tested to validate the effect of Sfp1 on the half-life of specific mRNAs. Also, it would be very difficult to infer from the images presented whether the rate of deadenylation is altered by Sfp1.

      Various methods exist for assessing mRNA half-lives (HLs), and each carries its own set of challenges and biases. Consequently, directly comparing the HL of a given mRNA across different methods is problematic. In our opinion, the superiority of any one method over the others remains unclear. However, each method is highly reliable for comparing different strains under identical conditions.

      Estimating HLs by the GRO approach is non-invasive, is applied to optimally proliferating cells, and has been employed in numerous publications. While no method is without limitations, our experience over the years has reassured us that this approach is among the most dependable. Our HL determination using thiolutin to block transcription yielded results consistent with the values obtained by the GRO approach.
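      A minimal sketch of how a drug-free, steady-state half-life estimate of this kind can be computed. It assumes (as is common for GRO-based half-lives, although the exact pipeline is not described here) that each mRNA is at steady state, so that its synthesis rate TR balances its first-order decay, TR = k·m, giving HL = ln 2 / k = ln 2 · m / TR. The function name and all numbers are hypothetical illustrations, not data from the manuscript.

```python
import math


def steady_state_half_life(mrna_amount: float, synthesis_rate: float) -> float:
    """Half-life (minutes) assuming steady state: dm/dt = TR - k*m = 0.

    mrna_amount    -- mRNA molecules per cell (hypothetical units)
    synthesis_rate -- molecules synthesized per cell per minute (TR)
    """
    if mrna_amount <= 0 or synthesis_rate <= 0:
        raise ValueError("amount and synthesis rate must be positive")
    k = synthesis_rate / mrna_amount  # first-order decay constant (1/min)
    return math.log(2) / k


# Hypothetical example: the "mutant" has half the synthesis rate but a
# four-fold lower steady-state amount, so its inferred half-life is halved.
wt = steady_state_half_life(mrna_amount=40.0, synthesis_rate=2.0)
mut = steady_state_half_life(mrna_amount=10.0, synthesis_rate=1.0)
print(f"WT HL = {wt:.1f} min, mutant HL = {mut:.1f} min")
```

      Under this assumption, a drop in mRNA amount that exceeds the drop in synthesis rate is read as a shorter half-life, which is why no transcription block is needed.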

      Nevertheless, in our revised manuscript, we complemented the thiolutin-based HL data with results obtained by subjecting cells to a temperature shift to 42°C, a natural method to block transcription in wild-type (WT) cells. This approach to determine HLs has been reported previously in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). The new results, shown in Fig. S3B, are consistent with our conclusion that Sfp1 stabilizes mRNAs.
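      For transcription-shutoff approaches (thiolutin or a 42°C shift), half-lives are typically estimated by fitting first-order decay to mRNA levels measured at several time points after the block. The sketch below assumes simple exponential decay, m(t) = m0·exp(−k·t), and fits ln m(t) against t by ordinary least squares. This is an illustrative assumption, not necessarily the procedure used in the cited studies; the function name and the time course are invented.

```python
import math


def half_life_from_shutoff(times, levels):
    """Estimate a half-life from mRNA levels measured after a transcription block.

    Assumes first-order decay, m(t) = m0 * exp(-k * t), so ln m(t) is linear
    in t and k is minus the least-squares slope of ln(level) versus time.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(v) for v in levels]  # levels must be positive
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay constant (1/min)
    if k <= 0:
        raise ValueError("no net decay detected")
    return math.log(2) / k


# Invented time course (minutes after the shift), levels normalized to t = 0;
# these values follow an exact 5-minute half-life.
times = [0, 5, 10, 20]
levels = [1.0, 0.5, 0.25, 0.0625]
hl = half_life_from_shutoff(times, levels)
print(f"estimated half-life: {hl:.2f} min")
```

      Comparing the fitted half-life of the same mRNA in WT and mutant strains under one protocol is the kind of within-method comparison argued for above.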

      Using a repressible promoter to determine mRNA HL is, unfortunately, not suitable in this paper because the promoter itself is involved in HL regulation. This observation is supported by Bregman et al. (2011) and depicted in Fig. 3, which illustrates that the promoter is critical for mRNA imprinting, consequently regulating HL.

      (7) The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in Pol II presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. One effect of Sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. The results presented are largely correlative and could arise from the focus on very specific types of mRNAs, such as those of ribosomal protein genes, which are sensitive to stress and are targeted by very active RNA degradation mechanisms activated, for example, under heat stress (Bresson et al., 2020). 

      Figure 7A illustrates a significant reduction in Rpb4/Rpb3 ratios along the transcription unit in WT cells. This reduction is notably more pronounced in CRAC+ genes than in the highly transcribed quartile (Q1), which includes all ribosomal protein (RP) genes, and it is completely absent in sfp1∆ cells. Furthermore, it is important to highlight that the CRAC+ gene group displays a wide range of transcription rates, as measured by either Rpb3 ChIP or GRO (Figure 6A). Given these observations, we do not think that a heightened sensitivity of RP mRNA degradation to stress is responsible for the pronounced difference in the configuration of the Pol II elongation complex detected in CRAC+ genes, mainly because this experiment was performed under standard (non-stress) culture conditions.

      Correlative studies are particularly informative when a gene mutation eliminates a correlation, and this is precisely the type of study depicted in Figure 7B-C. The correlations shown in these panels are dependent on Sfp1. Indeed, RP genes are sensitive to stress. However, we used non-stressed conditions. Furthermore, CRAC+ genes did not display any apparent unusual destabilization but rather exhibited higher (not lower) mRNA stability compared to non-CRAC+ genes (Figure 7C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper combines phenotypic and genomic analyses of the "sheltered load" (i.e. the accumulation of deleterious mutations linked to S-loci that are hidden from selection in the homozygous state) in Arabidopsis. The authors compare results to previous theoretical predictions concerning the extent of the load in dominant vs recessive S-alleles, and further develop exciting theory to reconcile differences between previous theory and observed results.

      Strengths:

      This is a very nice combination of theory and data to address a classical question in the field.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The "genetic load" is a poorly defined concept in general, and its quantification via the number of putatively deleterious mutations is quite difficult. Furthermore counting up the number of derived mutations at fully constrained nucleotides may not be a great estimate of the load, and certainly does not allow for evaluation of recessivity -- a concept critical to ideas concerning the sheltered load. Alternative approaches - including estimating the severity of mutations - could be helpful as well. This imperfection in available approaches to test theory must be acknowledged more strongly by the authors.

      As suggested by the reviewer, we implemented alternative approaches to estimate the severity of deleterious mutations and now report the results of SNPeff and SIFT4G analyses in Table S6. The results we obtained with these other metrics were overall very similar to those based on our previous counting of mutations at 0-fold and 4-fold degenerate sites. More generally, we tried to improve the presentation of our strategy to estimate the genetic load (clarified in lines 262-268, 271, 292-295 and 297). In particular, we made it clear that our population genetic analysis cannot assess the recessivity of the observed mutations (lines 428-434).

      Reviewer #2 (Public Review):

      Summary:

      This study looks into the complex dominance patterns of S-allele incompatibilities in Brassicaceae, through which it attempts to learn more about the sheltering of deleterious load. I found several weak points in the analyses that diminished my excitement about the results. In particular, the way in which deleterious mutations were classified lacked the ability to distinguish the severity of the mutations and thus their expected associated dominance.

      First, we would like to clarify that our goal with this study is NOT to learn something about dominance of the linked deleterious mutations (we cannot). Instead, we compare the accumulation of deleterious mutations linked to dominant vs recessive S-ALLELES, but are agnostic regarding the dominance level of the LINKED mutations themselves. The rationale is that the different intensities of natural selection between dominant vs recessive S-alleles provide a powerful way to examine the process by which deleterious mutations are sheltered in general. We further clarified this aspect on lines 70-73 and 399-401.

      Second, as mentioned above in response to Reviewer 1, we complemented the analysis by predicting the severity of the deleterious mutations by SIFT4G and SNPeff. The results were largely consistent, with the exception that the number of sites included in SIFT4G was low, such that the statistical power was reduced (lines 296-300).

      Furthermore, the simulation approach could have provided this exact sort of insight but was not designed to do so, making this comparison to the empirical data also less than exciting for me.

      As explained above, studying dominance of the linked mutations we observed is an interesting research question (albeit a difficult one), but it was not our goal here. Instead, our study was designed as an empirical test of the predictions presented in Llaurens et al (2009), and we re-analysed some aspects of the model outcome to illustrate our points.

      We now better explain that we based our choice of parameters on the fact that in the theoretical study by Llaurens et al (2009), recessive deleterious mutations are predicted to accumulate in a much more straightforward manner (lines 316-318).

      We now dedicate a paragraph of the discussion to explain how our stochastic simulations could be improved, and acknowledge that a full exploration of the interaction between dominance of the S-alleles and dominance of the linked deleterious mutations would be an interesting follow-up - albeit beyond the scope of our study (line 437-441).

      Major and minor comments:

      I think the introduction (or somewhere before we dive into it in the results) of the dominance hierarchy for the S-alleles needs a more in-depth explanation. Not being familiar with this beforehand really made this paper inaccessible to me until I then went to find out more before continuing. I would expect this paper to be broad enough that self-contained information makes it accessible to all readers. For example, lines 110-115 could be in the Introduction.

      We thank the reviewer for this useful remark. We now give a more comprehensive description of the dominance hierarchy and introduce the classes of dominance in A. lyrata already in the introduction, on lines 64-70.

      Along with my above comment, perhaps it is not my place to comment, but I find the paper not of a broad enough scope to be of interest to a broad readership. This S-allele dominance system is more than simple balancing selection, it is a very complex and specific form of dominance between several haplotypes, and the mechanism of dominance does not seem to be genetic. I am not sure that it thus extrapolates to broad comments on general dominance and balancing selection, e.g. it would not be the same as considering inversions and this form of balancing selection where we also expect recessive deleterious mutations to accumulate.

      We disagree with these interpretations by the reviewer, for two reasons:

      First, the mechanism of dominance is actually entirely genetic. In fact, we uncovered some years ago that it is based on the molecular interaction between small non-coding RNAs from dominant alleles and their target sites on recessive alleles (Durand et al. Science 2014, see lines 68-70). If there is something specific with this system, it is that the dominance phenomenon is better understood at the mechanistic level than in most other cases, but the resulting phenomenon in itself (a dominance hierarchy) is rather common.

      Second, the kind of variation in the intensity of linked selection created by this mechanism is actually a general phenomenon, so our results have broad relevance beyond our particular study system. We modified the introduction to explain this point more clearly, highlighting in particular the fact that the situation we study closely resembles the case of sex chromosomes, where X (or Z) chromosomes are genetically recessive and Y (or W) chromosomes are genetically dominant. We cite this example in lines 83-87 of the introduction, along with several other well-studied examples on lines 480-489 of the discussion.

      It would have been particularly interesting, or a nice addition, to see deleterious mutations classed by something like SNPeff or GERP where you can have different classes of moderate to severe deleterious variants, which we would expect also to be more recessive the more deleterious they are. In line with my next comment on the simulations, I think relative differences between mutations expected to be more or less dominant may be even more insightful into the process of sheltering which may or may not be going on here.

      We agree with the reviewer, and as detailed above we have now integrated such analyses with SNPeff and SIFT4G (Table S6). These new results reinforce our conclusion that while S-allele dominance influences the fixation of deleterious mutations, it has no effect on their total number. See lines 270-272 and 296-300.

      In the simulations, h=0 and s=0.01 (as in Figure 5) for all deleterious mutations seems overly simplistic, and at the convenient end for realistic dominance. I think besides recessive lethals which we expect to be close to h=0 would have a much larger selection coefficient, and other deleterious mutations would only be partially recessive at such an s value. I expect this would change some of the simulation results seen, though to what degree I am not certain. It would be nice to at least check the same exact results for h=0.3 or 0.2 (or additionally also for recessive lethals, e.g. h=0 and s=-0.9). I would also disagree with the statement in line 677, many studies have shown, particularly those on balancing selection, that partially recessive deleterious mutations are not eliminated by natural selection and do play a role in population genetic dynamics. I am also not surprised that extinction was found for higher s values when the mutation rate for such mutations was very high and the distribution of s values was constant. An influx of such highly deleterious mutations is unlikely to ever let a population survive, yet that does NOT mean that in nature, the rare influx of such mutations does lead to them being sheltered. I find overall that the simulation results contribute very little, to none, to this paper, as without something more realistic, like a simultaneous distribution of s and h values, you cannot say which, if any class of these mutations are the ones expected to accumulate because of S-allele dominance.

      We understand that the previous version of our manuscript was confusing between dominance of the S-alleles and dominance of the linked deleterious mutations. We clarified that our study focuses on the effect of the former only (lines 99, 263-264 and 581-583).

      We agree that a complete exploration of the interaction between dominance of the S-alleles and dominance of the linked mutations being sheltered would have been an asset, but as explained above this is not the focus of our study. The previous work by Llaurens et al (2009) has already established that deleterious mutations can fix within S-allele lineages, especially when linked to dominant S-alleles, and when the number of S-alleles is large. Under the conditions they examined, deleterious mutations were much more strongly eliminated if not fully recessive (h=0 vs h=0.2), so for the present study we decided to simulate fully recessive mutations only. We now formally acknowledge the possibility that some complex interaction may take place between dominance of the S-alleles and dominance of the linked deleterious mutations (lines 440-442). However, as explained above we feel that fully exploring this complex interaction would require a detailed investigation, which is clearly beyond the scope of the present study.

      Rather, they only show the disappointing or less exciting result that fully recessive, weakly deleterious mutations (which, as I said above, I do not think even exist in nature) have little or no effect across the classes of S-allele dominance. They provide no insight into whether any type of recessive deleterious mutation can accumulate under the S-allele dominance hierarchy, and that is the interesting question at hand. I would either remove these simulations or redo them with another approach. The authors never mention what simulation approach was used, so I can only assume this is custom, in-house code, yet I do not find that code on the GitHub page. I do not know whether the lack of a distribution of h and s values is a choice or a programming limitation, but I see it as one that should be overcome if these simulations are meant to be meaningful to the results of the study.

      The code we used (in C) was adapted from the previous study by Llaurens et al. (2009), which unfortunately was not deposited in a data repository at the time. With the agreement of the authors of that study, this code is now available on GitHub:

      (https://github.com/leveveaudrey/model_ssi_Llaurens; line 723).

      It is correct that our simulations were not aimed at determining whether “any type of recessive deleterious mutation can accumulate”, but we strongly believe that they help interpret the observations made in the genomic data.

      Recommendations for the authors:

      Notes from the editor:

      I found Table 1 confusing, with column headings of observed proportion but perhaps numbers reflecting counts.

      Thank you for pointing out this confusion. There was indeed an error in the last column, which we have now corrected.

      I found Figure 2 a bit hard to parse, with the vertical lines being unclear and the x-axis ticks of insufficient resolution to evaluate the physical extent of the signals.

      We increased the size of the x-axis labels and made the x-axis ticks more detailed in Figure 2, which is hopefully now clearer. We also increased the size of the vertical lines.

      Finally, I wonder, given the rapid decay of signal in lyrata, whether 25kb is the right choice for evaluating load and whether the pattern may look different on a smaller scale.

      It is true that the signal decays rapidly in A. lyrata, as can be seen in the haplotype structure analysis and in line with our previous analysis of the same populations (Le Veve et al., MBE 2023), in which we explored the effect of the size of the chromosomal region analyzed (lines 266-269). However, for the sake of comparison, we prefer to keep the same window size. The fact that we still see an effect of dominance despite the lower statistical power associated with the more rapid decay (because fewer genes are expected to be affected) actually reinforces our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional suggestions to improve the manuscript.

      (1) How does the load linked to the S-locus compare to that observed in other genomic regions? It would be useful to provide a comparison of the results quantified in Figures three and four to comparable genomic regions unlinked to the S-locus. How severe is the linked load?

      This comparison to the genomic background was actually the core of our previous study (Le Veve et al., MBE 2023), which was based on the same populations. That analysis revealed that polymorphism at 0-fold degenerate sites was more than twice as high in the 25kb immediately flanking the S-locus as in a series of 100 unlinked control regions. The main focus of the present study is instead on the effect of linkage to particular S-alleles (which was not possible previously because haplotypes had to be phased).

      (2) Details of the GLM for data underlying Figures 3 and 4 are somewhat unclear. Is the key explanatory variable (Dominance) treated as continuous? Categorical? Ordinal etc…

      Dominance is considered as a continuous variable. We specify this in line 162 of the results, in the legends of Figures 3 and 4, in the Material and Method (lines 627 and 660) and in the legend of Table S4.
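      To make the distinction raised by the reviewer concrete, the sketch below contrasts a continuous coding of dominance (a single slope parameter) with a categorical coding (one indicator column per non-reference level). The dominance ranks are hypothetical, and the sketch only builds design matrices; it does not reproduce the authors' GLM or its error family.

```python
import numpy as np

# Hypothetical dominance ranks for six S-allele copies (1 = most recessive class).
dominance = np.array([1, 1, 2, 3, 4, 4])

# Continuous coding: intercept + one column, so a single slope is estimated.
X_continuous = np.column_stack([np.ones(len(dominance)), dominance])

# Categorical coding: intercept + one indicator per level, with level 1 as reference.
levels = np.unique(dominance)
X_categorical = np.column_stack(
    [np.ones(len(dominance))] + [(dominance == lv).astype(float) for lv in levels[1:]]
)

print(X_continuous.shape, X_categorical.shape)
```

      Treating dominance as continuous costs one parameter regardless of the number of classes and assumes a monotone (here linear) trend across them; the categorical coding would estimate a separate effect for each class.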

      (3) I had some trouble understanding the two different p-values in columns five and six of table one. Please provide more detail.

      We understand that the two p-values in Table 1 were confusing. The first was related to the binomial test and the second to the permutation test. To be consistent with the rest of the manuscript, we conserved only the p-value of the permutation test.
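      For readers unfamiliar with permutation tests, a generic two-sample version is sketched below: labels are shuffled many times and the p-value is the fraction of shuffles whose statistic is at least as extreme as the observed one. The test statistic (absolute difference in means) and the function name are assumptions for illustration, not necessarily the statistic used for Table 1.

```python
import numpy as np


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means between x and y."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffled copy: random reassignment of labels
        diff = abs(perm[: len(x)].mean() - perm[len(x):].mean())
        count += int(diff >= observed)
    # The +1 terms count the observed labelling itself, so p is never exactly zero.
    return (count + 1) / (n_perm + 1)
```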

      (4) As mentioned in the "weaknesses" above, the authors should be more clear about what they are quantifying. They are explicitly counting the number of variants at 0-fold degenerate sites as a proxy for the genetic load. How good this proxy is is unclear. The most egregious misstatement here was on line 314 in which they make reference to the "total load." However, this limitation should be acknowledged throughout the manuscript and deserves more attention in the methods and discussion.

      As mentioned above, we now integrate additional methods to define and quantify the load (SIFT4G and SnpEff), which reinforced our previous conclusions (lines 271-272, 297-302).

      We clarified our wording and replaced the mention of “total load” with “mean number of linked deleterious mutations per copy of S-allele” (lines 324-325). In the discussion we tried to better explain the limitations of approaches to estimate the genetic load (lines 431-437).

      Reviewer #2 (Recommendations For The Authors):

      Line 60, it should be specified that this is only for recessive deleterious mutations.

      Non-recessive deleterious mutations would certainly not be expected to accumulate.

      As explained in detail above, the question of whether and how non-recessive deleterious mutations can accumulate when linked to the S-locus is difficult and would in itself deserve a full treatment, which is clearly beyond the scope of the present study. We clarified this point on line 56.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular for analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use the eigendecomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such purposes. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by the monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research. We have added a discussion of these issues in Section 6 of the updated manuscript.
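      As an illustration of the second direction, the sketch below computes the successor representation M = (I - γT)^(-1) of a random walk on a ring, one of the nonlinear structures mentioned above, and its eigenvectors, which for this symmetric walk are periodic (Fourier-like) functions of position. The ring size, γ, and function name are hypothetical choices for illustration; this is not part of the model in the manuscript.

```python
import numpy as np


def ring_successor_representation(n_states=12, gamma=0.9):
    """Successor representation of an unbiased random walk on a ring of states."""
    T = np.zeros((n_states, n_states))
    for i in range(n_states):
        T[i, (i - 1) % n_states] = 0.5  # step left
        T[i, (i + 1) % n_states] = 0.5  # step right
    # M = sum_t gamma^t T^t = (I - gamma T)^(-1)
    M = np.linalg.inv(np.eye(n_states) - gamma * T)
    return T, M


T, M = ring_successor_representation()
# T is symmetric here, so M is too and eigh applies; eigenvalues come back in
# ascending order. The top eigenvector is constant, and the following ones are
# periodic functions of position on the ring.
eigvals, eigvecs = np.linalg.eigh(M)
top_basis = eigvecs[:, ::-1][:, :5]
```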

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition that supports flexible and efficient adaptation to novel inputs, and it remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of simple vector differences in a Euclidean space (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
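      The parallelogram constraint A - B = C - D can be illustrated by solving for D = C - A + B and choosing the answer candidate closest to it. This is a minimal sketch; the nearest-candidate Euclidean scoring and the example points are assumptions, whereas the models in the manuscript learn to score candidates.

```python
import numpy as np


def complete_analogy(A, B, C, candidates):
    """Pick the candidate D that best satisfies A - B = C - D (Euclidean distance)."""
    A, B, C = (np.asarray(v, float) for v in (A, B, C))
    candidates = np.asarray(candidates, float)
    target = C - A + B  # solve A - B = C - D for D
    dists = np.linalg.norm(candidates - target, axis=1)
    return int(np.argmin(dists)), dists


# Hypothetical example: C - A + B = (13, 11), which is the first candidate.
idx, dists = complete_analogy([0, 0], [3, 1], [10, 10], [[13, 11], [7, 9], [13, 12]])
```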

      Second, we are not aware of any previous work, other than Frankland et al. [6] (cited in the first paragraph of Section 2.2.1), that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, a useful line of future work would clearly be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, ω is not meant to correspond to synaptic weights, and we thank the reviewer for noting the possibility of confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and all other entries zero. In Equation 6, diag(ω) is matrix-multiplied with the covariance matrix V, which multiplies each column vector of V elementwise by ω; ω therefore acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and its uses (as in Algorithm 1) to depict the use of sigmoid nonlinearity (σ) to , so that the resulting values are always between 0 and 1.
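      The gating described above can be checked numerically: left-multiplying a covariance matrix V by diag(g) is the same as scaling each column of V elementwise by g, and passing raw parameters through a sigmoid keeps every gate in (0, 1). The embeddings, sizes, and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))       # hypothetical embeddings: 50 samples x 4 cells
V = np.cov(X, rowvar=False)        # 4 x 4 covariance matrix
g_raw = rng.normal(size=4)         # unconstrained gate parameters
g = 1.0 / (1.0 + np.exp(-g_raw))   # sigmoid keeps each gate strictly in (0, 1)

gated = np.diag(g) @ V
# Equivalent without building the diagonal matrix: every column of V is
# multiplied elementwise by the gate vector g (row i is scaled by g[i]).
assert np.allclose(gated, g[:, None] * V)
```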

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).
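      The per-frequency procedure can be sketched as gradient ascent on the gates of each frequency, followed by choosing the frequency with the largest objective. This sketch assumes that Equation 6 has the form log det(diag(σ(g))(V - I) + I), the continuous extension of the log determinant from Gillenwater et al. (2012); the step size, number of steps, initialization, and function names are hypothetical and not taken from the manuscript.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(g, V):
    """Assumed form of Equation 6: log det(diag(sigmoid(g)) (V - I) + I)."""
    n = V.shape[0]
    A = np.diag(sigmoid(g)) @ (V - np.eye(n)) + np.eye(n)
    _, logdet = np.linalg.slogdet(A)  # det(A) > 0 for PSD V and gates in (0, 1)
    return logdet


def grad(g, V):
    n = V.shape[0]
    x = sigmoid(g)
    A = np.diag(x) @ (V - np.eye(n)) + np.eye(n)
    # d/dx_i log det A = ((V - I) A^{-1})_{ii}; chain rule through the sigmoid.
    dx = np.diag((V - np.eye(n)) @ np.linalg.inv(A))
    return dx * x * (1.0 - x)


def optimize_gates(V, steps=300, lr=0.5):
    g = np.zeros(V.shape[0])  # all gates start at 0.5
    for _ in range(steps):
        g = g + lr * grad(g, V)
    return g


def select_frequency(covariances):
    """Optimize gates separately per frequency; return the argmax and all scores."""
    scores = [objective(optimize_gates(V), V) for V in covariances]
    return int(np.argmax(scores)), scores


# Hypothetical covariances for three frequencies: low variance, highly
# correlated, and high variance with no correlation.
covs = [0.5 * np.eye(4), 0.1 * np.eye(4) + 0.9 * np.ones((4, 4)), 2.0 * np.eye(4)]
best, scores = select_frequency(covs)
```

      In this toy setting the high-variance, uncorrelated frequency (index 2) is selected, consistent with the interpretation given in the following paragraphs.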

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). Those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among them were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the embeddings are high (they are “expressive”) and the correlations among them are low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings covered the representational space of the training data more efficiently, allowing them to capture the same relational structure across training and test distributions, which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. To illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows the gate-weighted sum of the grid cell embeddings over a 2D space of 1000x1000 locations for 3 representative frequencies (the left, middle and right panels show the lowest, middle and highest of the 9 grid cell frequencies used in the model), with the gates obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is largest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency cover the representational space more efficiently, which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.
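      The trade-off between variance and correlation is easiest to see in the two-dimensional case, where the determinant of the covariance matrix is σ1²σ2²(1 - ρ²): it grows with the variances and shrinks as the correlation ρ approaches ±1. The sketch below simply evaluates this for a few hypothetical values.

```python
import numpy as np


def cov2(sd1, sd2, rho):
    """2 x 2 covariance matrix with standard deviations sd1, sd2 and correlation rho."""
    return np.array([[sd1 ** 2, rho * sd1 * sd2],
                     [rho * sd1 * sd2, sd2 ** 2]])


# det = sd1^2 * sd2^2 * (1 - rho^2)
for sd, rho in [(1.0, 0.9), (1.0, 0.1), (2.0, 0.1)]:
    print(sd, rho, np.linalg.det(cov2(sd, sd, rho)))
```

      The three cases give determinants of about 0.19, 0.99 and 15.84: lowering the correlation and raising the variance both enlarge the determinant.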

      Author response image 2.

      Each panel shows the gate-weighted sum of the grid cell embeddings over the 2D space of 1000x1000 locations for a particular frequency, with gates obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2D space.

      Finally, we would like to clarify how the DPP-A attentional mechanism differs from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Using the standard self-attention mechanism of transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for DPP-A represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix, computed over the training space, is greatest. The transformer inference module then attends over the inputs using the grid cell embeddings selected by the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.
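      The two stages can be sketched as follows: a hard DPP-A-style selection keeps all phases of one frequency and zeroes the rest, and single-head scaled dot-product self-attention then operates over the four analogy terms. The frequency-major layout of the embedding vector, the sizes, the random projection matrices, and the selected index are all hypothetical; this is not the manuscript's transformer module.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return weights @ V, weights


n_freq, n_phase, d_model = 3, 4, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(4, n_freq * n_phase))  # hypothetical embeddings of A, B, C, D

# Hard selection: keep every phase of one frequency, zero all other frequencies.
selected = 2  # e.g. the frequency with the largest objective
mask = np.zeros(n_freq * n_phase)
mask[selected * n_phase:(selected + 1) * n_phase] = 1.0
X_sel = X * mask

Wq, Wk, Wv = (rng.normal(size=(n_freq * n_phase, d_model)) for _ in range(3))
out, attn = self_attention(X_sel, Wq, Wk, Wv)
```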

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his temporal context model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (the Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure for training neural networks, similar to batch normalization [1], in which tensor normalization is applied over the temporal dimension instead of the batch dimension, and which has been shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms, and for our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.

      (1) Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.
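      The difference between the two normalizations comes down to the axis over which statistics are computed. Based only on the description above, a minimal sketch normalizes each feature over the time axis of a (batch, time, features) array, whereas batch normalization would use the batch axis. The optional gain and bias arguments are assumptions modelled on batch normalization, not a transcription of Webb et al.'s implementation.

```python
import numpy as np


def temporal_context_norm(x, gain=None, bias=None, eps=1e-8):
    """Z-score each feature over the temporal axis of a (batch, time, features) array.

    Batch normalization would instead compute these statistics over axis 0 (batch).
    """
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    z = (x - mean) / np.sqrt(var + eps)
    if gain is not None:
        z = z * gain   # optional per-feature scale (assumed learnable)
    if bias is not None:
        z = z + bias   # optional per-feature shift (assumed learnable)
    return z
```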

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references listed below on page 3, after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015; they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.
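      The "repulsion" in a DPP can be seen directly in a small L-ensemble, where the probability of a subset S is det(L_S) / det(L + I). The brute-force sketch below enumerates all subsets of three hypothetical items; the two similar items are much less likely to be selected together than two dissimilar ones. The kernel values and function name are illustrative assumptions.

```python
import itertools

import numpy as np


def dpp_probabilities(L):
    """Exact L-ensemble DPP: P(Y = S) = det(L_S) / det(L + I) for every subset S."""
    n = L.shape[0]
    norm = np.linalg.det(L + np.eye(n))
    probs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            # The determinant of the empty principal minor is 1 by convention.
            det_S = np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
            probs[S] = det_S / norm
    return probs


# Items 0 and 1 are highly similar; item 2 is unrelated to both.
L = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = dpp_probabilities(L)
```

      Here P({0, 1}) is proportional to 1 - 0.81 = 0.19, whereas P({0, 2}) is proportional to 1.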

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not specific about the implementation of the R module, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. In particular, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter occurs only over the training space and is not further modified during testing (i.e. over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of space over which the test analogies are defined when those are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings must be identified that captures the same relational structure between training and test analogies in order to demonstrate strong OOD generalization, and that is achieved by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and is therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, the product of K and M, which is added to both dimensions of A, B, C and D (points in ℤ²) to translate them to a different region of the space. K is an integer ranging from 1 to 9, and M is an integer denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
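      The translation described above can be sketched as follows, assuming M = 100 and an integer K from 1 to 9 as stated; the helper names and example points are hypothetical. Because the same offset KM is added to every term, the difference vectors (and hence the relation A − B = C − D) are unchanged in the test region; only the absolute positions differ.

```python
from typing import List, Tuple

Point = Tuple[int, int]

M = 100  # size of the training region along each dimension (value from the response)


def translate(p: Point, k: int, m: int = M) -> Point:
    """Shift a point by the integer k * m along both dimensions."""
    offset = k * m
    return (p[0] + offset, p[1] + offset)


def translate_analogy(points: List[Point], k: int) -> List[Point]:
    """Translate all analogy terms (A, B, C, D) by the same offset KM."""
    if not 1 <= k <= 9:
        raise ValueError("K ranges from 1 to 9 in the described setup")
    return [translate(p, k) for p in points]


# Hypothetical training analogy satisfying A - B = C - D.
A, B, C, D = (10, 20), (15, 27), (40, 5), (45, 12)
A2, B2, C2, D2 = translate_analogy([A, B, C, D], k=3)
print(A2)  # (310, 320)
# The difference vectors are unchanged, so the analogy still holds.
print((A2[0] - B2[0], A2[1] - B2[1]) == (C2[0] - D2[0], C2[1] - D2[1]))  # True
```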

      (11) Page 5 - "two continuous dimensions (Constantinescu et al._)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added this sentence after Np=100. “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning Theorem 2.1. The sentence just before it reads “We use the following theorem from Gillenwater et al. (2012) to construct :”
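      Theorem 2.1 itself is not reproduced in this response. Assuming it is the identity from Gillenwater et al. (2012) that relates a closed-form determinant to an expectation over subsets, det(diag(x)(L − I) + I) = Σ_Y ∏_{i∈Y} x_i ∏_{i∉Y} (1 − x_i) det(L_Y), the pure-Python sketch below checks it numerically on a small hypothetical kernel L with hypothetical inclusion probabilities x. It is an illustration only, not the authors' implementation; taking the log of both sides gives the kind of objective the response describes for Equation 6.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def det(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (1.0 for a 0x0 matrix)."""
    n = len(a)
    m = [row[:] for row in a]
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def principal_minor(l: Matrix, idx: Sequence[int]) -> float:
    """det(L_Y): determinant of the submatrix of L indexed by Y."""
    return det([[l[i][j] for j in idx] for i in idx])


def closed_form(l: Matrix, x: Sequence[float]) -> float:
    """det(diag(x)(L - I) + I)."""
    n = len(l)
    eye = lambda i, j: 1.0 if i == j else 0.0
    return det([[x[i] * (l[i][j] - eye(i, j)) + eye(i, j) for j in range(n)] for i in range(n)])


def subset_sum(l: Matrix, x: Sequence[float]) -> float:
    """Sum over subsets Y of P(Y) * det(L_Y), items included independently with prob. x_i."""
    n = len(l)
    total = 0.0
    for size in range(n + 1):
        for y in combinations(range(n), size):
            p = 1.0
            for i in range(n):
                p *= x[i] if i in y else 1.0 - x[i]
            total += p * principal_minor(l, y)
    return total


L = [[2.0, 0.5, 0.1], [0.5, 1.5, 0.3], [0.1, 0.3, 1.0]]  # hypothetical positive-definite kernel
x = [0.2, 0.7, 0.5]  # hypothetical inclusion probabilities
print(abs(closed_form(L, x) - subset_sum(L, x)) < 1e-9)  # True
```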

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”
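      A minimal sketch of the within-frequency quantity described above, under stated assumptions: each grid code is split into Nf blocks of Np cells (the response uses Np = 100 and Np*Nf = 900, i.e. Nf = 9; the toy below uses smaller sizes), the covariance of each block is computed over training points, and each frequency is scored by the log-determinant. With all inclusion weights set to 1, det(diag(x)(V − I) + I) reduces to det(V), so this is the ungated special case. The random embeddings, the ridge term `eps`, and picking the highest-scoring frequency are our simplifications, not the authors' procedure.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance (n - 1 denominator) of row vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def logdet(a: Matrix) -> float:
    """log|det(a)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            return float("-inf")
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return total


def frequency_scores(codes: Matrix, n_freq: int, n_per_freq: int, eps: float = 1e-6) -> List[float]:
    """Log-det of the within-frequency covariance for each frequency block."""
    scores = []
    for f in range(n_freq):
        block = [c[f * n_per_freq:(f + 1) * n_per_freq] for c in codes]
        cov = covariance(block)
        for i in range(n_per_freq):
            cov[i][i] += eps  # small ridge keeps the matrix positive definite
        scores.append(logdet(cov))
    return scores


NP, NF = 4, 3  # toy sizes (the response uses NP = 100, NF = 9)
rng = random.Random(0)
# Hypothetical embeddings: block f has spread proportional to (f + 1).
codes = [[rng.gauss(0.0, f + 1.0) for f in range(NF) for _ in range(NP)] for _ in range(200)]
scores = frequency_scores(codes, NF, NP)
best = max(range(NF), key=scores.__getitem__)
print(best)
```

      Larger log-determinants correspond to more spread-out (more diverse) embeddings within a frequency, which is the intuition behind preferring that frequency.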

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.
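      To illustrate the distinction, assuming "smoothed" means a Gaussian bump centred on the encoded position (the width `sigma` and the 1D layout are hypothetical; the manuscript's exact smoothing may differ): a one-hot code activates a single unit, whereas a smoothed one-hot also activates neighbouring units with decreasing strength, much like a population of place cells with overlapping fields.

```python
import math
from typing import List


def one_hot(pos: int, size: int) -> List[float]:
    """Exactly one active unit at the encoded position."""
    return [1.0 if i == pos else 0.0 for i in range(size)]


def smoothed_one_hot(pos: int, size: int, sigma: float = 1.0) -> List[float]:
    """Gaussian bump with peak 1.0 at pos, analogous to a place-field tuning curve."""
    return [math.exp(-((i - pos) ** 2) / (2 * sigma ** 2)) for i in range(size)]


print(one_hot(3, 7))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print([round(v, 3) for v in smoothed_one_hot(3, 7)])  # [0.011, 0.135, 0.607, 1.0, 0.607, 0.135, 0.011]
```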

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xᵀw inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. There is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions and the authors have acknowledged the use of only males as a minor limitation of the study that should be addressed in the future. Future studies should also test whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a break down in the blood brain barrier following activation of brain areas through a motor task. 

      Strengths: 

      This study is novel and the results are potentially important as they demonstrate an unexpected break down of the blood brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity. 

      The strengths of the study are: 

      (1) The use of an in vivo model with multiple methods to investigate the blood brain barrier response to a forelimb stimulation. 

      (2) The determination of a potential functional role for the observed leakage of the blood brain barrier from both a genetic and an electrophysiological viewpoint.

      (3) The demonstration that inhibiting different points in the putative pathway, from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, prevented the effect on synaptic enhancement.

      (4) Preliminary experiments demonstrating a similar observation of activity-dependent breakdown of the blood brain barrier in humans.

      Weaknesses: 

      The authors adequately addressed most of my points. A few remain: 

      (1) Although the authors have addressed the possible effects of anaesthesia on neurovascular coupling, they have not mentioned or addressed the possible effects of ketamine (an NMDA receptor antagonist) on synaptic plasticity. Indeed, the low percentage of SEP increase following potentiation (10–20%) could perhaps be explained by partial block of NMDA receptors by ketamine.

      We agree and apologize for this oversight. This important issue is now addressed in the Discussion.

      “Notably, the antagonistic effect of ketamine on NMDA receptors might attenuate the magnitude of SEP potentiation recorded in our experiments (Anis et al., 1983; Salt et al., 1988).”

      (2) The experimental paradigms remain unclear to me. Now, it appears that drugs are applied for 50 minutes and that the stimulation occurs during the "washout period". The more conventional approach would be to have the drug application during the stimulation period to determine if the drugs occlude or enhance the effects of stimulation and then washout the drugs. The problem is that drugs variably washout at different rates depending upon their lipid solubility.

      We agree that the more conventional approach would have been to continue applying the drug throughout the experiment and that differential rates of washout may add variability to our experiments. However, despite this limitation, within each treatment group we found that the SEP response at 50 minutes (immediately after the drug application window) does not differ from SEP response at 80 minutes (after 30 minutes of stimulation and washout) [Figure 3H&G]. This suggests that the drug effects were still present despite terminating drug application and performing potentiation-inducing stimulation. Moreover, our analysis showed that animals within each treatment group (except AP5) had similar SEP responses with little intra-group variability.

      (3) It is still not clear to what extent the experimenters and those doing the analysis were blinded to group. If one or both were blind to group, then please put this in the methods.

      Thank you for this comment. We revised the Methods section to clearly confirm that data was collected and analyzed blindly.  

      Reviewer #3 (Public Review): 

      Summary: 

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood brain barrier (BBB) was opened by the prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggest that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB. 

      Strengths: 

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data. 

      Weaknesses: 

      The conclusions are not compelling however because of a lack of explanation of methods.

      In the revised paper, we added a section titled ‘study design’ that presents an overview of the experimental approach.

      The explanation of why prolonged stimulation in the rat was considered relevant to normal conditions should be as clear in the paper as it is in the rebuttal.

      We added a new paragraph to the Discussion section explaining this point as we did in the rebuttal:  

      “Our animal experiments show that a 30 min limb stimulation (at 6Hz and 2mA) increases cross-BBB influx, while a 1 min stimulation (of similar frequency and magnitude) does not. We believe that both types of stimulations fall within the physiological range because our continuous electrophysiological recordings showed no signs of epileptiform or otherwise pathological activity. Moreover, the recorded SEP levels were similar to those reported in previous physiological LTP studies in rats (Eckert & Abraham, 2010; Han et al., 2015; Mégevand et al., 2009) and humans (McGregor et al., 2016). In humans, skill acquisition often involves motor training sessions that last ≥30 minutes (Bengtsson et al., 2005; Classen et al., 1998) and result in physiological plasticity of sensory and motor systems (Classen et al., 1998; Draganski et al., 2004; Sagi et al., 2012). Hence, the experimental task in our human study (30 minutes of repetitive squeezing of an elastic stress-ball) is likely to represent physiological activity, with neuronal activation in primarily motor and sensory areas (Halder et al., 2005). Future human and animal studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that differentiate between physiological and pathological BBB modulation. “

      The authors need to ensure other aspects of the rebuttal are as clear in the paper as in the rebuttal too. 

      Thank you for this comment. This was addressed in the revised paper.

      The only remaining concern that is significant is that it is hard to understand the figures. 

      Thank you for this comment. We revised the figures according to the reviewer’s recommendations. We hope that these changes increase the legibility of the figures. 

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript is improved but there are still suggestions that do not appear to have been addressed. More experiments are not involved in addressing these concerns but one wants the paper to be clarified in terms of what was done. 

      Figures. Please use arrows to point to the effect that the reader should see. Please note what the main point is. 

      Major concerns: 

      Please add explanations, exact p values, and other revisions in the rebuttal to the paper. 

      Rebuttal explanations were added to the paper and p values appear in figure legends.

      Fig 1d shows a seizure-like event, which the authors do not consider a seizure because it lacks a depolarization shift. This explanation is not convincing, because an LFP would not necessarily show a depolarization shift. Another argument, or a discussion of the event as a possible seizure, is warranted. Note that expanding the trace might also show that it is unlike a seizure. Regarding the idea that 6 Hz, 2 mA stimuli for 30 min are physiological, the authors make three arguments, which are not clear. First, no epileptiform activity was found, but in Fig. 1 it looks like a seizure occurred. Second, memory and skill acquisition in humans often involve a similar training duration, but what about 6 Hz and 2 mA?

      Rats are known to rhythmically move their whiskers at frequencies ranging between 5 and 15 Hz (Mégevand et al., 2009). We agree that there is no clear way to justify the similarity between the experimental design in humans and rats. However, we believe that both paradigms (paw stimulation in rats and ball squeeze in humans) represent non-pathological input that we found to modulate barrier permeability. This argument was added to the discussion of the paper:

      “We believe that both types of stimulations fall within the physiological range because in rats, activity between 5 and 15 Hz represents physiological rhythmic whisker movement during environment exploration (Mégevand et al., 2009).” 

      Seizures are typically induced in rats via direct tetanic stimulation of the brain (at 50 Hz and 0.3-2.5mA) or maximal electroshock test to the cornea (at 50 Hz and 150 mA) (Swinyard et al., 1952). We, therefore, assert that the activity we observe represents physiological responses and not seizures. This argument is beyond the scope of the current paper. 

      Please note a limitation is that the high level of serum albumin is unlikely to be physiological but may not have been as high in the animal because of the low diffusion rate and degradation (please add the refs in the rebuttal). 

      Thank you, we added the following to the Results section: 

      “The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration such as its low diffusion rate and its likelihood to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).”

      Fig. 1. 

      Please consider a box in b to show where the expanded traces in the lower row came from. 

      Thank you for the suggestion. We added lines indicating where the trace excerpts were taken from.

      c. Please use arrows to point to the parts that the authors want the reader to note. In the legend, explain what t is, and delta HbT.

      Thank you. We implemented this suggestion.

      d. It is not clear what the double-sided arrows are meant to show compared to the arrow without two sides. 

      We replaced the two-headed arrow with two single ones.

      e. Please explain what the upward lines at the top signify. What does the red asterisk mean? 

      Thank you. We implemented this suggestion.

      f. Is the reader supposed to note the yellow area? Please make it with an arrow or circle if so. 

      Thank you, we added a white circle to mark the area of tracer accumulation.

      g. Please explain what the permeability index is or reference the part of the paper that does. 

      Further to this suggestion, we added a reference to the appropriate Methods section to the legend.

      h. Please use arrows to point to the area of interest. 

      Thank you. We implemented this suggestion.

      m-n. Please mark areas of interest with arrows.  m. the top right two images are unclear. I suggest making them say ipsi inset and contra inset instead of using asterisks. 

      Thank you. We added the ipsi and contra labels to panels in m. The images in panel n represent a phenomenon with no particular region of interest, but rather peri-vascular tracer accumulation along the entire depicted blood vessel. We clarified that panel n represents a separate experiment than panel m: “n. In an animal injected with both EB and NaFlu post stimulation, fluorescence imaging shows extravascular accumulation of both tracers along a cortical small vessel in the stimulated hemisphere.”

      Figure 2. 

      (2) a. Middle. What are the vertical lines at the top? The rebuttal states that was explained in the revised legends but I don't see it. 

      Our apologies. We now included an explanation that “an excerpt of the stimulation trace is shown above the middle LFP trace”.

      c and d are very different field potentials in shape and therefore hard to compare. The rebuttal addresses this but the explanation is not in the revised text. 

      We agree that there is variability in SEP responses between animals. We now added a statement acknowledging this in the methods section: “To overcome potential variability in SEP morphology between animals (Mégevand et al., 2009), each animal’s plasticity measures (max amplitude and AUC of post stimulation SEP) were compared to the same measures at baseline.” 
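      A minimal sketch of the per-animal normalization described in the quoted Methods text, in which each post-stimulation measure is expressed relative to the same animal's baseline. The exact AUC definition (integration window, signed vs absolute voltage) is not given here, so the trapezoidal rule over |V|, the sampling interval `dt`, and the toy traces are assumptions for illustration only.

```python
from typing import Dict, List


def max_amplitude(trace: List[float]) -> float:
    """Largest absolute deflection of the evoked potential."""
    return max(abs(v) for v in trace)


def auc(trace: List[float], dt: float) -> float:
    """Area under |V(t)| by the trapezoidal rule (assumed definition)."""
    return sum((abs(a) + abs(b)) * dt / 2 for a, b in zip(trace, trace[1:]))


def percent_of_baseline(baseline: List[float], post: List[float], dt: float) -> Dict[str, float]:
    """Express post-stimulation SEP measures as a percentage of the same animal's baseline."""
    return {
        "max_amplitude_pct": 100.0 * max_amplitude(post) / max_amplitude(baseline),
        "auc_pct": 100.0 * auc(post, dt) / auc(baseline, dt),
    }


# Hypothetical traces (arbitrary units) sampled at a fixed interval.
baseline = [0.0, -0.5, -1.0, -0.5, 0.0]
post = [0.0, -0.6, -1.2, -0.6, 0.0]
print(percent_of_baseline(baseline, post, dt=1.0))
```

      Normalizing within each animal means that differences in absolute SEP shape or offset between animals do not enter the comparison directly.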

      In d, it is not clear there is potentiation because the traces are not aligned. 

      All panels depicting SEP traces represent raw data with no alignment. The shift observed in panel d exemplifies why we compare post-stimulation parameters of max amplitude and area under curve to baseline in each animal. 

      Exact P values are said to have been added in the rebuttal but they were not. 

      Exact P values appear in Figure legends.

      (3) b. Use arrows to mark the area of interest. 

      Thank you. We added a white circle to mark the area of tracer accumulation similar to Figure 1f.

      d. Why is there an oscillation superimposed on all traces except CNQX? 

      We agree that this is an interesting question. Future studies should determine the source of this SEP pattern.   

      (4) What does the line and the number 2 mean? How were data normalized? What was counted? What area of cortex?

      The number 2 refers to the scale bar line, meaning a log fold change of 2 reflects the size of the scale bar line. 

      The plot shows the log fold change against the mean count of each gene in the contralateral somatosensory cortex between 1 and 24 hours after stimulation.

      The x axis title was changed to “mean expression” and the legend was modified to:

      “Scatter plot of gene expression from RNA-seq in the contralateral somatosensory cortex 24 vs. 1 h after 30 min stimulation. The y axis represents the log fold change, and the x axis represents the mean expression levels (see methods, RNA Sequencing & Bioinformatics). Blue dots indicate statistically significant differentially expressed genes (DEGs) by Wald Test (n=8 rats per group).”
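      To make the plotted axes concrete, the sketch below computes, for each gene, a mean expression (analogous to DESeq2's baseMean, the mean of normalized counts over all samples) and a log2 fold change between the two time points. It assumes counts are already normalized and that "log fold change" is base 2; DESeq2's size-factor estimation, fold-change shrinkage, and Wald test are not reproduced. Gene names and counts are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def ma_point(counts_a: List[float], counts_b: List[float], pseudo: float = 0.5) -> Tuple[float, float]:
    """Return (mean expression over all samples, log2 fold change of group b over group a)."""
    mean_a, mean_b = mean(counts_a), mean(counts_b)
    return mean(counts_a + counts_b), math.log2((mean_b + pseudo) / (mean_a + pseudo))


# Hypothetical normalized counts: (1 h samples, 24 h samples).
genes: Dict[str, Tuple[List[float], List[float]]] = {
    "geneA": ([100, 110, 90, 100], [400, 380, 420, 400]),
    "geneB": ([50, 55, 45, 50], [50, 52, 48, 50]),
}
for name, (h1, h24) in genes.items():
    avg, lfc = ma_point(h1, h24)
    print(name, round(avg, 1), round(lfc, 2))
```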

      How were the pericytes, smooth muscle cells, etc. distinguished? 

      This was explained under Methods->RNA Sequencing & Bioinformatics: “Analysis of cell-specific and vascular zonation genes was performed as described (Vanlandewijck et al., 2018), using the database provided in (http://betsholtzlab.org/VascularSingleCells/database.html).”

      What were the chi square statistics? If there were cells used instead of rats, please justify. 

      Thank you. The legend was expanded to include the following:

      “The contralateral somatosensory cortex was found to have a significantly higher number of DEGs related to synaptic plasticity, than the ipsilateral side (***p<0.001, Chi-square).”     
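      The legend reports only the significance level. As a sketch of how such a comparison can be computed, the code below runs a Pearson chi-square test (no continuity correction) on a 2×2 table of plasticity-related vs other DEGs in the contralateral and ipsilateral cortex. The table layout is our assumption and the counts are hypothetical placeholders, not the study's data.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[float]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts. Rows: contralateral, ipsilateral.
# Columns: plasticity-related DEGs, other DEGs.
stat, p = chi_square_2x2([[60, 140], [10, 90]])
print(round(stat, 2), p < 0.001)  # 14.91 True
```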

      (5) b. what do the icons mean? 

      We agree that the icons were confusing. We simplified this panel to just show when participants were asked to squeeze the ball (black icon). This explanation was added to the Figure legend.

      Abbreviations? 

      Abbreviations of MRI protocols were added to the figure legend for clarity.

      In c-e what are the units of measure? Fold-change? 

      The units represent t-statistics values for each voxel. The label ‘t-statistic’ was added to the figure.  

      What are the white Iines, + and - signs? 

      The white lines point to voxels of highest activation (t-statistic). This was added to the legend.

      These are not +/- signs; they are voxels with significant activation that only appear similar.

      f. Please explain f and g for clarity. 

      Thank you. The explanation was modified for added clarity.

      Supplemental Fig. 4. 

      Original question: If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral? 

      The authors replied: Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-b signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments. 

      Could the authors clarify the answer to the question in the text? 

      Thank you. This section was added to the Discussion. 

      Papers referenced in this letter:

      Anis, N. A., Berry, S. C., Burton, N. R., & Lodge, D. (1983). The dissociative anaesthetics, ketamine and phencyclidine, selectively reduce excitation of central mammalian neurones by N-methyl-aspartate. British Journal of Pharmacology, 79(2), 565–575. https://doi.org/10.1111/j.1476-5381.1983.tb11031.x

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/jn.1998.79.2.1117

      Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., & May, A. (2004). Changes in grey matter induced by training. Nature, 427(6972), 311–312. https://doi.org/10.1038/427311a

      Eckert, M. J., & Abraham, W. C. (2010). Physiological effects of enriched environment exposure and LTP induction in the hippocampus in vivo do not transfer faithfully to in vitro slices. Learning and Memory, 17(10), 480–484. https://doi.org/10.1101/lm.1822610

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Han, Y., Huang, M. De, Sun, M. L., Duan, S., & Yu, Y. Q. (2015). Long-term synaptic plasticity in rat barrel cortex. Cerebral Cortex, 25(9), 2741–2751. https://doi.org/10.1093/cercor/bhu071

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Sagi, Y., Tavor, I., Hofstetter, S., Tzur-Moryosef, S., Blumenfeld-Katzir, T., & Assaf, Y. (2012). Learning in the Fast Lane: New Insights into Neuroplasticity. Neuron, 73(6), 1195–1203. https://doi.org/10.1016/j.neuron.2012.01.025

      Salt, T. E., Wilson, D. G., & Prasad, S. K. (1988). Antagonism of N-methylaspartate and synaptic responses of neurones in the rat ventrobasal thalamus by ketamine and MK-801. British Journal of Pharmacology, 94(2), 443–448. https://doi.org/10.1111/j.1476-5381.1988.tb11546.x

      Swinyard, E. A., Brown, W. C., & Goodman, L. S. (1952). Comparative assays of antiepileptic drugs in mice and rats. The Journal of Pharmacology and Experimental Therapeutics, 106(3), 319–330. http://jpet.aspetjournals.org/content/106/3/319.abstract

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vanlandewijck, M., He, L., Mäe, M. A., Andrae, J., Ando, K., Del Gaudio, F., Nahar, K., Lebouvier, T., Laviña, B., Gouveia, L., Sun, Y., Raschperger, E., Räsänen, M., Zarb, Y., Mochizuki, N., Keller, A., Lendahl, U., & Betsholtz, C. (2018). A molecular atlas of cell types and zonation in the brain vasculature. Nature, 554(7693), 475–480. https://doi.org/10.1038/nature25739

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      We agree that the fluorescence studies presented in the paper are not, by themselves, unequivocal proof of the calcification model used by the studied Miliolida species. However, the fluorescence data combined with SEM studies may hint at ACC-bearing vesicles, especially the overlap of elements that show autofluorescence upon excitation at 405 nm (emission 420–480 nm) with acidic vesicles marked by the pH-sensitive LysoGlow84.

      We will tone down the physiological interpretation based on fluorescence studies in the revised version of the manuscript.

      Nevertheless, we think that our fluorescence live-imaging experiments provide important observations of Miliolida, which are scarce in the existing literature. They are therefore worth presenting, as they may help us better understand the full calcification model in the future.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al., in their paper entitled "Biocalcification in porcelaneous foraminifera", suggest that, in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      We do not argue that Miliolida and Rotaliida utilize exactly the same calcification mechanism, but rather that both groups use less divergent crystallization pathways, in which mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various previously unknown aspects of calcification in Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., the translucency of the freshly formed test); however, there are several problematic observations/interpretations which, in my opinion, should be carefully addressed.

      Weaknesses:

      (1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2, which the authors refer to, shows only broken sections of the organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions do some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing, but the authors were probably referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in the rotaliid Amphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas the current paper gives no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (the numerous mineral grains embedded within the organic matrix in Figures 3A and B should also be clearly observable as autofluorescence of the whole layer). Thus, the crucial argument supporting the calcification model (Figure 5) is missing.

      It is correct that we did not observe the initial phase of test formation in vivo; therefore, it is not our crucial argument supporting the novel components of the new calcification model. We suspect that the vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (including Mg-ACC formation), which precedes exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from SEM imaging of specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      We used the 405 nm laser line and multiphoton excitation to detect ACCs. These wavelengths (partly) penetrate the shell and excite ACC autofluorescence. Shell autofluorescence is present as well, but it is not clearly visible in Movie S4 because the fluorescence of ACCs is stronger. This may be related to the plane/section of the cell that is shown. The laser penetrates only a short distance of shell above the ACCs, but to excite the shell CaCO3 around the foraminifer in the same three-dimensional section where ACCs are shown, the light must pass through a thick CaCO3 layer owing to the three-dimensional structure of the foraminiferal shell, which reduces the laser light intensity. In the revised version, a movie/image with a reduced threshold is shown.

      Author response image 1.

      Autofluorescence image of studied Miliolida species (exc. 405 nm) showing algal chlorophyll (blue) and CaCO3 (red), both ACC and calcite shell.

      It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. 2015 shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not, or only very weakly, excitable at 405 nm, the excitation laser line we used.

      (2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clear-cut evidence of the lack of organic matter (unlikely) or suggest that intense calcium carbonate precipitation within the organic matrix gel ultimately decreases the amount of the organic phase (but does not eliminate it completely), much as pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such a "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface, as observed in Fig. 3E1.

      We agree with the reviewer that biogenic minerals should by definition contain some organic components. We only stated that "no organic matter was detected between the needles of the porcelain structures", meaning that we did not detect any organic structures in our FE-SEM observations. We will rephrase this part of the text to avoid further confusion.

      (3) The authors' observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles; however, they also do not explain what the structure of individual skeletal crystallites is or how they are formed. In particular, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The authors' explanation of the process (lines 213-216) is not particularly convincing: "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist; + 2 other citations) showed that organic polymers may also polarize light. Additional experimental studies are needed to distinguish these types of polarization. We will try to investigate this issue in our future research.

      (4) The following passage (lines 296-304), which deals with the concept of mesocrystals, is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage, the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms"; so are they unique or common?). I suggest completely eliminating this concept here until various crystallographic details of miliolid test formation are well documented.

      Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms; however, miliolid needles forming distinct geometric shapes limited by planar crystalline faces are unique.

      Reviewer #1 (Recommendations For The Authors):

      Below, I have summarized my main criticisms.

      (1) The movies S1-S4 do not indicate what is described. The videos show indeed seawater (S1), cell membranes (S2), and autofluorescence and acidic vesicles (S3 and S4). The presence of all these intracellular structures is not surprising: any eukaryotic cell will have those. The authors, however, claim that they participate in the process of calcification, which is simply not shown. One of the main arguments seems the presence of 'carbonate pools', in the caption these are even claimed to be 'Mg-ACC pools', but this is by no means revealed by an excitation of 405nm/ emission between 420 and 490 nm. It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. 2015 shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not, or only very weakly, excitable at 405 nm, the excitation laser line we used.

      The fluorescence by this excitation/ emission couple unlikely indicates the vesicles in which these foraminifera calcify. Therefore, most of the interpretation of the authors on what happens with the calcitic needles is not based on results but remains pure speculation.

      Autofluorescence upon excitation at 405 nm (emission 420–480 nm) is typical of CaCO3, both biogenic calcite and amorphous calcium carbonate, as shown by laboratory synthesis of amorphous calcium carbonate (Dubicka et al., in preparation).

      (2) The results mention 'granules', which are the supposed Mg-ACC-containing vesicles, but the movies simply don't show any granules. Only fluorescence. Again, the results show a lot of vesicles with autofluorescence, but these are not necessarily related to calcification. Proof could be supplied by showing that the same fluorescent vesicles are 'used up' when the specimens under observation are making a new chamber, but until that is done, the fate of all these vesicles remains uncertain and once more, may not be involved in calcification at all.

      We suspect that the vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (including Mg-ACC formation), which precedes exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from SEM imaging of specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      (3) The Methods are unclear. How long were the foraminifers kept before being placed under the microscope? Were they fed with anything? This is important since the chlorophyll should not be from any food source. I didn't know that this foraminiferal species has photosynthetic symbionts: genera like Quinqueloculina don't. Is there any reference for this? Normally, I wouldn't care that much, but the authors find the presence of (facultative) symbionts important (lines 305-336). I am a bit suspicious about this since the only evidence for the presence of photosynthetic symbionts is because of the autofluorescence. As the authors said, commonly these miliolid species are regarded as symbiont-barren, so additional proof for these symbionts is necessary.

      We agree that additional proof is needed for the presence of photosynthetic symbionts. We rephrased the manuscript accordingly.

      (4) It is also unclear (Methods) at what stage the miliolids were photographed (Figure 3). How did chamber formation proceed, what was the timing of the photographs, etc. These pictures are to me the most interesting finding of this study, but need to be described much better.

      All living foraminifera were fixed during chamber formation. However, each individual preserves a complete set of successive substages of chamber wall calcification, fixed at once. Fig. 3A and B show nearly the most proximal (youngest) part of the new chamber, with a thick wall of calcite nanograins within a gel-like organic matrix. Fig. 3C and D show a slightly more distal (intermediate) part of the calcified chamber. Fig. 3E shows the most distal part of the new chamber, which is anchored to the older, underlying, solid calcified chamber (not shown in this figure). All these substages are synchronous but represent gradual, successive stages of calcification. The main text and Figs 4 and 5 explain this phenomenon in detail.

      There are many small issues with the text too. These include:

      Line 28/29: in many other groups, calcification is thought to be polyphyletic (e.g. sponges: Chombard et al., 1997. Biol Bull 193: 359-367).

      Corrected

      Line 29/30: there may be even more 'types of shells'. The first author has shown in earlier papers that nodosarids have a unique shell architecture. Spirillinids also seem to have their own way of calcification. It is unclear what is meant here by 'two contrasting models'.

      To date, only two models of foraminiferal calcification are known. Lagenida biocalcification has not been studied.

      Line 33: 'Both groups'? This paper only shows calcification in miliolids.

      However, we refer to a previous study.

      Line 42: Perhaps, but there is no data on the pseudopodial network in this manuscript.

      We refer to the studies of Angell (1980).

      Line 43: Likely, but that is not what this manuscript is showing.

      Line 42-44: The authors should make a choice and be clear. The point of this paper is that miliolids and rotalids calcify in ways that are actually not as different as they seemed previously. Still, they are said to have different 'chamber formation modes'. If they are calcifying in a similar way (which I think is not necessarily supported by the results), isn't calcification in these groups like variations on the same theme? How does this relate to the independent origins of calcification within these two groups?

      Our intention is to show that Miliolida and Rotaliida use less divergent calcification pathways, following the recently discovered biomineralization principles.

      Line 49-51: is this a well-established distinction? If so, please add a reference. If not: what is fundamentally different between B and C? Does only the size of the intracellular vesicle matter?

      Rephrased

      Line 60: please include a reference for the intracellular calcification by coccolithophores.

      Added

      Line 67: this is wrong. It is the alignment of the needles at the surface that makes them all reflect light in the same way and gives the shells a porcelaneous appearance. A close-up of the miliolid's shell surface shows this arrangement. Underneath this layer, the orientation of the needles is more random.

      We referred to the papers of Johann Hohenegger.

      Line 114: how else?

      Line 114-116: I don't see the relevance here. If seawater is taken up, the vesicle containing this seawater has to have a membrane around it. By definition. The text here ('These vesicles') suggests that Calcein and FM1-43 were combined (which they easily could have), but the methods describe that they are used successively.

      Yes, we used the two dyes separately.

      Lines 122-130: I think the interpretation of this autofluorescence signal is wrong. Even if it was true, these lines belong to the Discussion.

      This paragraph has been moved to the Discussion.

      Line 138: What are 'mobile clusters'? I don't see a relation between the location of the symbionts and the other vesicles (Figure 2).

      Line 147-148: How can an SEM image show the absence of organic matter?

      We meant the absence of the gel-like OM visible in the earlier stages of chamber formation.

      Line 148: Should be 'Figs. 3E; 3E1; S4C'.

      Corrected

      Lines 143-150: this can be merged with the following paragraph.

      Done

      Lines 151-169: why is there no indication of the time? Figures 3 and 4 link the pictures in time to show the development of the growing chamber wall. However, neither here nor in the methods, is there any recording of the time after the beginning of chamber formation. Now, the images are linked (Figure 4) as if they were taken at regular intervals, but this is not documented.

      Lines 170-184: this should go to the Discussion.

      Done

      Line 193-195: this is likely, but not visible in Figure 1.

      It was observed by optical microscopy and described by Angell (1980).

      Line 199-201: I don't understand this: the fluorescent vesicles were not observed during chamber formation so any link between the SEM and CLSM scans remains pure speculation.

      Line 203-204: needed for what?

      For better documentation of miliolid ACC-bearing granules.

      Line 220: is this shown in any of the images? 

      See Angell (1980).

      Line 230: It sounds nice, but I don't think a 'paradigm shift' is appropriate here. However interesting and important foraminiferal biomineralization is, the authors show that the crystals of miliolids are likely formed differently than previously thought. If this is a 'paradigm shift', then most scientific findings are.

      In our opinion, this is clearly a paradigm shift.

      Line 231: I don't think anyone suggested miliolids and coccolithophores share 'the same' pathway. They are shown (cocco's) and thought (miliolids) to secrete their calcite intracellularly.

      Changed to "similar, intracellular".

      Line 258: References should only be to peer-reviewed studies.

      Line 430: Burgers'

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Please separate clearly the results (observations) from the discussion (interpretations): various interpretational/commentary phrases should be removed from the Results section to Discussion e.g., lines 124-130, 131-135.

      Interpretations have been separated from the results, as suggested by the Reviewer.

      [line 49] "living cells have evolved three major skeleton crystallization pathways". I would rather say "organisms" not "cells", as the coordination of the calcification process in multicellular organisms clearly involves processes that are beyond individual cell activity.

      Corrected

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Original comment: There is no explanation for how this work could be a breakthrough in simulating gregarious feeding, as is stated in the manuscript.

      Reviewer response: I think I understand where the authors are trying to take this next step. If the authors were to follow up on this study with the proposed implementation of inhalant/exhalant velocity profiles (or, preferably, velocity/pressure fields), then that study would be a breakthrough in simulating such gregarious feeding. Based on what has been done within the present study, I think the term "breakthrough" is overly emphatic. An additional note on this: the authors are correct that incorporating additional models could be used to simulate a population (as has been successfully done for several Ediacaran taxa despite computational limitations), but it is not the only way. The authors might explore using periodic boundary conditions on the external faces of the flow domain. This could require only a single Olivooid model to assess gregarious impacts; see the abundant literature on modeling flow through solar array fields.

      We thank Reviewer 1 for the suggestion. Modeling gregarious feeding via periodic boundary conditions is certainly a practical approach given limited computational resources, and modeling flow through solar array fields is an inspiring case. However, to realistically simulate gregarious feeding behavior on an uneven seabed with an irregular spatial distribution of organisms, periodic boundary conditions alone may not be sufficient (see Author response image 1 for a simple example). We will continue to explore ways of simulating large-scale gregarious feeding.

      Author response image 1.

      An example of modeling gregarious feeding behavior on an uneven seabed.

      Original comment: The claim that olivooid-type feeding was most likely a prerequisite transitional form to jet-propelled swimming needs much more support or needs to be tailored to olivooids. This suggests that such behavior is absent (or must be convergent) before olivooids, which is at odds with the increasing quantities of pelagic life (whose modes of swimming are admittedly unconstrained) documented from Cambrian and Neoproterozoic deposits. Even among just medusozoans, ancestral state reconstruction suggests that they would have been swimming during the Neoproterozoic (Kayal et al., 2018; BMC Evolutionary Biology), with no knowledge of the mechanics due to absent preservation.

      Author response: Thanks for your suggestions. Yes, we agree with you that ancestral swimming medusae may have appeared before the early Cambrian, even in the Neoproterozoic. However, discussions of the affinities of Ediacaran cnidarians are severely limited by the lack of information on their soft anatomy, so their mechanics are hard to infer in the absence of preservation. Olivooids from the basal Cambrian Kuanchuanpu Formation can reasonably be considered cnidarians based on their radial symmetry, external features, and especially their internal anatomy (Bengtson and Yue 1997; Dong et al. 2013; 2016; Han et al. 2013; 2016; Liu et al. 2014; Wang et al. 2017; 2020; 2022). The simulation experiment here was based on the soft tissue preserved in olivooids.

      Reviewer response: This response does not sufficiently address my earlier comment. While the authors are correct that individual Ediacaran affinities are an area of active research and that Olivooids can reasonably be considered cnidarians, this doesn't address the actual critique in my comment. Most (not all) Ediacaran soft-bodied fossils are considered to have been benthic, but pelagic cnidarian life is widely acknowledged to at least be present during later White Sea and Nama assemblages (and earlier depending on molecular clock interpretations). The authors have certainly provided support for the mechanics of this type of feeding being co-opted for eventual jet propulsion swimming in Olivooids. They have not provided sufficient justifications within the manuscript for this to be broadened beyond this group.

      Thank you for your sincere commentary. We certainly agree that swimming cnidarians may have emerged before the lowermost Cambrian Fortunian Stage. See lines 16-129: "Ediacaran fossil assemblages with complex ecosystems consist of exceptionally preserved soft-bodied eukaryotes of enigmatic morphology, whose affinities are mostly unresolved (Tarhan et al., 2018, Integrative and Comparative Biology, 58 (4), 688–702; Evans et al., 2022, PNAS, 11(46), e220747511)." Olivooids undoubtedly belong to the cnidarians, as shown by their external and internal biological structures. Limited by the fossil record, we can only speculate on the transition from benthic to swimming ancestral cnidarians on the basis of well-preserved fossils such as olivooids. This transition may have required processes such as an increase in body size, thickening of the mesoglea, and degeneration of the periderm, and these processes may have evolved independently or in combination. Moreover, the ecological behaviors of ancestral cnidarians may have evolved independently at different stages from the Ediacaran to the Cambrian. We therefore cannot provide further justification beyond olivooids.

      Original comment: L446: two layers of hexahedral elements is a very low number for meshing boundary layer flow

      Reviewer response: As the authors point out in the main text, these organisms are small (millimeters in scale) and certainly lived within the boundary layer range of the ocean. While the boundary layer is not the main point, it still needs to be accurately resolved, as it should certainly affect the flow further towards the far field at this scale. I'm not suggesting the authors need to perfectly resolve the boundary layer or focus on using turbulence models more tailored to boundary layer flows (such as k-ω), but the flow field still needs sufficient realism for a boundary bounded flow. The authors really should consider quantitatively assessing the number of hexahedral elements within their mesh refinement study.

      To address this concern, we ran four additional simulations based on mesh4 from our mesh refinement study to assess the number of hexahedral element layers (five and eight layers of hexahedral elements, each with different boundary layer mesh thicknesses controlled by the thickness adjustment factor). The results have been added to Table supplement 2. They show that the number of hexahedral element layers does not appear to significantly influence the results, whereas the thickness of the boundary layer mesh can influence the maximum flow velocity during the contraction phase. Nevertheless, the results of all simulations were generally consistent, as shown in Author response image 2. A description of these results has been added to the section "Mesh sensitivity analysis".

      Author response image 2.

      Results of mesh refinement study of different boundary layer mesh parameters.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Gating of Kv10 channels is unique because it involves coupling between non-domain swapped voltage sensing domains, a domain-swapped cytoplasmic ring assembly formed by the N- and C-termini, and the pore domain. Recent structural data suggests that activation of the voltage sensing domain relieves a steric hindrance to pore opening, but the contribution of the cytoplasmic domain to gating is still not well understood. This aspect is of particular importance because proteins like calmodulin interact with the cytoplasmic domain to regulate channel activity. The effects of calmodulin (CaM) in WT and mutant channels with disrupted cytoplasmic gating ring assemblies are contradictory, resulting in inhibition or activation, respectively. The underlying mechanism for these discrepancies is not understood. In the present manuscript, Reham Abdelaziz and collaborators use electrophysiology, biochemistry and mathematical modeling to describe how mutations and deletions that disrupt inter-subunit interactions at the cytoplasmic gating ring assembly affect Kv10.1 channel gating and modulation by CaM. In the revised manuscript, additional information is provided to allow readers to identify within the Kv10.1 channel structure the location of E600R, one of the key channel mutants analyzed in this study. However, the mechanistic role of the cytoplasmic domains that this study focuses on, as well as the location of the ΔPASCap deletion and other perturbations investigated in the study remain difficult to visualize without additional graphical information. This can make it challenging for readers to connect the findings presented in the study with a structural mechanism of channel function.

      The authors focused mainly on two structural perturbations that disrupt interactions within the cytoplasmic domain, the E600R mutant and the ΔPASCap deletion. By expressing mutants in oocytes and recording currents using Two Electrode Voltage-Clamp (TEV), it is found that both ΔPASCap and E600R mutants have biphasic conductance-voltage (G-V) relations and exhibit activation and deactivation kinetics with multiple voltage-dependent components. Importantly, the mutant-specific component in the G-V relations is observed at negative voltages where WT channels remain closed. The authors argue that the biphasic behavior in the G-V relations is unlikely to result from two different populations of channels in the oocytes, because they found that the relative amplitude between the two components in the G-V relations was highly reproducible across individual oocytes that otherwise tend to show high variability in expression levels. Instead, the G-V relations for all mutant channels could be well described by an equation that considers two open states O1 and O2, and a transition between them; O1 appeared to be unaffected by any of the structural manipulations tested (i.e. E600R, ΔPASCap, and other deletions) whereas the parameters for O2 and the transition between the two open states were different between constructs. The O1 state is not observed in WT channels and is hypothesized to be associated with voltage sensor activation. O2 represents the open state that is normally observed in WT channels and is speculated to be associated with conformational changes within the cytoplasmic gating ring that follow voltage sensor activation, which could explain why the mutations and deletions disrupting cytoplasmic interactions affect primarily O2. 

      Severing the covalent link between the voltage sensor and pore reduced O1 occupancy in one of the deletion constructs. Although this observation is consistent with the hypothesis that voltage-sensor activation drives entry into O1, this result is not conclusive. Structural as well as functional data have established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 covalent linker between the sensor and the pore. The severed construct could therefore still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both states O1 and O2 require voltage sensor activation, it is unclear why the severed construct would affect state O1 primarily, as suggested in the manuscript, rather than decreasing occupancy of both open states. In line with this argument, the presence of Mg2+ in the extracellular solution affected both O1 and O2. This finding suggests that entry into both O1 and O2 requires voltage-sensor activation, because Mg2+ ions are known to stabilize the voltage sensor in its most deactivated conformations.

      We agree with the reviewer that access to both states requires a conformational change in the voltage sensor. This was stated in our revised article: “In contrast, to enter O2, all subunits must complete both voltage sensor transitions and the collective gating ring transition.” We interpret the two gating steps as sequential; the effective rotation of the intracellular ring would happen only once the sensor is in its fully activated position.

      We also agree that the S4-S5 segment cannot be the only interaction mechanism, as we demonstrated in our earlier work (Lörinczi et al., 2015; Tomczak et al., 2017).  

      Activation towards and closure from O1 are slow, whereas channels close rapidly from O2. A rapid alternating pulse protocol was used to take advantage of the difference in activation and deactivation kinetics between the two open components in the mutants and thus drive an increasing number of channels towards state O1. Currents activated by the alternating protocol reached larger amplitudes than those elicited by a long depolarization to the same voltage. This finding is interpreted as an indication that O1 has a larger macroscopic conductance than O2. In the revised manuscript, the authors performed single-channel recordings to determine why O1 and O2 have different macroscopic conductances. The results show that at voltages where state O1 predominates, channels exhibited longer open times and overall higher open probability, whereas at more depolarized voltages where occupancy of O2 increases, channels exhibited more flickery gating behavior and decreased open probability. These results are informative but not conclusive, because additional details about how the experiments were conducted and how group data were analyzed are missing. Importantly, results showing inhibition of single ΔPASCap channels by a Kv10-specific inhibitor are mentioned but not shown or quantified; these data are essential to establish that the new O1 conductance indeed represents Kv10 channel activity.

      We observed the activity of a channel compatible with Kv10.1 ΔPASCap (long openings at low to moderate potentials, very short flickery activity at strong depolarizations) in 12 patches from oocytes obtained from different frog operations over a period of two and a half months, once the experimental conditions had been established. As stated in the text, we did not generate amplitude histograms because we could not resolve clear single-channel events at strong depolarizations. Astemizole abolished the activity and (remarkably) strongly reduced the noise in traces at strong depolarizations, which we interpret as being partially caused by flicker openings.

      Author response image 1.

      We include two example recordings of Astemizole application (100 µM) on two different patches. Both recordings were performed at -60 mV (to decrease the likelihood that the channel visits O2) with 100 mM internal and 60 mM external K+. In both cases, the traces in Astemizole are shown in red.

      It is shown that conditioning pulses to very negative voltages result in mutant channel currents that are larger and activate more slowly than those elicited at the same voltage but starting from less negative conditioning pulses. In voltage-activation curves, O1 occupancy is shown to be favored by increasingly negative conditioning voltages. This is interpreted as indicating that O1 is primarily accessed from deeply closed states in which voltage sensors are in their most deactivated position. Consistently, a mutation that destabilizes these deactivated states is shown to largely suppress the first component in voltage-activation curves for both ΔPASCap and E600R channels.

      The authors then address the role of the hidden O1 state in channel regulation by calmodulin. Stimulating calcium entry into oocytes with ionomycin and thapsigargin, assumed to enhance CaM-dependent modulation, resulted in preferential potentiation of the first component in ΔPASCap and E600R channels. This potentiation was attenuated by including an additional mutation that disfavors deeply closed states. Together, these results are interpreted as an indication that calcium-CaM preferentially stabilizes deeply closed states from which O1 can be readily accessed in mutant channels, thus favoring current activation. In WT channels lacking a conducting O1 state, CaM stabilizes deeply closed states and is therefore inhibitory. It is found that the potentiation of ΔPASCap and E600R by CaM is more strongly attenuated by mutations in the channel that are assumed to disrupt interaction with the C-terminal lobe of CaM than by mutations assumed to affect interaction with the N-terminal lobe. These results are intriguing but difficult to interpret in mechanistic terms. The strong effect that calcium-CaM had on the occupancy of the O1 state in the mutants raises the possibility that O1 can only be observed in channels that are constitutively associated with CaM. To address this, a biochemical pull-down assay was carried out to establish that only a small fraction of channels are associated with CaM under baseline conditions. These CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional nonspecific effects on the oocytes that could affect the results.

      Finally, a mathematical model is proposed consisting of two layers involving two activation steps for the voltage sensor, and one conformational change in the cytoplasmic gating ring - completion of both sets of conformational changes is required to access state O2, but accessing state O1 only requires completion of the first voltage-sensor activation step in the four subunits. The model qualitatively reproduces most major findings on the mutants. Although the model used is highly symmetric and appears simple, the mathematical form used for the rate constants in the model adds a layer of complexity to the model that makes mechanistic interpretations difficult. In addition, many transitions that from a mechanistic standpoint should not depend on voltage were assigned a voltage dependence in the model. These limitations diminish the overall usefulness of the model which is prominently presented in the manuscript. The most important mechanistic assumptions in the model are not addressed experimentally, such as the proposition that entry into O1 depends on the opening of the transmembrane pore gate, whereas entry into O2 involves gating ring transitions - it is unclear why O2 would require further gating ring transitions to conduct ions given that the gating ring can already support permeation by O1 without any additional conformational changes.

      In essence, we agree with the reviewer; we already have addressed these points in our revised article:

      Regarding the voltage dependence, we write “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model of the potentially spurious voltage dependence and (in the final sentence of the quotation) explained why we did not follow this route in this study.
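
      For readers less familiar with Markov gating models, the sketch below shows what assigning a voltage dependence to a transition such as κ/λ means in practice. It uses a generic exponential rate form, which is a common choice in Markov gating models. The manuscript's actual expressions for κ and λ are not reproduced here, and all numbers are hypothetical. Setting the effective charge z to zero makes the transition voltage independent.

```python
import math

F = 96485.0  # Faraday constant, C/mol
R = 8.314    # gas constant, J/(mol K)
T = 295.0    # assumed recording temperature, K


def rate(v_mv, k0, z):
    """Exponential voltage-dependent rate constant (1/ms).

    k0 is the rate at 0 mV; z is the signed effective charge.
    z = 0 gives a voltage-independent transition.
    """
    return k0 * math.exp(z * F * (v_mv * 1e-3) / (R * T))


def forward_fraction(v_mv, kf0, zf, kb0, zb):
    """Equilibrium fraction in the 'forward' state of a two-state step.

    With positive zf and zb, the forward rate increases and the backward
    rate decreases with depolarization.
    """
    kf = rate(v_mv, kf0, zf)
    kb = rate(v_mv, kb0, -zb)
    return kf / (kf + kb)


for v in (-100, -50, 0, 50, 100):
    dep = forward_fraction(v, 0.1, 0.5, 0.1, 0.5)    # voltage dependent
    indep = forward_fraction(v, 0.1, 0.0, 0.1, 0.0)  # voltage independent
    print(f"{v:5d} mV  dependent={dep:.3f}  independent={indep:.3f}")
```

      With z = 0 the forward fraction is the same at every voltage. In a voltage-independent formulation, any bias toward one layer would therefore have to be written into state-specific rate constants. That is where the additional free parameters mentioned in the quotation come from.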

      Regarding the need for gating ring transitions after O1, we wrote, “Thus, the underlying gating events can be separated into two steps: The first gating step involves only the voltage sensor without engaging the ring and leads to a pre-open state, which is non-conducting in the WT but conducting in our mutants. The second gating event operates at higher depolarizations, involves a change in the ring, and leads to an open state both in WT and in the mutants. ” 

      We interpret your statements to mean that you expect the conducting state to remain available once O1 is reached. However, the experimental evidence argues against the pore remaining available regardless of further gating steps beyond O1. The description of model construction is informative here: “... we could exclude many possible [sites at which O1 connects to closed states] because the attachment site must be sufficiently far away from the conventional open state [O2]. Otherwise, the transition from "O1 preferred" to "O2 preferred" via a few closed intermediate states is very gradual and never produces the biphasic GV curves [that we observed].”

      In other words, voltage-dependent gating steps beyond the state that offers access to O1 appear to close the pore after it has opened. That might occur because only then (for states in which at least one voltage sensor has moved beyond the intermediate position) is the ring fixed in a particular state until all sensors have completed activation. In the WT, closing the pore in deactivated states might rely on an interaction that is absent in the mutant because, at least in HERG: “the interaction between the PAS domain and the C-terminus is more stable in closed than in open KV11.1 (HERG) channels, and a single chain antibody binding to the interface between PAS domain and CNBHD can access its epitope in open but not in closed channels, strongly supporting a change in conformation of the ring during gating”.

      Reviewer #3 (Public Review):

      In the present manuscript, Abdelaziz and colleagues interrogate the gating mechanisms of Kv10.1, an important voltage-gated K+ channel in cell cycle and cancer physiology. At the molecular level, Kv10.1 is regulated by voltage and Ca-CaM. Cryo-EM structures of Kv10.1, as well as of other members of the KCNH family (Kv11 and Kv12), show channels that lack a structured S4-S5 linker, which imposes a non-domain-swapped architecture in the transmembrane region. However, the cytoplasmic N- and C-terminal domains interact in a domain-swapped manner, forming a gating ring. The N-terminal domain (PAS domain) of one subunit is located close to the intracellular side of the voltage sensor domain and interacts with the C-terminal domain (CNBHD domain) of the neighboring subunit. Mutations in the intracellular domains have a profound effect on channel gating. The complex network of interactions between the voltage sensor and the intracellular domains makes the PAS domain, a candidate for coupling the voltage sensor domains to the intracellular gating ring, a particularly interesting domain to study.

      The coupling between the voltage-sensor domain and the gating ring is not fully understood, and the authors aim to shed light on the details of this mechanism. To do so, they use well-established techniques such as site-directed mutagenesis, electrophysiology, biochemistry and mathematical modeling. In the present work, the authors propose a two-open-state model that arises from functional experiments after introducing a deletion in the PAS domain (ΔPASCap) or a point mutation (E600R) in the CNBHD domain. The authors measure a biphasic G-V curve with these mutations and assign each phase to a different open state, one of which is not visible in the WT and is only unveiled by the mutations.

      The hypothesis proposed by the authors could change the current paradigm of Kv10.1 gating and is quite extraordinary; therefore, it requires extraordinary evidence to support it.

      STRENGTHS: The authors use adequate techniques such as electrophysiology and site-directed mutagenesis to address the gating changes introduced by the molecular manipulations. They also use appropriate mathematical modeling to build a Markov model and identify the mechanism behind the gating changes.

      WEAKNESSES: The results presented by the authors do not fully support their conclusions, since they could have alternative explanations. The authors base their primary hypothesis on the biphasic behavior of a calculated G-V curve that does not match the tail behavior. Moreover, the experimental conditions used in the present manuscript introduce uncertainties, which weaken their conclusions and complicate the interpretation of the results. Therefore, their experimental conditions need to be revisited.

      We respectfully disagree. We think that your suggestions for alternative explanations are addressed in the current version of the article. We will rebut them once more below, but we feel the need to point out that our arguments are already laid out in the revised article.

      I have some concerns related to the following points:

      (1) Biphasic gating behavior

      The authors use the TEVC technique in oocytes extracted surgically from Xenopus laevis frogs. The method is well established and is adequate to address ion channel behavior. The experiments are performed in chloride-based solutions, which present a handicap when measuring outward rectifying currents at very depolarized potentials because of the calcium-activated chloride channels expressed endogenously in the oocytes. These channels open and carry chloride into the cell, adding to the outward rectifying traces during the test pulse. The authors calculate their G-V curves from the test-pulse steady-state current instead of using the tail currents. Conductance measurements are normally taken from the 'tail current' because tails are measured at a fixed voltage, which keeps the driving force constant.

      We respectfully disagree. In contrast to other channels, like HERG, a common practice for Kv10 is not to use tail currents. It is long known that in this channel, tail currents and test-pulse steady-state currents can appear to be at odds because the channels deactivate extremely rapidly, at the border of temporal resolution of the measurements and with intricate waveforms. This complicates the estimation of the instantaneous tail current. Therefore, the outward current is commonly used to estimate conductance (Terlau et al., 1996; Schönherr et al., 1999; Schönherr et al., 2002; Whicher and MacKinnon, 2019), while the latter authors also use the extreme of the tail for some mutants.

      Due to their activation at very negative voltage, the reversal potential in our mutants can be measured directly; we are, therefore, more confident with this approach. Nevertheless, we have determined the initial tail current in some experiments. The behavior of these is very similar to the average that we present in Figure 1. The biphasic behavior is unequivocally present.
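
      The two ways of estimating conductance debated here can be sketched as follows. The test-pulse method divides each current by its own driving force, V - E_rev, which requires a reliable reversal potential. The tail method relies on all tails being measured at one fixed voltage, so its accuracy depends on resolving the initial tail amplitude when channels close very rapidly. Reading G as proportional to the number of open channels also assumes a linear open-channel current-voltage relation. The example numbers below are hypothetical.

```python
def conductance_from_steady_state(voltages_mv, currents_ua, e_rev_mv):
    """Chord conductance G = I / (V - E_rev) from test-pulse currents,
    normalized to its maximum.

    Points within 1 mV of E_rev are skipped because the ratio is ill-defined there.
    """
    g = {}
    for v, i in zip(voltages_mv, currents_ua):
        if abs(v - e_rev_mv) < 1.0:
            continue
        g[v] = i / (v - e_rev_mv)
    g_max = max(g.values())
    return {v: gv / g_max for v, gv in g.items()}


def conductance_from_tails(tail_currents_ua):
    """Tail method: every tail is measured at the same voltage, so the driving
    force is constant and G is proportional to the initial tail amplitude."""
    i_max = max(tail_currents_ua)
    return [i / i_max for i in tail_currents_ua]


# Hypothetical example: test pulses from -120 to +40 mV, E_rev measured at -80 mV.
voltages = [-120, -80, -40, 0, 40]
currents = [-0.4, 0.0, 1.0, 3.0, 6.0]
print(conductance_from_steady_state(voltages, currents, -80.0))
```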

      Author response image 2.

      Calculating the conductance from the traces should not be a problem, however, in the present manuscript, the traces and the tail currents do not agree. 

      The referee’s observation is perfectly in line with the long-standing experience of several labs working with KV10: tail current amplitudes in KV10 appear to be out of proportion for the WT open state (O2). Importantly, this is due to the rapid closure, which is not present in O1. As a consequence, the initial amplitude of tail currents from O1 is easier to estimate correctly, and these tails are much more obvious in the graphs. Taken together, these differences between O1 and O2 explain the misconception the reviewer describes next.

      The tail traces shown in Fig1E do not show an increasing current amplitude in the voltage range from +50mV to +120mV, they seem to have reached a 'saturation state', suggesting that the traces from the test pulse contain an inward chloride current contamination. 

      As stated in the text and shown in Author response image 3, the tail currents in Figure 1E increase in amplitude between +50 and +120 mV, as can be seen in the examples below from different experiments (+50 mV in black, +120 mV in red). As stated above, the increase is not as evident as in traces from other mutants because the predominance of O2 also implies much faster deactivation.

      Author response image 3. 

      We are aware that Ca2+-activated Cl- currents can represent a problem when interpreting electrophysiological data in oocytes. In fact, we show in Supplement 1 to Figure 8 that this can be the case during the Ca2+-CaM experiments, where the increase in Ca2+ would certainly augment Cl- contribution to the outward current. This is why we performed these experiments in Cl--free solutions. As we show in Figure 8, the biphasic behavior was also present in those experiments. 

      Importantly, Cl--free bath solutions would not correct contamination during the tail, since this would correspond to Cl- exiting the oocyte. Yet, if the outward currents were contaminated by Cl-, one would expect the contamination to increase with larger depolarizations, as the typical Ca2+-activated Cl- current in oocytes does. As the reviewer states, this does not seem to be the case.

      In addition, this second component identified by the authors as a second open state appears after +50mV and seems to never saturate. The normalization to the maximum current level during the test pulse, exaggerates this second component on the calculated G-V curve. 

      We agree that this second component continues to increase; the reviewer brought this up in the first review, and we have already addressed this in our reply and in the discussion of the revised version: “This flicker block might also offer an explanation for a feature of the mutant channels, that is not explained in the current model version: the continued increase in current amplitude, hundreds of milliseconds into a strong depolarization (Supp. 4 to Fig. 9). If the relative stability of O2 and C2 continued to change throughout depolarization, such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On ↔Cn states, or a non-Markovian modification of the model’s time evolution.” With non-Markovian, we mean a Langevin-type diffusive process. 

      It's worth noticing that the ΔPASCap mutant experiments on Fig 5 in Mes based solutions do not show that second component on the G-V.

      For the readers of this conversation, we would like to clarify that the reviewer likely refers to experiments shown in Fig. 5 of the initial submission but shown in Fig. 6 of the revised version (“Hyperpolarization promotes access to a large conductance, slowly activating open state.” Fig. 5 deals with single channels). We agree that these data look different, but this is because the voltage protocols are completely different (compare Fig. 6A (fixed test pulse, varied prepulse) and Fig. 2A (varied test pulse, fixed pre-pulse). Therefore, no biphasic behavior is expected. 

      Because these results are the foundation for their two-open-state hypothesis, I strongly suggest that the authors repeat all their chloride-based experiments in Mes-based solutions to eliminate the undesired chloride contribution to the mutant currents and clarify the contribution of the mutations to Kv10.1 gating.

      In summary, we respectfully disagree with all concerns raised in point (1). Our detailed arguments rebutting them are given above, but there is a more high-level concern about this entire exchange: the referee casts doubt on observations that are not new. Several labs have reported, for a group of mutant KCNH channels, non-monotonic voltage dependence of activation (see, e.g., Fig. 6D in Zhao et al., 2017), multi-phasic tail currents (see, e.g., Fig. 4A in Whicher and MacKinnon, 2019, in CHO cells, where Cl- contamination is not a concern), and activation by high [Ca2+]i (Lörinczi et al., 2016). Our study replicates those observations and hypothesizes that the existence of an additional conducting state can alone explain all previously unexplained observations. We highlight the potency of this hypothesis with a Markov model that qualitatively reproduces all phenomena. We not only factually disagree with the individual points raised, but we also think that they don't touch on the core of our contribution.

      (2) Two step gating mechanism.

      The authors interpret the results obtained with the ΔPASCap and the E600R as two step gating mechanisms containing two open states (O1 and O2) and assign them to the voltage sensor movement and gating ring rotation respectively. It is not clear, however how the authors assign the two open states.

      The results show that the first component is conserved amongst mutations; however, the second one is not. The authors attribute the second component, and hence the second open state, to the movement of the gating ring. This scenario seems unlikely, since the second component shows a clear voltage dependence, which would suggest the involvement of a voltage-sensing process.

      We do not suggest that the gating ring motion is not voltage dependent. We would like to point out that voltage dependence can be conveyed by voltage sensor coupling to the ring; this is the widely accepted theory of how the ring can be involved. The reviewer may mean this in a narrow sense: that the model should be constructed such that all voltage-dependent steps occur before, and independently of, ring reconfiguration, followed only then by an additional step that reflects solely the (voltage-independent) reconfiguration. If so, we would like to point the reviewer to the article, where we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model from the potentially spurious voltage dependence and (in the final sentence of the quotation) explained why we did not follow this route in this study.

      The split channel experiment is interesting but needs more explanation. I assume the authors expressed the 2 parts of the split channel (1-341 and 342-end), however Tomczak et al showed in 2017 how the split presents a constitutively activated function with inward currents that are not visible here, this point needs clarification.

      As stated in the panel heading, the figure legend, and the main text, we did not use 1-341 and 342-end as done in Tomczak et al. Instead, “we compared the behavior of ∆2-10 and ∆2-10.L341Split,”. Evidently, the additional deletion (2-10) causes a shift in activation that explains the difference you point out. However, as we do not compare L341Split and ∆2-10.L341Split but ∆2-10 and ∆2-10.L341Split, our conclusion remains that “As predicted, compared to ∆2-10, ∆2-10.L341Split showed a significant reduction in the first component of the biphasic GV (Fig. 2C, D).” Remarkably, the behavior of the ∆3-9 L341Split described in Whicher and MacKinnon, 2019 (Figure 5) matches that of our ∆2-10 L341Split, which we think reinforces our case.

      Moreover, the authors assume that the mutations uncover a new open state; however, the traces presented for the mutations suggest that other explanations are possible. The mutations could introduce other gating mechanisms, such as inactivation from the closed state. The traces presented for ΔPASCap, and especially for E600R, show clear 'hooked tails', a direct indicator of a population of inactivated channels during the test pulse that recover from inactivation upon repolarization (Tristani-Firouzi M, Sanguinetti MC. J Physiol. 1998).

      There is a possibility that we are debating nomenclature here. In response to the suggestion that all our observations could be explained by inactivation, we attempted a disambiguation of terms in the reply and the article. As the argument is brought up again without reference to our clarification attempts, we will try to be more explicit here:

      If, starting from deeply deactivated states, an open state is reached first, and then, following further activation steps, closed states are reached, this might be termed “inactivation”. In such a reading, our model features many inactivated states. The shortest version of such a model is C-O-I. It is, for instance, used by Raman and Bean (2001; DOI: 10.1016/S0006-3495(01)76052-3) to explain NaV gating in Purkinje neurons. If “inactivation” is meant in the sense that a gating transition exists, which is orthogonal to an activation/deactivation axis, and that after this orthogonal transition, an open state cannot be reached anymore, then all of the upper floor in our model is inactivated with respect to the open state O1. Finally, the state C2 is an inactivated state with respect to O2. In this view, “inactivation” explains the observed phenomena.

      However, we must disagree if the referee means that a parsimonious explanation exists in which a single conducting state is the only source for all observed currents.   

      There is a high-level reason: we found a single assumption that explains three different phenomena, while the inactivation hypothesis with one conducting state cannot explain one of them (the increase of the first component under raised CaM). But there is also a low-level reason: the tails in Tristani-Firouzi and Sanguinetti 1998 are fundamentally different from what we report herein in that they lack a third component. Thus, those tails are consistent with recovery from inactivation through a single open state, while a three-component tail is not. In the framework of a Markov model, the time constants of transitions from and to a given state (say O2), cannot change unless the voltage changes. During the tail current, the voltage does not change, yet we observe: 

      i) a rapid decrease with a time constant of at most a few milliseconds (Fig 9 S2, 1-> 2),  ii) a slow increase in current, peaking after approximately 25 milliseconds and iii) a relaxation to zero current with a time constant of >50 ms. 

      According to the reviewer’s suggestion, these processes on three timescales should all be explained by depopulating and repopulating the same open state while all rates are constant. There might well be a complicated multi-level state diagram with a single open state with different variants, like (open and open inactivated) that could produce triphasic tails with these properties if the system had not reached a steady state distribution at the end of the test pulse. It cannot, however, achieve it from an equilibrated system, and certainly, it cannot at the same time produce “biphasic activation” and “activation by CaM”. 
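
      The counting constraint behind this argument can be illustrated with a generic scheme. At a fixed voltage every rate in a Markov model is constant, so each occupancy relaxes as a constant plus a sum of exponentials, with one term per non-zero eigenvalue of the rate matrix. An N-state scheme therefore contributes at most N - 1 time constants, and these are all real for schemes obeying detailed balance, such as linear chains. The sketch below uses a C ↔ O ↔ I chain (the minimal C-O-I model mentioned above) with hypothetical rate constants, started with most channels inactivated. It produces a classic hooked tail: a rise as channels recover through O, followed by deactivation. The derivative of a sum of two real exponentials can change sign at most once. Such a three-state scheme therefore cannot produce the fast decrease, rise, and slow decay listed in points i) to iii); that sequence needs at least three distinct relaxation rates, i.e. at least four states. This is only a counting constraint and does not, by itself, discriminate between particular larger schemes.

```python
def simulate_coi(rates, p0, t_end_ms=200.0, dt_ms=0.01):
    """Forward-Euler integration of dp/dt = p Q for a linear C <-> O <-> I
    scheme at a fixed voltage.

    rates: dict with keys 'co', 'oc', 'oi', 'io' (1/ms).
    p0: initial occupancies (C, O, I), summing to 1.
    Returns the open probability at every time step.
    """
    c, o, i = p0
    trace = [o]
    for _ in range(int(round(t_end_ms / dt_ms))):
        dc = (o * rates["oc"] - c * rates["co"]) * dt_ms  # net flux O -> C
        di = (o * rates["oi"] - i * rates["io"]) * dt_ms  # net flux O -> I
        c, i = c + dc, i + di
        o = o - dc - di  # conserves total probability
        trace.append(o)
    return trace


def count_direction_changes(trace):
    """Number of times the trace switches between rising and falling."""
    changes, last_sign = 0, 0
    for a, b in zip(trace, trace[1:]):
        d = b - a
        sign = (d > 0) - (d < 0)
        if sign and last_sign and sign != last_sign:
            changes += 1
        if sign:
            last_sign = sign
    return changes


# Hypothetical rates at a negative tail voltage (1/ms): fast recovery I -> O,
# slower deactivation O -> C. Most channels start inactivated.
rates = {"co": 0.001, "oc": 0.03, "oi": 0.05, "io": 0.5}
trace = simulate_coi(rates, (0.0, 0.2, 0.8))
peak = max(range(len(trace)), key=trace.__getitem__)
print(f"peak Po={trace[peak]:.3f} at {peak * 0.01:.2f} ms, "
      f"direction changes={count_direction_changes(trace)}")
```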

      The results presented by the authors can be alternatively explained with a change in the equilibrium between the close to inactivated/recovery from inactivation to the open state. 

      Again, we disagree. The model construction explains in detail that the transition from the first to the second phase is not gradual. Shifting equilibria cannot reproduce this. We have extensively tested that idea and can exclude this possibility.

      Finally, the authors state that they do not detect "cumulative inactivation after repeated depolarization", but this considers inactivation only from the open state. It ignores the possibility of closed-state inactivation, or that, as in hERG, the channel inactivates faster than it activates (Smith PL, Yellen G. J Gen Physiol. 2002).

      We respectfully disagree. We explicitly model an open state that inactivates faster (O2->C2) than it activates. Once more, this is stated in the revised article, which we point to for details. Again, this alternative mechanism does not have the potential to explain all three effects. As with the chloride contamination concerns discussed above, this inactivation hypothesis was raised in the first review round and was therefore addressed in our reply and the revised article. We also explained that “inactivation” has no specific meaning in Markov models: “In the absence of O1, all transitions towards the lower layer are effectively ‘inactivation from closed states’, because they make access to the only remaining open state less likely.” But this is semantics. What is relevant is that no network of states around a single open state can reproduce the three effects in a more parsimonious way than the assumption of the second open state does.

      (3) Single channel conductance.

      Single-channel experiments are a great way to assess differences in the conductance of single-channel openings; unfortunately, the authors cannot accurately measure different conductances for the two proposed open states. The Markov model built by the authors disagrees with their interpretation of the experimental results, because it assigns exactly the same conductance to the two modeled open states. To interpret the mutant data, WT data must be added for comparison, together with recordings in the presence of specific blockers.

      We respectfully disagree. As previously shown, the conductance of the flickering wild-type open state is very difficult to resolve. Our recordings do not show that the two states have different single-channel conductances, and therefore the model assumes identical single-channel conductance.

      The important point is that the single-channel recordings clearly show two different gating modes associated with the voltage ranges in which we predict the two open states. One has a smaller macroscopic current due to rapid flickering (aka “inactivation”). These recordings are another proof of the existence of two open states because the two gating modes occur.  Wild-type data can be found in Bauer and Schwarz, (2001, doi:10.1007/s00232-001-0031-3) or Pardo et al., (1998, doi:10.1083/jcb.143.3.767) for comparison.
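
      The point that two gating modes with the same unitary current can still give different macroscopic currents rests on I = N·Po·i. The sketch below uses hypothetical dwell times, not values measured in these recordings. It assumes identical unitary current in both modes, e.g. when compared at the same driving force. It also ignores filtering, which would further reduce the apparent amplitude of very brief flicker openings.

```python
def open_probability(mean_open_ms, mean_closed_ms):
    """Long-run fraction of time open for alternating open/closed dwells."""
    return mean_open_ms / (mean_open_ms + mean_closed_ms)


def macroscopic_current(n_channels, p_open, unitary_pa):
    """I = N * Po * i for N independent channels with unitary current i (pA)."""
    return n_channels * p_open * unitary_pa


# Hypothetical dwell times, same unitary current in both modes.
long_mode = open_probability(20.0, 5.0)     # long openings
flicker_mode = open_probability(0.2, 0.8)   # brief flickery openings
for name, po in (("long openings", long_mode), ("flicker", flicker_mode)):
    print(f"{name}: Po={po:.2f}, I={macroscopic_current(100, po, 1.0):.0f} pA")
```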

      We appreciate the effort that the editors and reviewers invested in assessing the revised manuscript. Yet, we think that the demanded revision of experimental conditions and quantification methods contradicts the commonly accepted practice for KV10 channels. Some of the reviewer comments are skeptical about the biphasic behavior, which is an established finding replicated for many mutants and by many researchers. The alternative explanations for these disbelieved findings are either “semantics” or cannot quantitatively explain the measurements. Therefore, only the demand for more explanations and unprecedented resolution in single-channel recordings remains. We share these sentiments.

      ———— The following is the authors’ response to the original reviews.

      (1) The authors must show that the second open state is not just an artifact of endogenous activity but represents the activity of the same EAG channels. I suggest that the authors repeat these experiments in Mes-based solutions. 

      (2) Along the same lines, it is necessary to show that these currents can be blocked using known EAG channel blockers such as astemizole. Ultimately, it will be important to demonstrate using single-channel analysis that these do represent two distinct open states separated by a closed state. 

      We have addressed these concerns using several approaches. The most substantial change is the addition of single-channel recordings on ΔPASCap. In those experiments, we could show both types of events in the same patch, as well as an outward current at -60 mV, 50 mV below the equilibrium potential for chloride. The channels were never detected in uninjected oocytes, and astemizole silenced the activity in patches containing multiple channels. These observations, together with the persistence in methanesulfonate-based solutions of the biphasic behavior that we interpret as evidence of O1, strongly suggest that both O1 and O2 result from the expression of KV10.1 mutants.

      (3) Currents should be measured by increasing the pulse lengths as needed in order to obtain the true steady-state G-V curves. 

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data, therefore, supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      (4) A more clear and thorough description should be provided for how the observations with the mutant channels apply to the behavior of WT channels. How exactly does state O1 relate to WT behavior, and how exactly do the parameters of the mathematical model differ between WT and mutants? How can this be interpreted at a structural level? What could be the structural mechanism through which ΔPASCap and E600R enable conduction through O1? It seems contradictory that O1 would be associated exclusively with voltage-sensor activation and not gating ring transitions, and yet the mutations that enable cation access through O1 localize at the gating ring - this needs to be better clarified. 

      We have thoroughly rewritten all sections to clarify the structural correlates that may explain the behavior of the mutants. In brief, we propose that when all four voltage sensors move towards the extracellular side, the intracellular ring keeps the permeation path closed until it rotates. If the ring is altered, this “lock” is incompetent, and permeation can be detected (page 34). By fixing the position of the ring, calmodulin would preclude permeation in the WT and promote the population of O1 in the mutants.

      (5) Rather than the t80% risetime, exponential fits should be performed to assess the kinetics of activation. 

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). We had planned to perform exponential fits in this revised version, but because the activation process is not exponential, the time constants we could provide would not be accurate, and the result would remain qualitative as it is now. In the experiments where we did perform the fits (Fig. 3), the values obtained support the statement made. 

      (6) It is argued, based on the G-V relations in Figure 2A, that none of the mutations or deletions introduced has a major effect on the properties of state O1; rather, they affect state O2. However, the occupancy of state O2 is undetermined because the activation curves do not reach saturation. It would be interesting to explore the fitting parameters in Fig. 2B further, to test whether the data in Fig. 2A can indeed only be described by fits in which the parameters for O1 remain unchanged between constructs. 

      We agree that the absolute occupancy of O2 cannot be properly determined if a steady state is not reached. This is, however, a feature of the channel. During very long depolarizations in WT, the current visually appears to reach a plateau, but a closer look reveals that the current keeps increasing after very long depolarizations (up to 10 seconds; see, e.g., Fig. 1B in Garg et al., 2013, Mol Pharmacol 83, 805-813. DOI: 10.1124/mol.112.084384). Interestingly, although the model presented here does not account for this behavior, we propose changes in the model that could. “If the relative stability of O2 and C2 continued to change throughout the depolarization such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On↔Cn states or a non-Markovian modification of the model’s evolution.” Page 34.

      (7) The authors interpret the results obtained with the mutants ΔPASCap and E600R (previously tested by Lorinczi et al., 2016, and shown to disrupt the interactions between the PASCap and cNBHD domains) as a two-step gating mechanism with two open states. All the results obtained with the E600R mutant and ΔPASCap could also be explained by inactivation/recovery from inactivation and a change in the equilibrium between closed, closed/inactivated, and open states. Moreover, the small tails between +90 and +120 mV suggest that channels accumulate in an inactive state (Fig 1E). It is not convincing that the two-open-state model is the mechanism underlying the mutants' behavior.  

      We respectfully disagree with the notion that a single open state can provide a plausible explanation for "All the results obtained with the E600R mutant and ΔPASCap". We think that our new single-channel results settle the question, but even without this direct evidence, a quantitative assessment of the triphasic tail currents all but excludes the possibility of a single open state. We agree that it is, in principle, possible to obtain some form of a multiphasic tail with a single open state using the scheme suggested in this comment: at the end of the test pulse, a large fraction of the channels must have accumulated in inactive states, and a few are in the open state. The hyperpolarization to -100 mV then induces a rapid depopulation of the open state, followed by slower replenishment from the inactive state. Exactly this process occurs in our model when C2 empties through O2 (Supp. 5 to Fig 9, E600R model variant). However, this alone is highly unlikely to quantitatively explain the measured tail currents, because of the drastically different time scales of the initial current decay (submillisecond to at most a few milliseconds lifetime), the much slower transient increase in current (several tens of milliseconds), and the final decay with time constants of >100 ms (see, for instance, the data in Fig. 1 E for E600R, +50 to +120 mV test pulses). Sustaining the substantial slowly decaying current through slow replenishment of an open state with a lifetime of 1 ms would require vast numbers of inactivated channels. A rough estimation based on the current integral of the initial decay and the current integral of the slowly decaying current suggests that, at the end of the test pulse, the ratio of inactivated to open channels would have to be 500 to 1500 for this mechanism to quantitatively explain the observed tail currents. 
      To put this in perspective: this would imply that, without inactivation, all the channels expressed in an oocyte would carry a current of 6 mA during the +100 mV test pulse. While theoretically possible, we consider this a less likely explanation than a second open state.
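      A back-of-the-envelope form of the current-integral estimate above can be written as follows. This is an illustrative sketch, not necessarily the exact calculation behind the 500 to 1500 figure; it assumes that every channel recovering from the inactivated pool (N_I channels) passes through the open state once, with the same mean dwell time τ_O as the N_O channels open at the end of the test pulse, and that all open channels carry the same single-channel current i. Q_fast and Q_slow denote the integrals of the initial decay and of the slowly decaying tail current (symbols introduced here for illustration).

```latex
Q_{\text{fast}} \approx N_O \, i \, \tau_O, \qquad
Q_{\text{slow}} \approx N_I \, i \, \tau_O
\quad\Longrightarrow\quad
\frac{N_I}{N_O} \approx \frac{Q_{\text{slow}}}{Q_{\text{fast}}}
```

      Under these assumptions, the required inactivated/open ratio follows directly from the ratio of the two charge integrals, independent of the single-channel current.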

      (8) Different models should be evaluated to establish whether the results in Figure 4 can also be explained by a model in which states O1 and O2 have the same conductance. It would be desirable if the conductance of both states were experimentally determined - noise analysis could be applied to estimate the conductance of both states. 

      In the modified model, O1 and O2 have the same single-channel conductance. The small conductance combined with the fast flickering did not allow an accurate determination, but we can state that there is no evidence that the single-channel conductance of the states is different.

      (9) Although not included, it looks like the model predicts some "conventional inactivation". This can be appreciated in Fig 8 and in the traces at -60 mV. Interestingly, the traces obtained in the absence of Cl- also undergo slow inactivation, or 'conventional inactivation' as referred to by the authors. Please revise the following statement: "Conventional inactivation was never detected in any mutants after repeated or prolonged depolarization. In the absence of inactivation, the pre-pulse dependent current increase at +40 mV could be related to changes in the relative occupancy of the open states". 

      We have carefully edited the manuscript to address this concern. The use of the term inactivation admittedly represents a challenge. We agree that the state that results from the flickering block (C2) could be defined as “inactivated” because it is preceded by an open state. Yet, in that case, the intermediate states through which the channel travels between O1 and O2 would also be, sensu stricto, “inactivated”, but only in the mutants. We have made this clear on page 17.

      Recommendations for improving the writing and presentation.

      (1) Methods section: Please state the reversal potential calculated for the solutions used. It looks like the authors used an instantaneous I-V curve method to determine the reversal potential; if that's correct, please show the I-V and the traces together with the protocol used. 

      We have provided the calculated reversal potentials for excised patches. We cannot predict the reversal potential in whole oocytes because we have no control over the intracellular solution. The reversal potential was determined in the mutants through the current at the end of the stimulus because the mutants produced measurable inward currents. The differences in reversal potential were not significant among mutants.

      Pulse protocols have been added to the figures.

      (2) Figure 1 suggestion: Combine the two panels in panel D and move the F panel up so the figure gets aligned in the lower end.

      Thank you, this has been done.

      (3) Please clarify the rationale for using the E600R mutant. I assume it is based on the effect described by Lorinczi et al. 2016 and its similarity to the ΔPASCap phenotype, or is it due to the impact of this mutation on the interactions between the N-terminus and the cNBHD? 

      We have explained the rationale for the use of E600R explicitly on page 6.

      (4) Fig S1A is not present in the current version of the manuscript. Include a cartoon as well as a structural figure clearly depicting the perturbations introduced by E600R, ΔPASCap, and the other deletions that are tested. Additional structural information supporting the discussion would also be helpful to establish clearer mechanistic links between the experimental observations described here and the observed conformational changes between states in Kv10 channel structures. 

      We have corrected this omission, thank you for pointing it out.

      (5) It would be informative to see the traces corresponding to the I-V shown in Fig 7 A and B at the same indicated time points (0, 60, 150, and 300s). Did the authors monitor the Ca2+ signal rise after the I&T treatment to see if it coincides with the peak in the 60s? 

      In Figure 7 (now Figure 8) we used voltage ramps instead of discrete I-V protocols because of the long time required for recording the latter. This is stated on page 19. Ca2+ was monitored through Cl- current after ionomycin/thapsigargin. The duration of the Ca2+ increase was reproducible among oocytes and in good agreement with the changes observed in the biphasic behavior of the mutants (Supplement 1 to Figure 8).

      (6) Fig 4. Please state in the legend what the different color traces correspond to in E600R and ΔPASCap. Is there a reason to change the interpulse potential for ΔPASCap to -20 mV and not allow this mutant to close? Please state. How did the authors decide on the 10 ms interval for the experiments in Fig 2? 

      Thank you for pointing this out, we have added the description. We have explained why we use a different protocol for ΔPASCap and the reason for using 10 ms interval (we believe the referee means Figure 4) on page 12.  

      (7) Fig. 5. The pre-pulse is supposed to be 5 s, but the time scale does not correspond to a 5 s pre-pulse before the test pulse to +40 mV. Has the pre-pulse been trimmed for representation purposes? If so, please state. 

      The pre-pulse was 5s, but as the reviewer correctly supposed, the trace is trimmed to keep the +40 mV stimulus visible. This has now been clearly stated in the legend.

      (8) The mutant L322H is located within the S4 helix according to the Kv10.1 structure (PDB 5K7L), not in the 'S3-S4 linker'; please correct. 

      This has been done, thank you.

      The introduction of this mutant should also shift the voltage dependence toward more hyperpolarized potentials (by around 30 mV, according to Schoenherr et al. 1999). It looks like that shift is present within the first component of the G-V. Still, since the maximal amplitude of the second component could be contaminated by endogenous Cl- currents, this effect is minimized. Repeating these experiments in chloride-free solutions would help clarify this point and reveal the effect of ΔPASCap and E600R in the background of a mutation that accelerates the transitions between the closed states (see Major comment 1). Did the authors record L322H alone for control purposes? 

      We have decided not to measure L322H alone or repeat the measurements in chloride-free solutions because we do not see a way to use the quantitative assessment of the voltage dependence of L322H and the L322H variants of the eag domain mutants. As in our answer to main point 3, we base our arguments not on the precise voltage dependence of the second component but on the shape of the G-V curves, specifically the consistent appearance of the first component and the local conductance minimum between the first and second components. After the introduction of L322H, the first component is essentially absent.

      We think that the measurements of the L322H mutants cannot be interpreted as a hyperpolarizing shift in the first component. The peak of the first conductance component occurs around -20 mV in ΔPASCap and E600R (Fig. 7 C, D). After a -30 mV shift in L322H+ΔPASCap and L322H+E600R, this first peak would still be detected within the voltage range of our experiments, but it is not. A contamination of the second component would have little impact on this observation, which is why we refrain from the suggested measurements.

      (9) The authors differentiate between an O1 vs. O2 state with different conductances, and maybe I missed it, but there's no quantitative distinction between the components; how are they different?

      Please see the response to main comments 1 and 2. This has been addressed with single-channel recordings.

      (10) Please state in the figure legends the voltage protocols, holding voltages, and solutions (K+ concentration and Cl- presence/absence) used for the experiments presented. This would make the experiments easier to interpret. 

      Thank you, this has been done.

      (11) The authors state on page 7 that "with further depolarizations, the conductance initially declined to rise again in response to strong depolarizations. This finding matches the changes in amplitude of the tail currents, which, therefore, probably reflect a true change in conductance" However, the tails in the strong voltage range (+50 to +120 mV) for the E600R mutant argue against this result. Please review.

      The increase in the amplitude of the tail current is also present in E600R, but the relative increase is smaller. We have decided against rescaling these traces because the Figure is already rather complex. We indicated this fact with a smaller arrow and clarified it in the text (page 8).

      (12) The authors mention that the threshold of activation for the WT is around -20mV; however, the foot of the G-V is more around -30 or -40mV. Please revise. 

      Thank you. We have done this. 

      (13) The authors state on page 9 that the 'second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions" However E600R mutant that conserves the N-terminal intact has a shift as pronounced as the DPASCap and larger than the D2-10. How do the authors interpret this result? 

      We have corrected this statement on page 10: “…the second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions and when the structure of the ring is altered through disruption of the interaction between N- and C-termini (E600R)”.

      (14) The equation defined to fit the G-Vs, can also be used to describe the WT currents. If the O1 is conserved and present in the WT, this equation should also fit the WT data properly. The 1-W component shown could also be interpreted as an inactivating component that, in the WT, shifts the voltage-dependence of activation towards depolarizing potentials and is not visible. Still, the mutants do show it as if the transition from closed-inactivated states is controlled by interactions in the gating ring, and disturbing them does affect the transitions to the open state. 

      Out of the two open states in the mutant, O2 is the one that shares properties with the WT (e.g. it is inaccessible during Ca2+-CaM binding) while O1 is the open state with the voltage dependence that is conserved across the mutants. We, therefore, believe that this question is based on a mix-up of the two open states. We appreciate the core of the question: does the pattern in the mutants’ G-V curves find a continuation in the WT channel? 

      Firstly, the component that is conserved among mutants does not lead to current in the WT because the corresponding open state (O1) is not observed in WT. However, the gating event represented by this component should also occur in WT and, given its apparent insensitivity to eag domain mutations, this gating step should occur in WT with the same voltage dependence as in all the mutants. This means that this first component sets a hard boundary for the most hyperpolarized G-V curve we can expect in the WT, based on our mutant measurements. Secondly, the second component shows a regular progression across mutants: the more intact the eag domain is, the more hyperpolarized the Vhalf values of the transition term (1-W) and of O2 activation. In Δ2-10, the transition term already almost coincides with O1 activation (estimated Vhalf values of -33.57 and -33.47 mV). A further shift of (1-W) in the WT is implausible because, if O1 activation is coupled to the earliest VSD displacement, the transition should not occur before O1 activation. Still, the second component might shift to more hyperpolarized values in the WT, depending on the impact of amino acids 2 to 10 on the second VSD transition.

      In summary, in WT the G-V should not be more hyperpolarized than the first component of the mutants, and the (1-W)-component probably corresponds to the Δ2-10 (1-W)-component. In WT the second component should be no more depolarized than the second component of Δ2-10. The WT G-V (Fig. 1B) meets all these predictions derived from the pattern in the mutant G-Vs: when we use Eq. 4 to fit the WT G-V with A1 = 0 (O1 is not present in WT) and the parameters of the transition term (1-W) fixed to the values attained in Δ2-10, we obtain a fit for the O2 component with Vhalf = +21 mV. This value nicely falls into the succession of Vhalf values for Δeag, ΔPASCap, and Δ2-10 (+103 mV, +80 mV, +52 mV) and, at the same time, it is not more hyperpolarized than the conserved first component (Vhalf -34 mV). Our measurements therefore support that the O2 component in the mutants corresponds to the single open state in the WT. 

      (15) Page 15, the authors state that 'The changes in amplitude and kinetics in response to rising intracellular Ca2+ support our hypothesis that Ca-CaM stabilized O1, possibly by driving the channels to deep closed states (Fig 5 and 6)' (pg 15). This statement seems contradictory; I can't quite follow the rationale since Ca2+ potentiates the current (Fig 7), and the addition of the L322H mutant in Fig 7 makes the shift of the first component to negative potentials visible.

      Please check the rationale for this section. 

      We have explained this more explicitly in the discussion (page 32). “Because access to O1 occurs from deep closed states, this could be explained by an increased occupancy of such deactivated states in response to CaM binding. This appears to be the case since CaM induces a biphasic behavior in the mutant channels that show reduced access to deep closed states; thus, L322H mutants behave like the parental variants in the presence of Ca2+-CaM. This implies a mechanistic explanation for the effect of Ca2+-CaM on WT since favoring entry into deep closed states would result in a decrease in current amplitude in the absence of (a permeable) O1”.

      Also, Figs 5 and 6 seem miscited here. 

      Thank you, we have corrected this.

      (16) For Figure 5, it would be helpful if each of the current traces corresponding to a particular voltage had a different color. That way, it will be easier to see how the initial holding voltage modulates current. 

      We have considered this suggestion, and we agree that it would make it easier to follow. Yet, since we have identified the mutants with different colors, it would be inconsistent if we used another color palette for this Figure. Supplement 3 to Figure 9 shows the differences in a clearer way.

      (17) Add zero-current levels to all current traces.

      We have done this.

      (18) The mathematical model should be described better. In particular, the states from which O1 can be accessed should be described more clearly, as well as whether the model considers any direct connectivity between states O1 and O2. The origin of the voltage dependence for transitions that do not involve voltage-sensor movements should be discussed. Also, the separation of kappa into kappa-l and kappa-r should be described. 

      We have extensively rewritten the description of the mathematical model to address these concerns.

      (19) Page 4, "reveals a pre-open state in which the transmembrane regions of the channel are compatible with ion permeation, but is still a nonconducting state". Also, page 27, "renders a hydrophobic constriction wider than 8 Å, enough to allow K+ flow, but still corresponds to a non-conducting state". These sentences are confusing - how can the regions be compatible with ion permeation, and still not be conducting? Is cation conductance precluded by a change in the filter, or elsewhere? How is it established that it represents a non-conducting state? 

      We have rephrased to clarify this apparent inconsistency. Page 4: “(…) in which the transmembrane regions of the channel are compatible with ion permeation (the permeation path is dilated, like in open states) but the intracellular gate is still in the same conformation as in closed states (Zhang et al., 2023).” Page 31: “The presence of an intact intracellular ring would preclude ionic flow in the WT, and its alteration would explain the permeability of this state in the mutants.”

    1. to simply slot Clarkson into the standard history of the field would miss much of the point

      I think this is too much modesty, or a kind of self-undercutting to try to convey the importance of the point. But functionally it undercuts the significance of the earlier parts of the chapter. At the end here I'm understanding the argument as being

      1. The antislavery campaign shows what state-of-the-art data visualization meant c. 1800, and these two different visualizations from Clarkson make the case that he should be considered one of the canonical figures.
      2. That's important because it heightens a set of ethical and political questions about whether and when to visualize. Clarkson's work can be considered a countervisualization or something -- possibly a concept to introduce ? -- because it's taking advantage of the trade etc. Also highlights dataviz as a political-rhetorical form, not just a scientific practice about astronomy etc.
      3. Just because we admire things about Clarkson's career doesn't mean we should literally canonize him as a saint. Equiano's reaction shows that even at that time there was a different set of requirements.

      And then there is the metaphor of water and streams. This does a few things: 1. provides a counterpoint to the God's-eye, objective view by adding flow and direction, fluidity, and contingency. 2. was useful for ~1800 readers who ALSO weren't always looking for this objective god's-eye view, which is OK. (I think the infographic/dataviz distinction from the introduction here is useful, because it underlines that the more 'subjective' or whatever flow timelines are an ADVANCE on Priestley's straight lines and can be seen as such.) 3. Motivates your own data visualization of the streams of with the also-canonical Mississippi visualization. I may have missed this but I think the connection here is almost fully implicit. This could be one key to motivating the water thing as your own choice.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.

      STING is a key signalling hub in the innate immune system, receiving multiple inputs from upstream activators (such as cGAS) and in turn triggering multiple downstream events (such as IFN induction, NF-kB signalling, autophagy, cell death). Mutations in the STING gene cause a rare inflammatory disease called SAVI. Using a previously established STING ki mouse that recapitulates some of the clinical observations in SAVI patients, this manuscript tests the hypothesis that TNF signalling drives pathology. Using anti-TNF antibody and TNF receptor knockout, the authors show that TNF indeed plays important roles in causing disease in this mouse model. For example, the loss of T cells and neurons is prevented when TNF signalling is blocked, and lung pathology is rescued in STING ki mice lacking TNF receptors. Overall, the manuscript is well written and laid out, and the experimental work is of a high technical standard.

      Major comments

        • Most figures show pooled data from two independent experiments including a total of 5-8 mice. Given the variability in some of the readouts, this raises the question of whether there is sufficient statistical power to draw conclusions. For example, in Figure 2, the conclusion that "Infliximab did not alter the expression of inflammatory mediators" seems questionable given the results in Figure 2F and G. Did the authors perform a power calculation? What effect size can the authors detect given the variability and number of replicates? Similarly, in Figure 3, the authors conclude that "Disruption of TNFR signaling did not significantly prevent T cell lymphopenia"; however, with some more replicates, the data in Figure 3D would likely reach significance. Similar concerns apply to several panels in Figures 4 and 6 and to Figure S5M. Ideally, the authors should perform additional repeat experiments to increase the number of replicates. If that is not possible, power calculations need to be provided, and conclusions should explicitly mention the minimum effect size that the authors can detect given the small sample size (for example "Infliximab did not alter the expression of inflammatory mediators more than x-fold").

      Thank you for this suggestion. However, it is not possible to repeat the treatment of mice with Infliximab to generate more replicates. Blocking TNF signalling with drugs did not cure murine SAVI disease, and animal welfare restrictions do not allow us to perform additional treatment experiments with Infliximab or Etanercept.

      We analysed the effect sizes (d, f) and statistical power of all these results and collected them in Table S4. Additional explanations about effect sizes were added to the text corresponding to Figures 2 and 3. The results presented in Figures 4 and 6 already contain significant data; we did not include the effect size calculations for them in the text. All effect size and power calculations are summarized in Table S4.
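      The reviewer's question of which effect size can be detected with so few mice can be illustrated with a minimal sketch. It assumes a two-sided, two-sample comparison with equal group sizes, α = 0.05 and 80% power, and uses the normal approximation, which somewhat underestimates the detectable effect at such small group sizes compared with an exact t-test. The function name `min_detectable_d` and the group sizes looped over are hypothetical, and this is not necessarily how the values in Table S4 were obtained.

```python
from statistics import NormalDist


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized effect (Cohen's d) detectable by a two-sided,
    two-sample comparison with equal group sizes (normal approximation)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    # Standard error of a difference of two means scales with sqrt(2 / n)
    return (z_alpha + z_beta) * (2 / n_per_group) ** 0.5


if __name__ == "__main__":
    # Hypothetical group sizes, chosen only for illustration
    for n in (5, 6, 7, 8):
        print(f"n = {n} per group: minimum detectable d = {min_detectable_d(n):.2f}")
```

      Because the detectable effect shrinks only with the square root of the group size, even doubling the number of mice per group would reduce the minimum detectable effect by roughly 30%.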

      • The authors should not make unjustified overstatements. For example, STING KI; TNFR1/2 KO mice should not be referred to as a "new mouse model". The manuscript simply tests the role of TNFR1/2 in the already published STING N153S model. In line 687, avoid using "impressively" and in line 734 avoid using "massively".


      Thank you for this suggestion. We changed this sentence to: “…these newly generated mouse lines of TNFR…” (see line 796). Additionally, we omitted “impressively” in line 687 (now line 705) and changed “massively produced” to “elevated” in line 734 (now line 752).

      Minor comments

      • Lines 767-769: The statement that spike activates cGAS is misleading, because this effect is an indirect consequence of cell-to-cell fusion (Liu et al 2022).


      Thank you for this suggestion. We changed this sentence to: “Cell fusion caused by the SARS-CoV-2 spike protein is a potent…” (now line 785).

      Reviewer #1 (Significance (Required)):


      The main strengths of this study are (1) the use of complementary antibody-based and genetic methods to test the role of TNF signalling; (2) the use of multiple different readouts; and (3) the analysis of many different cell types / organ systems. The main weaknesses are (1) small sample sizes limiting statistical power (see above) and (2) the exclusive use of mouse models.


      Overall, my opinion is that the advance is important, both fundamentally and clinically. Studies of this and the related V154M mouse model previously showed an important role of non-IFN pathways in driving disease. This study indicates that TNF signalling may cause pathology. This not only extends our understanding of STING's role in autoinflammation but also opens a direct therapeutic avenue using approved TNF targeting drugs.


      This study will be primarily of interest to specialised audiences working on STING and SAVI, and secondarily to the wider innate immunity field.


      This reviewer has expertise in the field of nucleic acid sensing, including cGAS-STING.


      Reviewer #2:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.

      In this paper, Luksch et al. (2024) examine the role of TNF signaling in STING-associated vasculopathy with onset in infancy (SAVI). By using pharmacological inhibition and genetic inactivation of TNF receptors in a murine SAVI model (STING ki), the research found that pharmacologically inhibiting TNF signaling improved T cell lymphopenia but had limited effects on lung disease. Genetic inactivation of TNFR signaling, particularly of TNFR1, enhanced thymocyte survival and expanded the peripheral T cell pool, reducing inflammation and neurodegeneration. The development and progression of severe lung disease in STING ki mice also depend on TNFR1 signaling, while TNFR2 deletion did not alleviate lung inflammation. The authors also explored the severe inflammatory lung disease manifestation, showing that primary lung endothelial cells from STING ki mice allowed more neutrophil attachment than those from STING WT mice, indicating that chronic STING activity in endothelial cells disrupts the endothelial barrier and promotes severe lung disease. The study highlights TNFR signaling as crucial in SAVI and COVID-19 progression and suggests blocking TNFR1 signaling as a potential therapeutic approach for both diseases.


      Major comments:

      The paper establishes a strong connection between TNFR1 depletion and the reduction of SAVI disease severity in lung and neuroinflammation, suggesting TNFR1 blockade as a viable therapeutic strategy for SAVI. To strengthen the arguments and improve the therapeutic potential, the authors should address the following major comments:

        • The authors conclude that TNFR1 signaling drives murine SAVI disease, as evidenced by the reduced severity of lung disease in TNFR1 -/- mice. While the genetic model is convincing, the discrepancy between pharmacological inhibition and genetic models needs clarification. Before attributing the pharmacological failure to late administration, have the authors considered that Infliximab might not sufficiently deplete TNF to achieve therapeutic benefits? In figure 2H, serum TNF levels were not significantly altered in STING ki mice treated with Infliximab. Have the authors considered using other TNF inhibitors or alternative methods to measure TNF depletion efficacy in STING ki murine models, such as qPCR, flow cytometry, or immunohistochemistry in lymph nodes or lung tissues?*

      Thank you for this suggestion. In a preliminary experiment that is not included in the manuscript, we had already treated STING WT and STING ki mice with Etanercept. Three-week-old mice received subcutaneous injections of 25 mg/kg Etanercept or saline twice per week for 7 weeks. After treatment, all mice were euthanized, and single-cell suspensions of blood and spleen were analysed by flow cytometry. Lung tissue was harvested for histological analysis. Gene expression was quantified in snap-frozen lung and kidney tissue, and secreted proteins were quantified in snap-frozen serum.

      The transcription of ISGs and proinflammatory mediators in lung tissue was not significantly improved by Etanercept treatment (see additional figure below, A – D). Interestingly, the amount of secreted CXCL9 in the serum was reduced in Etanercept-treated mice compared to vehicle-treated mice (E). We concluded that our treatment strategy had no impact on the manifestation and progression of murine SAVI disease in highly inflamed tissues/organs. However, we found a reduction (significant for some genes) in the transcription of proinflammatory mediators in the kidney of Etanercept-treated mice compared to vehicle-treated mice. Murine SAVI disease is a systemic autoinflammatory disease without histological alterations in the kidney tissue of 10-week-old mice. Remarkably, transcription of ISGs and proinflammatory mediators is nevertheless highly upregulated in the kidneys of SAVI mice, and Etanercept treatment improved this aberrant gene expression in this SAVI-affected organ (I – K). These results encouraged us to perform the treatment with Infliximab, because we expected a more pronounced effect: Infliximab can bind both the monomeric and the trimeric form of TNF, whereas Etanercept can only bind the active trimeric form.

      Etanercept treatment of STING WT (in black) and STING ki (in red) mice.

      (A) Relative expression level of Cxcl10, (B) Mx1, (C) Tnf and (D) Il1b in lung tissue of Etanercept or saline treated STING WT and STING ki mice. (E) Quantification of CXCL9, (F) CXCL10, (G) IL-6 and (H) TNF in serum samples from STING WT and STING ki mice after treatment. (I) Relative expression level of Cxcl10, (J) Mx1, (K) Tnf and (L) Il1b in kidney tissue of treated mice.

      • The TNF pathway exhibits redundancy, as multiple signaling molecules or pathways can compensate for the loss of TNF function to maintain cellular processes and immune responses. The authors showed that thymocytes of STING ki mice lacking TNFR1/2 expressed significantly lower levels of IFN-related genes (Cxcl10, Sting1), and mice lacking TNFR1 and TNFR1/2 expressed reduced levels of NF-κB-related genes. Does this imply that IFN and NF-κB pathways are downstream of TNF signaling driving SAVI progression? It would be valuable to hear the authors' comments or postulations on the potential mechanisms of TNF driving SAVI progression in the discussion, and the methods to dissect the mechanisms further using genetic or pharmacological methods.*

      Thank you for this suggestion. STING is a key player in various proinflammatory mechanisms and is directly involved in IFN and NF-κB signalling. We assume that these signalling pathways adapt to various proinflammatory situations. Knockout of TNFR1 or TNFR1/2 strongly inhibits inflammatory reactions throughout the whole organism. We therefore think it is not possible to infer the mechanisms of murine SAVI manifestation and progression from these mouse lines alone. These observations generate new hypotheses but cannot fully explain the mechanism.

      • The authors mentioned that the pharmacological inhibition of TNF by Infliximab is ineffective due to late administration compared to the onset of SAVI. How would this affect the therapeutic treatment of TNF if the treatment is going to be later than the disease onset? Can the authors elaborate on potential ways to circumvent the timing of treatment? Would TNFR1 antagonists face the same issue? To understand disease progression and optimal targeting times, the creation of an inducible TNFR1/2 -/- mouse model could be beneficial. This is optional, but the authors are encouraged to comment on improving TNFR1/2 -/- mouse SAVI models to further study the therapeutic potential of TNF signaling blockade in treating SAVI.*

      We agree with the suggestion. In a follow-up project, we plan to generate STING ki mice with an inducible TNFR knockout.

      Minor comments:

      • The authors separate STING WT and STING ki into different graphs, which can sometimes make it hard to compare STING WT and STING ki baseline levels. It would be beneficial to merge the two genotypes into single graphs for easier comparison.*

      Thank you for this suggestion. In the first version of this manuscript, we presented results from STING WT and STING ki mice in one graph with 8 bars in different colours and textures for the TNFR knockout lines. These graphs were overloaded and very confusing, and it was not possible to mark statistical comparisons inside them without losing focus. Hence, we chose the current graph design, which we think is the clearest version.

      • Figure S5 lacks statistical annotations, although the legends mention them. Are the statistics usually shown when a comparison is mentioned in the text, or are they only displayed when the differences are significant? It would be helpful if the authors could clarify this and ensure that all relevant statistical comparisons are clearly reflected in the graphs, regardless of the significance level. This consistency would improve the clarity and interpretation of the data presented.*


      Thank you for this suggestion. We removed the significance level from the legend of Figure S5 (now line 1199).


      The authors did an excellent job discussing the study's implications, but some of this content could be moved to the introduction. The hypothesis that "tumor necrosis factor (TNF) signaling is involved in the manifestation and progression of murine SAVI disease" can be introduced more naturally once the authors present previous findings on TNF's association with various autoimmune disorders. This would set a clear context for the study's objectives and rationale.

      We agree with this suggestion and inserted the sentence: “In our previous investigations, we observed an elevated transcription of Tnf in spleen and thymus of STING ki mice (Siedel et al., 2020).” (now lines 89-90).

      General Assessment: The study identifies enhanced TNF signaling as a driver of SAVI and specifies TNFR1 blockade as a promising treatment to reduce disease severity. It thoroughly characterizes pharmacological inhibition and genetic perturbations of TNF signaling in murine SAVI models and creates a novel mouse model for studying TNF-targeted therapies in SAVI treatment.

      *However, the study is limited in characterizing the discrepancy between pharmacological inhibition and genetic depletion of TNF and in understanding the underlying mechanisms by which TNF drives chronic STING activation and tissue inflammation.*

      Advances: The study extends knowledge in the field by demonstrating that enhanced TNF signaling drives SAVI, establishing causation rather than mere correlation. The authors provide a strong rationale for treating SAVI with TNF inhibitors/blockade, previously used in other autoimmune disorders like IBD or Crohn's disease, but not in SAVI. They also present a valuable genetic model for studying TNFR signaling blockade in SAVI progression, which is important for both the field of SAVI and future therapy development.

      Audience: The research provides translational and clinical insights by suggesting that targeting TNFR1 signaling could inspire novel treatments for SAVI. The study also advances basic research on SAVI disease progression. Immunologists and clinicians studying and treating autoimmune disorders are the intended audience, but the findings have broader implications. The study highlights the potential role of TNF signaling in COVID-19 disease progression and treatment, thus attracting interest beyond the field of autoimmune disorders.


      Field of expertise:

      cGAS-STING regulation in chromosomally unstable cancers, genomic instability, nuclear envelope rupture and repair

      Do not have sufficient expertise in:

      Immunological underpinning of autoimmune disorders, clinical models or manifestations of SAVI


      Reviewer #3:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.


      Uncontrolled activation of STING is linked to the autoinflammatory disease "STING-associated vasculopathy with onset in infancy (SAVI)". The authors had previously published a mouse model of SAVI, generated by knocking the disease-causing variant N153S into the endogenous murine Sting1 gene (STING ki) (Luksch et al., 2019). In the current study, the authors further investigated the role of tumor necrosis factor (TNF) signaling in the manifestation and progression of murine SAVI disease by pharmacologic and genetic inhibition of the TNF receptors TNFR1 and TNFR2. Overall, the authors were able to demonstrate the following novel findings:


      1) Infliximab treatment of STING ki mice significantly increased the number of blood CD8+ T cells and the thymic cell count. The authors claimed that pharmacological inhibition of TNF signalling partially rescues T cell lymphopenia. However, pharmacologic inhibition of TNF signalling has no effect on lung disease.

      2) On the other hand, STING ki;Tnfr1-/- mice (lacking TNFR1) showed a similar modest rescue of CD8+ and CD4+ T cells in blood compared to STING ki mice on the wild-type C57BL/6 (BL6) background, whereas STING ki;Tnfr2-/- mice (lacking TNFR2) did not. STING ki;Tnfr1-/-, STING ki;Tnfr2-/- and STING ki;Tnfr1/2-/- mice had a modest rescue of thymic cell count and a reduced spleen cell count (reduced splenomegaly). Along with the rescued thymic content and reduced splenomegaly, genetic ablation of TNF signalling (STING ki;Tnfr1-/-) also prevented the manifestation of severe inflammatory lung disease.

      3) To investigate the role of lung endothelial cells in the development of interstitial lung disease, primary murine lung endothelial cells from STING WT, STING ki, STING WT;Tnfr1/2-/- and STING ki;Tnfr1/2-/- mice were isolated and analysed by bulk RNAseq. This showed decreased levels of several proinflammatory cytokines (e.g. Tnf, Il1b) and chemokines (e.g. Cxcl1, Cxcl2, Cxcl9, Cxcl10, Ccl2, Ccl3 and Ccl4) in STING ki mice lacking TNFR1/2 compared to STING ki mice.

      4) Neutrophils were isolated from bone marrow and added to cultured primary lung endothelial cell monolayers. These experiments demonstrated that the attachment and transmigration of neutrophils depended on expression of the STING gain-of-function mutation in endothelial cells.


      A few points require clarification before publication of this study.

      1) Tnfr1-/-, Tnfr2-/- and Tnfr1/2-/- did not show any statistically significant improvement of thymic cell count in STING ki mice. As such, the statement in the conclusion/summary section of the discussion that Tnfr1 deficiency can restore thymocyte numbers should be toned down.

      Thank you for this suggestion. We demonstrated that knockout of TNFR1 increases the number of SP CD8 thymocytes (Fig. 4 E) and, partially, the number of SP CD4 thymocytes (Fig. 4 D). Following this suggestion, we now refer specifically to this thymocyte subpopulation in the discussion and summary sections (now lines 684 and 794).

      2) The section on neuroinflammation and neurodegeneration and their dependency on TNFR1/2 signaling is currently very difficult to follow (based on how the data are presented in figures and text). This section needs to be rewritten for clarity.


      Thank you for this suggestion. We have rewritten this section (lines 472-499).

      Neuroinflammation and neurodegeneration in dependency of TNFR1/2 signaling

      The extent of inflammation in the mouse brain resulting from constitutive activation of STING N153S was assessed by quantifying the density of Iba1-positive microglia (Fig.5 A). Consistent with our previous findings (Szego et al., 2022), the density of Iba1-positive microglia in the substantia nigra was higher in STING ki;BL6 mice than in STING WT mice (Fig.5 B). TNFR deficiency did not affect neuroinflammation: there was no significant difference in the density of Iba1-positive microglia between STING ki;BL6 mice and STING ki;Tnfr1/2-/- mice (Fig.5 B). This suggests that the TNF pathway is not required for STING-induced microglia activation in the substantia nigra.

      In addition, we measured the extent of STING-induced astrogliosis by quantifying the density of GFAP-positive cells (Fig. 5 A). Consistent with our previous findings, the density of GFAP-positive astroglia was higher in STING ki than in STING WT mice (Fig. 5C). Yet, as for microglia, there was no significant difference in the density of GFAP-positive astroglia between STING ki;BL6 mice and STING ki;Tnfr1/2-/- mice (Fig.5 C), suggesting that the TNF pathway is not required for STING-induced astrogliosis in the substantia nigra.

      Finally, we measured the extent of STING-induced neurodegeneration by quantifying the density of TH-positive dopaminergic neurons in the substantia nigra (Fig. 5A). As in our previous findings, the density of TH-positive neurons was lower in STING ki;BL6 mice than in STING WT mice (Fig.5 D). The density of TH-positive neurons in the substantia nigra of STING ki;Tnfr1/2-/- mice was higher than the density of TH-positive neurons in the substantia nigra of STING ki;BL6 mice (Fig. 5 D), suggesting that the STING-induced degeneration of TH-positive neurons was blunted in Tnfr1/2-/- mice and that TNFR1/2 are involved in the STING-induced degeneration of dopaminergic neurons.

      Hence, STING-induced effects on glial cells differ from STING-induced effects on neurons. The dependence of STING-induced neurodegeneration, but not of the glial response, on TNFR1/2 suggests that the STING-induced degeneration of dopaminergic neurons is not a direct consequence of microglia or astroglia activation. This is consistent with the emerging concept of a neuron-specific inflammatory response (Welikovitch et al., 2020).

      *The powerful use of in vivo genetic KO models and a TNF inhibitor makes this study a valuable contribution to the field, helping to further decipher the importance of the NF-κB/TNF branch of STING in SAVI (knowledge gap). The audience for this work would be specialists in STING biology and in potential clinical treatments of SAVI.*


      Our expertise is in nucleic acids sensing (such as STING) and auto-immunity.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewers

      We thank the reviewers for their helpful comments and suggestions, which we believe will improve the manuscript. We intend to address them with the changes and planned revisions described below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Bello et al. examine the effect of the Alzheimer's disease (AD)-associated SNP rs28834970, with C being the risk allele, on chromatin accessibility and on expression of a nearby gene, PTK2B, in microglia. Their contention is that the single SNP affects chromatin accessibility and binding of the transcription factor CEBPβ in an intronic region of PTK2B and thereby affects PTK2B expression. I had a few questions that I think are critical to address. Please note that my numbering of panels is based on the figures, not the legends, which do not seem to agree with each other. There are also some figure legends that say "IFNg" while the figures say "LPS", which should be fixed.

      We apologise for the mistake in the figure legend that made this confusing, which we have now revised.

      The abstract says that editing a line that is homozygous for protective alleles to homozygous for risk results in "subtle downregulation of PTK2B expression". It isn't clear to me that the presented data fully supports this contention, which is central to the argument of the paper. In figure 2e, the authors show in both RNAseq and ddPCR that there is numerically lower PTK2B expression but this is not indicated to be statistically significant by one-way paired ANOVA. If there is no nominally significant difference in the edited lines, compared to the proposed significant differences in lines carrying the full risk haplotype (figure 1), then it would not seem sensible to ascribe the effects to the single edited base pair.

      We agree with the reviewer that given the effect of the SNP on PTK2B expression in the edited lines is small and only significant in macrophages, we should not interpret the effects to be mediated solely through PTK2B expression, and have substantially reworded the manuscript accordingly.

      Whilst the effects in the eQTL analysis are significant, it is worth noting that this is likely due to the much larger number of donors (133-217), which gives greater power to detect subtle changes in expression (~1.1 to 2 fold in eQTL). The change in our SNP-edited lines (~1.2 fold) is of a similar magnitude, as would be expected of most common regulatory variants, so we believe that rs28834970 could be the primary causal variant. However, we cannot exclude that other variants in the haplotype contribute to the effect, and have reworded the text accordingly to make this clear.

      Given this uncertainty about the overall strength of effect of the single base pair change it would seem important to evaluate the proposed mechanism of CEBPb binding. It wasn't clear whether the ATAC-seq data summarized in the volcano plot in 2C is proposed to be a cause or a consequence of the CEBPb binding change. I am assuming that the 'fold change' estimate here is CC compared to TT, which would be consistent with direction of effect in figure 1, but please clarify.

      We apologise for the mistake in the figure legend that made this confusing, which we have now revised along with clarification in the revised text. It is difficult to be sure whether changes in chromatin accessibility are a cause or consequence of CEBPb binding, but the fact that the binding of CEBPb is increased in the CC allele (Fig 2a, Fig 2c), that the C allele better matches the consensus sequence (Fig 2b) and there is increased chromatin accessibility (Fig 2a, Supp Fig 3b) suggests that CEBPb binding is causing the formation of the region of chromatin accessibility.

      In contrast to the subtle effects at PTK2B, the global transcriptional effects in figure 3 look quite strong. Are any of these changes dependent on PTK2B, that is to say, are they mimicked by partial suppression of PTK2B expression or activity?

      We agree that the downstream effects of the SNP are much stronger than the effects on PTK2B expression, and we have substantially reworded the manuscript to make it clear that we cannot be sure that all effects of the SNP are mediated via PTK2B.

      However, we note that there is evidence in the literature of a loss in CCL4 and CCL5 expression upon PTK2B knockout in macrophages (https://www.nature.com/articles/s41467-021-27038-5) and inhibition of PTK2B in monocytes results in a reduction in CCL5 and CXCL1 (https://www.nature.com/articles/s41598-019-44098-2) consistent with our observations.

      Experiments to manipulate PTK2B expression in microglia and readout changes at the RNA level would take a few months to complete, but we would be willing to do this if the reviewer felt this was necessary.

      Finally, in figure 4, it should be clarified why lower expression of PTK2B would be expected to have a detrimental effect on Alzheimer's risk. If understood correctly (and again, fixing the figure legends would be helpful), the CC edited lines (risk) have lower chemokine induction than the unedited TT lines.

      We apologise for the error in this figure which we have corrected in the revised version. You are correct that the CC lines have a lower chemokine level in both unstimulated and stimulated cells, and we have now discussed further how this may be linked to increased disease risk.

      "Even though overexpression of these chemokines is characteristic of neuroinflammation, correlated with disease progression and found in late stages of AD, knockout of chemokines, such as CCL2, and chemokine receptors, such as CCR2 and CCR5, in mice is associated with increased Aβ deposition and accumulation [47,50-52,107]. It has also been found that patients carrying CCR5Δ32 mutation, which prevents CCR5 surface expression, develop AD at a younger age[108]. Therefore, we hypothesize that in individuals carrying the C/C allele of rs28834970 downregulation of these chemokines in macrophages and microglia harbouring the C/C allele of rs28834970 affects Aβ-induced microglia chemotaxis, leukocytes recruitment and clearance of Aβ, and may increase the risk of developing symptomatic AD"

      Reviewer #1 (Significance (Required)):

      Going from GWAS hits, which represent blocks of high LD inherited variants, to single functional variants is a difficult problem in human genetics. The current paper attempts to isolate the effect of a single variant within an LD block on IPSC derived macrophages and microglia. This idea might be useful in nominating PTK2B as a therapeutic target for AD, although there is some question in my mind as to direction of effect.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY: In this manuscript the authors explore the biological effects of an intronic SNP in the PTK2B gene, previously shown to be associated with late-onset Alzheimer's disease (AD) risk. Based on the likely effect of the SNP locus on PTK2B expression in the macrophage lineage, the authors use CRISPR/Cas9 to introduce the biallelic SNP base change (C/C vs T/T) into a human iPSC line that is then differentiated into macrophages or microglia. They observe that C/C increases chromatin accessibility and CEBPb binding in comparison to T/T, with a slight decrease in PTK2B expression that is significant in macrophages but not in microglia. The authors then investigate the transcriptome changes induced by the C/C mutation and find alterations in many genes, including decreased expression of a number of cytokine or receptor proteins involved in inflammatory responses. The authors also mention a decreased effect on IFNg-induced reduced mobility, but the data are missing (see Figure errors below). Overall the authors propose that the risk SNP is associated with decreased PTK2B expression and hypothesize a link between this change and decreased function of macrophages/microglia that may contribute to AD pathology.

      MAJOR COMMENTS

      1- The authors claim that their results show that the investigated SNP has causal effects on "microglial function" (Title) and in Alzheimer's disease (AD) (Abstract, 2nd sentence: "Here we validate a causal single nucleotide polymorphism (SNP) associated with an increased risk of Alzheimer's disease"). The word "causal" is repeated many times. However, the authors should qualify their claim with respect to AD. Their results do show that the SNP has an effect on chromatin accessibility, CEBP binding, PTK2B expression and the transcriptome, but the link between these changes is not formally demonstrated and their potential role in an AD-like phenotype is not explored. The "causal" role is not formally and logically demonstrated. It remains an interesting, plausible hypothesis, and the results provide strong arguments in support of that hypothesis but do not prove it, yet.

      Concerning the title, "causal effects on microglial function" is awkward, anything that has effects is logically "causal" in these effects. The title should be "... has effects on microglial functions" or "... alters microglial function".

      We agree with the reviewer that given the effect of the SNP on PTK2B expression in the edited lines is small and only significant in macrophages, we should not interpret the effects to be mediated solely through PTK2B expression, or that they cause AD. We have substantially reworded the manuscript throughout to account for this.

      2- One major difficulty in the results is to link the slight decrease in PTK2B transcript, which is only significant in macrophages, with the rest of the phenotype. Because what matters to make this link is not the mRNA but the protein, and because mRNA levels are often not strictly correlated with the protein levels, the authors should measure the PTK2B/PYK2 protein levels in their differentiated cell lines in basal conditions and following activation (as they do for other readouts) using immunoblotting. A robust and significant diminution in PYK2 protein would strongly support its role in linking PTK2B expression and transcriptome change.

      We have performed preliminary analyses of PTK2B expression by Western blot in these cell lines after differentiation, but were unable to observe a significant change in abundance in the edited cell lines. This is not unexpected given the results at the RNA level, since the effect size of this common regulatory variant is likely very small (estimated to be ~1.2 fold from the eQTL analysis), and likely within the variability of this assay.

      As mentioned above, we have reworded the manuscript to avoid interpreting that the effects of rs28834970 are mediated solely through effects on PTK2B expression. We think that an experiment to manipulate PTK2B levels (see next point) may be a better way to demonstrate whether these effects are mediated through PTK2B expression.

      An optional additional key experiment would be to reverse the transcriptome phenotype by increasing the expression of PTK2B (e.g. by cDNA transfection). Note that these points are important because an alternative hypothesis to explain the effects of C/C mutation on macrophage function would be that the C/C mutation has a long distance effect on other chromatin regions with key role in regulating these cells.

      We agree that this would be a valuable experiment, and are planning additional experiments to investigate the effect of manipulating PTK2B levels (through knockout) on microglia.

      3- The manuscript contains several errors in the figures and figure legends. In Fig. 2 the legends for the figure items are shuffled. Figure 4 and Supplementary Figure 5 are duplicates. Consequently, important data are not presented.

      We apologise for the errors in these figures that were due to a mistake during uploading where the incorrect versions were used. The legends for figure 2 and panels in figure 4 have now been corrected, and show the effects of rs28834970 on microglial migration and chemokine release in the presence or absence of IFNg.

      4- When the number of replicates is small (e.g. n = 3) it is preferable to use non-parametric tests (rank analysis, e.g. the Mann-Whitney test) rather than a t test. This applies to Figures 2D (current legend 2A), 2E (current legend 2B), Figure 4A-C, and Supplementary Figures 2A and 2B. In Supplementary Fig 4E (MARCO) the number of replicates (presumably 3 because based on RNAseq) and the test used are not indicated. Is it the RNAseq statistical analysis?

      We thank the reviewer for this comment. We acknowledge that the t-test may lead to inflated false discovery rates. However, it has been shown that for small sample sizes parametric tests have a power advantage compared to non-parametric ones that may outweigh the possibly exaggerated false positives. See https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02648-4#Sec3 which states:

      "In conclusion, when the per-condition sample size is less than 8, parametric methods may be used because their power advantage may outweigh their possibly exaggerated false positives."

      We have also modified the legend of supplementary figure 4E to clarify the number of replicates used.
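      A minimal sketch, assuming two groups of three replicates with no tied values, illustrates why an exact rank-based test carries so little power at this sample size. It enumerates the exact null distribution of the Mann-Whitney U statistic; the function names `exact_u_counts` and `min_two_sided_p` are hypothetical helpers written for this illustration, not part of any analysis in the manuscript. With 3 versus 3 observations there are only C(6, 3) = 20 equally likely rank assignments, and only one of them gives complete separation in each direction.

```python
from itertools import combinations


def exact_u_counts(n1: int, n2: int) -> dict[int, int]:
    """Count how many rank assignments give each Mann-Whitney U value for group 1.

    Assumes no ties, so the pooled observations carry ranks 1..n1+n2 and,
    under the null hypothesis, every choice of n1 ranks for group 1 is
    equally likely.
    """
    counts: dict[int, int] = {}
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        # U for group 1 = its rank sum minus the smallest possible rank sum.
        u = sum(ranks) - n1 * (n1 + 1) // 2
        counts[u] = counts.get(u, 0) + 1
    return counts


def min_two_sided_p(n1: int, n2: int) -> float:
    """Smallest exact two-sided p-value attainable with group sizes n1 and n2."""
    counts = exact_u_counts(n1, n2)
    total = sum(counts.values())
    # The most extreme outcome is complete separation of the groups (U = 0);
    # the null distribution is symmetric, so the one-sided tail is doubled.
    return min(1.0, 2 * counts[0] / total)


for n in (3, 4, 5):
    print(f"n = {n} per group: minimum two-sided p = {min_two_sided_p(n, n):.4f}")
```

      With three replicates per group the smallest attainable two-sided p-value is 2/20 = 0.1, so the exact test cannot reach p < 0.05 however large the difference between groups; with four replicates per group the floor drops to 2/70, roughly 0.029.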

      5- In addition to the above comment on tests, when the number of replicates is small it is not appropriate (and misleading) to show box plots or bars with SEM. In the indicated figures the individual data points should be shown.

      We now show individual replicates on box plots (Figure 2D, 2E and supp figure 4E).

      MINOR COMMENTS:

      a- Macrophages and microglia are very similar cell types. Could the authors comment more on the differences they observe and how they are related to those previously described?

      We have now referenced the original papers and commented on the markers that we see differentially expressed, notably P2RY12 which is a key homeostatic microglia marker that distinguishes these cells from macrophages.

      b- In Fig. 2A CEBPb cut and run plot, the differences are not limited to the SNP immediate vicinity, there are also visible differences between T/T and C/C plots in at least a 40-kb range. Is it due to multiple interactions of CEBPb? How can the point difference have broad consequences? Please explain this potentially interesting and relevant finding.

      Whilst there may be small changes in CEBPb binding at the second intronic PTK2B chromatin peak, this is not statistically significant given the variability between repeats. In fact, the only significant change we see in CEBPb binding genome-wide is at the locus overlapping the SNP (Fig 2c).

      c- Potentially cis-altered genes near the SNP include CHRNA2 and EPHX2 (see Sup. Fig. 3a). Their expression may not be detected in macrophage lineage. If this is the case please indicate in the text, otherwise please include the corresponding data in Sup. Fig. 3b to show the presence or absence of SNP-induced change.

      You are correct that CHRNA2 and EPHX2 are not expressed in our macrophages or microglia, and we have now explicitly stated this in the revised text.

      d- In general, the figures are not of very high quality and are difficult to read or understand without constantly going back and forth to the legends (which are mislabeled in some instances). To improve them:

      • Please increase font size whenever possible.

      • Please improve Fig. 1d by indicating the position of the SNP and numbering the exons (an intermediate-scale plot may be necessary, and the lines on the bottom trace are hardly visible).

      • Please indicate the correct color code for T/T and C/C in the left panels of Fig 3a and b, which currently does not match.

      • Please label the Venn diagram comparisons in Sup. Fig. 4b.

      • In the text and legends, figure panels are identified with upper-case letters, whereas in the figures they are lower case. Please be consistent.

      We have improved the resolution of the images in the PDF, and Fig 1d has been revised to include the position of the SNP. The colour code for T/T and C/C is correct in Fig 3a and 3b, but because the PCA plots are created independently, we would not always expect the T/T and C/C alleles to occupy the same positions. The Venn diagrams in Sup Fig 4b have been updated, and the figure panel letters are now consistently upper case throughout.

      e- In Fig. 2D and 2E, the Y axes should start at zero to avoid artificially increasing the visual differences. If there is a strong reason not to do so (I don't see any here), the Y axis should be clearly interrupted to avoid confusion.

      We have altered this accordingly.

      f- In the introduction, the authors provide some background on previous work regarding the potential role of PTK2B/PYK2 in AD pathophysiology. The cited preclinical results (referenced in the manuscript) suggest that PTK2B activity could have a deleterious effect. In contrast, other reports (PMID: 29803828, 33718872) suggest a protective effect of PTK2B/PYK2. Because the evidence in the current manuscript suggests that the risk-associated SNP decreases PTK2B/PYK2 function (through decreased levels), at least in cells of the macrophage lineage, the authors could broaden their discussion to include these results.

      We have now discussed the conflicting evidence in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      ADVANCE: Late-onset Alzheimer's disease is a major medical issue. It has a complex genetic risk component, with many associated loci identified in GWAS. Most of these have only a small individual impact on risk. One of the SNPs associated with increased risk (rs28834970) is located in an intron of the PTK2B gene. Although various reports have investigated the role of the PTK2B gene product, the tyrosine kinase PYK2, in several AD models, its possible link with rs28834970 is unclear.

      An important point is to determine whether the T→C SNP corresponding to rs28834970 alters PTK2B expression, and how it does so. An alternative hypothesis is that the SNP is in strong linkage disequilibrium with an unidentified allele in human populations that is responsible for the AD risk. The current manuscript is a significant step forward in addressing that question: by generating a biallelic C/C SNP mutation in a human iPSC line, the study makes it possible to exclude such a linked contribution.

      The strength of the manuscript is that it shows an effect on chromatin accessibility, CEBP binding and possibly PTK2B transcripts. It also provides interesting evidence of a broad effect of the C/C mutation on the transcriptome of macrophage-lineage cells. In its current form, the manuscript has weaknesses that could be addressed. These include the presentation issues discussed above and the incomplete demonstration that it is the decrease in PTK2B expression that causes the macrophage/microglia phenotype. If these flaws were overcome, the paper would represent a significant advance.

      AUDIENCE: The expected audience is specialized in AD with a possible broader range if all weaknesses are addressed.

      REVIEWER EXPERTISE: Basic science close to the field.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewer #1:

      We agree with Reviewer 1 that a function of ROPGEFs in this process was expected to some degree. However, we want to point out that this manuscript focuses on the requirement for ROPGEFs and especially on the spatio-temporal description of ROP signalling polarisation and activation during pollen germination. Moreover, unlike the downstream ROPs, ROPGEFs, as we show, do not act in a strictly redundant manner. This confirms results from root hair initiation and provides additional evidence that multiple signalling pathways are required for pollen germination and that ROPGEFs might be essential for conferring specificity on these signals.

      Major comments:

      1. Only one GEF11 mutant line, gef11-t1, was analyzed for germination ratio. It is presumptuous to conclude that GEF11 has no function in pollen germination of Arabidopsis thaliana (lines 241-242).

      After the initial negative results, we did not pursue GEF11 further. We therefore fully agree that it is presumptuous to make such strong statements about the role of GEF11 during pollen germination. For this revision plan, we generated additional gef11 mutant alleles using CRISPR/Cas9, as no other suitable lines were available. Moreover, we now have additional higher-order mutants available to assess the function of GEF11 during pollen germination. These lines have been generated and confirmed and are currently growing. We will therefore be able to incorporate new results addressing this point in a timely manner, allowing us to make a better-founded statement about the function of GEF11 (see Response to Reviewer #2).

      Minor comments:

      1. In Figure 2A, pollen germination ratio was not provided for the single mutants gef8-c△3 and gef9-c△

      This is due to the way the CRISPR/Cas9 alleles were generated. These alleles were produced with a construct that mutates both genes simultaneously; thus, they are not available as single mutant lines. Instead of separating these alleles by outcrossing, we included additional single mutant alleles with a similar deletion for both GEFs. As all these CRISPR/Cas9 mutants carry a complete deletion of the GEF ORF, we are confident that the corresponding GEF function is lost. The additional alleles account for possible nonspecific effects.

      In Figure 3D, the subcellular localization of GEF12GEF8C is fuzzy. Better imaging is needed.

      We agree that the quality of these images is not ideal because this specific line has a weaker fluorescent signal. We have screened for more lines carrying this construct and have already performed more experiments. We will provide better images for this genotype.

      In Figure 3E, it is intriguing that both GEF8-S518A and GEF8-S518D are not associated with the PM in germinating pollen grains. Does it mean that phosphorylation at S518 is not relevant to polar distribution of GEF8?

      We also find this very intriguing, as we did not expect this result. However, we interpret it slightly differently: the S518 site is relevant for GEF polarisation, which might be conferred by interaction with RLKs. We think both mutant forms alter this potential association with RLKs and thus lose polarisation. We will include more imaging experiments with these constructs and additional lines to strengthen our results. Moreover, we have generated lines to study the functionality and complementation capacity of these constructs, which will be included in a revised manuscript.

      T-DNA insertion lines, gef11-t1 and gef12-t1, need to be verified by PCRs in Figure S3D.

      Thanks for pointing this out. This control should be provided, and we will include the verification in the supplement.

      Response to Reviewer #2:

      Like Reviewer #2, we are very intrigued by the biphasic accumulation of GEFs, as this is an entirely novel feature of this process. We likewise interpret it as an exploration phase followed by an establishment phase, which could help us understand how the pollen germination site is determined in species without aperture-dependent pollen germination.

      Major comments:

      1. In line 241, the authors conclude that GEF11 has no function in pollen germination. However, it is likely that GEF11 also plays a redundant role as GEF12 does. I recommend the authors check the phenotypes of gef11,gef12 double mutant and gef8,gef9,gef11 triple mutant to confirm that GEF11 has indeed no function. Otherwise, this conclusion should be better rephrased.

      This point is well justified and similar to the comment of Reviewer #1. As stated before, we had to generate additional lines for this. We will analyse an additional gef11 allele, gef8/gef11 and gef9/11 double mutants, and gef9/11/12 triple mutants to address the function of GEF11 in more detail. The conclusions of the original manuscript will, of course, be adjusted according to the new results.

      Although GEF12 is in the cytosol, the strong pollen germination defects in gef8,gef9,gef12 triple mutants do indicate a critical role of GEF12. Is it possible that GEFs could function in the cytosol? The authors could test this possibility by examining the rescuing ability of several constructs expressing, for example, GEF12, GEF12(+GEF8C), GEF8(SA), or GEF8(SD) in gef8. The authors need not perform all of these rescue experiments, but some of the mentioned lines are already in hand, so they could readily check the phenotypes.

      We thank the Reviewer for this great point. This information is crucial for discriminating the functions of the individual GEFs. To address this, we have generated new lines expressing some of the mentioned constructs in the gef8 background. We now have gef8 lines complemented with GEF12, GEF12GEF8C, GEF8S518A, GEF8S518D, and GEF8ΔC. We are currently performing experiments to determine the functionality of these constructs, which will allow us to make more conclusive statements about GEF function in the cytosol and about how important the PRONE domain alone, or the membrane attachment of GEFs, is for their function.

      The authors conclude that the C-terminus of GEF8 and GEF9 is necessary and sufficient for membrane localization because GEF8/9C can target the GEF12 PRONE domain to the membrane. It would be interesting to know whether the C-terminus alone could confer membrane targeting. Currently, it is not fully understood how GEFs localize to the membrane. Examining the localization of GEF8/9C on its own would help clarify this and improve our understanding of GEF regulation. Alternatively, the authors may discuss evidence that supports or argues against this possibility.

      This is a good suggestion, and it is indeed intriguing whether the C-terminus alone could confer membrane attachment. In the meantime, we obtained plants expressing such constructs, which show that the C-terminus alone is insufficient for membrane attachment. This is not surprising, as these domains are largely disordered, and we suspect that an adjacent PRONE domain is required to carry out this function. We will include these new results in the revised manuscript.

      Minor comments:

      1. The N- and C-termini of GEF8 are predicted to inhibit complex formation. How was the prediction performed? Did the authors use monomer or multimer prediction? AlphaFold2 has low accuracy in predicting non-conserved regions. How confident are the predicted inhibitory contacts?

      We used AlphaFold2 multimer prediction for the structures shown. However, we fully agree that AlphaFold predictions have low accuracy in this regard, especially for disordered domains like these. We will provide confidence models and predicted aligned error (PAE) plots for this structure. Additionally, we will put our conclusions in better perspective relative to these structure confidences and tone down our interpretations in this section.

      Localization of ROPs and the calcium reporter in Figure 4 appears to be variable. It would help clarify the specific effects on each reporter if the authors presented these data more quantitatively.

      We agree with the reviewer that some of the observations are variable. We will present the data more quantitatively, including the percentage of cases in which we observed the described phenomena and a more quantitative analysis of the strength and timing of signal accumulation (see also Response to Reviewer #3).

      Response to Reviewer #3:

      Major points:

      1. One of my major points is that the manuscript is currently based mainly on observations of individual pollen grains. These are then subjected to well-performed image analysis but still represent somewhat anecdotal evidence (Fig 1A, B, Fig 3C-E, etc.). Analysing and presenting (numerically) a more robust data sample (which I presume the authors have acquired) would strengthen the manuscript considerably. This goes beyond the figures; e.g., in l. 164-165 the authors state rather vaguely, "we observed that mCit-GEF8 and mCit-GEF9 accumulated at a defined region in the cell periphery, which strongly correlated with the future germination site." Here, I would appreciate data showing the actual correlation: whether every germinated pollen grain displays GEF8/9 accumulation, whether there is a population of pollen grains showing the GEF8/9 transient but not germinating, etc.

      We very much appreciate the reviewer's comment, as this version of the manuscript does give the impression that our conclusions were based on observations of individual pollen grains. However, this is not the case. As the reviewer suspected, more data are available but were not included in the manuscript. We have multiple observations for each of the constructs shown and present only a representative one. Furthermore, we imaged more pollen germination events for lines that showed variability and included additional lines for some constructs. We will provide a more quantitative analysis of the results to better represent the variability of the individual constructs, and we will adjust the manuscript accordingly (see comment 2).

      Where the authors analyse multiple cells, some information is still missing; e.g., it is not stated what the error bars in Fig 1C, D represent (SD, SEM, CI?), the sample size, etc. In any case, there is evidently quite substantial variability in the data, which is understandable. Could the authors plot the individual profile lines alongside the average? In addition, GEF9 seems to reach its maximum pre-germination localisation at -5 min rather than -9 min.

      We agree with the Reviewer that some information is missing or not clearly stated, and we will correct this in the revised manuscript. Moreover, we agree that the suggested way of showing the data would provide more information and better represent the results and their variability. This is illustrated by the reviewer's interpretation of the GEF9 results: here, we see some variability in the timing of GEF9 accumulation, which shifts the peak maximum. In a revised manuscript, we will, as suggested, show the data as individual lines to better represent the data. Moreover, we will include such representations for the other constructs used, providing a more quantitative data analysis overall (see comment 1).

      I know it is very challenging, but the manuscript would be much stronger with in vivo imaging of pollen germination on stigmatic papillae of (i) GEF8/9 in wt and (ii) the gef8/9 double mutant. This would provide crucial data about the role of the GEF polar domain and its functional relation to pollination.

      This would indeed be great to see. We made an effort to establish such in vivo imaging experiments with our fluorescent markers. However, we cannot image these events in an in vivo setup (at least with our resources), for two reasons: 1. The events are very fast and limited to a small region at the pollen-papilla contact site, which we have difficulty resolving optically and temporally. 2. The marker lines used have only low fluorescence levels because of the native promoter, and stronger expression would lead to overexpression artefacts. Even in vitro, the observed signal accumulation is difficult to see. In vivo, the papilla cells cause additional diffraction, which would make observing GEF accumulation impossible with our microscopes.

      The phylogeny presented in Fig S1 is only rudimentary and not very interesting. Given the authors' results, I would love to see whether GEF8/9 orthologs also exist in species with defined pollen apertures (where establishing a dynamic site makes little sense). The authors touch on this (L409-411), but it deserves better analysis and discussion.

      We agree with the reviewer that studying GEF function/accumulation in species with aperture-dependent germination would be interesting. However, we cannot infer functional orthologs in other species from phylogeny alone. Such phylogenetic analyses were done, for example, by Kim et al. (BMC Plant Biology, 2020, doi: 10.1186/s12870-020-2298-5). The issue is that all Arabidopsis pollen-expressed GEFs form a closed phylogenetic group, which does not allow us to determine which rice homolog is the functional ortholog of the respective Arabidopsis GEF (the same applies to maize). Thus, such phylogenetic analyses are not conclusive, and experimental data would be required to prove orthology. However, we agree that this point can be interpreted and discussed better, and we will do so in the revised manuscript.

      I am not entirely convinced by the authors' interpretation of the rather strange S518 mutation data. Could the S518A mutation affect overall GEF8 structure/stability?

      We were also suspicious of these results, as they were unexpected (see also Response to Reviewer #1). To confirm them, we made additional lines for these constructs, double-checked that the constructs were correct, and made more observations for both GEF8S518A and GEF8S518D. Additionally, we have started investigating the functionality of these constructs and will have these data available shortly. Preliminary results suggest that the constructs are partially to fully functional compared with WT GEF8, arguing against an effect of these mutations on structure or stability. We will include more data for these constructs in a revised manuscript to allow a more conclusive interpretation of these unexpected observations.

      Although the authors cannot observe the localisation of ROPs in the plasma membrane, they see an apparent accumulation of the active-ROP marker CRIB4 there, implying that ROPs must localise to the pollen PM at the germination site. This discrepancy should be resolved or at least discussed further.

      The reviewer is correct that we cannot observe ROP accumulation, only the accumulation of ROP activity (as seen with CRIB4). This is in line with the observation by Xiang et al. (2023, Plant Physiology, doi: 10.1093/plphys/kiad196), who also did not find ROP accumulation. We are convinced that ROPs are present at the plasma membrane of the pollen germination site, but no accumulation is observable. We believe this is due to the high mobility of ROPs and that no accumulation is required, as only a few ROPs are sufficient to activate downstream signals. We will discuss these results in more detail in a revised manuscript to better explain the observed discrepancy.

      Given that calcium oscillates very rapidly in pollen and pollen tubes (with a period of ~6-20 s), the profound, long-term changes in calcium levels reported by the authors can hardly be referred to as oscillations. The phenomenon should also be analysed using a larger sample.

      We agree that the terminology is not appropriate, as it suggests similarities to the oscillations found in pollen tubes. We will therefore refer to the changes in Ca2+ levels as “elevations” in the revised manuscript. Moreover, we will provide a more quantitative analysis and a larger sample size, as stated in the Response to Reviewer #2.

      Minor points:

      1. In Fig 1F, GEF12 also seems to be polarly localised to the future site.

      The chosen sample is not ideal, as GEF12 appears to accumulate slightly as well. However, as seen in the quantification of this cell, GEF12 does not significantly accumulate at the pollen germination site, and we never observed any GEF12 accumulation comparable to that of GEF8 or GEF9. We will include another example of this colocalisation in the revised manuscript to avoid misinterpretation of the data.

      It is difficult to make any assumptions based on the AlphaFold2 predictions without showing their confidence assessments (e.g., PAE plots). The authors state this themselves in the discussion (L. 447-449).

      As stated in the Response to Reviewer #2, we will include structures with confidence values and PAE plots in the supplement. We will additionally tone down our interpretation of these structure predictions to make clear that they should be interpreted carefully.

      On the one hand, the authors repeatedly state that pollen GEFs do not act in a redundant manner (and provide some evidence for it); on the other hand, the absence of an in vivo phenotype for single and double knockout lines and only a mild phenotype for a triple knockout line do suggest a level of redundancy. This should be rephrased.

      We agree that this is not clearly phrased. In the revised version, we will indicate which type and level of redundancy are described, distinguishing between genetic redundancy, as seen in the mild in vivo effects, and non-redundant molecular function, as observed by protein localisation.

    1. Reviewer #1 (Public Review):

      Summary:

      The novel advance by Wang et al. is the demonstration that, relative to a standard extinction procedure, the retrieval-extinction procedure more effectively suppresses responses to a conditioned threat stimulus when testing occurs just minutes after extinction. The authors provide some solid evidence to show that this "short-term" suppression of responding involves engagement of the dorsolateral prefrontal cortex.

      Strengths:

      Overall, the study is well-designed and the results are potentially interesting. There are, however, a few issues in the way that it is introduced and discussed. Some of the issues concern clarity of expression/communication. However, others relate to a theory that could be used to help the reader understand why the results should have come out the way that they did. More specific comments and questions are presented below.

      Weaknesses:

      INTRODUCTION & THEORY

      (1) Can the authors please clarify why the first trial of extinction in a standard protocol does NOT produce the retrieval-extinction effect? Particularly as the results section states: "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." The importance of this point comes through at several places in the paper:

      1A. "In the current study, fear recovery was tested 30 minutes after extinction training, whereas the effect of memory reconsolidation was generally evident only several hours later and possibly with the help of sleep, leaving open the possibility of a different cognitive mechanism for the short-term fear dementia related to the retrieval-extinction procedure." ***What does this mean? The two groups in study 1 experienced a different interval between the first and second CS extinction trials; and the results varied with this interval: a longer interval (10 min) ultimately resulted in less reinstatement of fear than a shorter interval. Even if the different pattern of results in these two groups was shown/known to imply two different processes, there is absolutely no reason to reference any sort of cognitive mechanism or dementia - that is quite far removed from the details of the present study.

      1B. "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." ***As above, what is "the short-term memory update"? At this point in the text, it would be appropriate for the authors to discuss why the retrieval-extinction procedure produces less recovery than a standard extinction procedure as the two protocols only differ in the interval between the first and second extinction trials. References to a "short-term memory update" process do not help the reader to understand what is happening in the protocol.

      (2) "Indeed, through a series of experiments, we identified a short-term fear amnesia effect following memory retrieval, in addition to the fear reconsolidation effect that appeared much later."
      ***The only reason for supposing two effects is the difference in responding to CS2, which was subjected to STANDARD extinction, in the short- and long-term tests. More needs to be said about how and why the performance of CS2 is affected in the short-term test and recovers in the long-term test. That is, if the loss of performance to CS1 and CS2 is going to be attributed to some type of memory updating process across the retrieval-extinction procedure, one needs to explain the selective recovery of performance to CS2 when the extinction-to-testing interval extends to 24 hours. Instead of explaining this recovery, the authors note that performance to CS1 remains low when the extinction-to-testing interval is 24 hours and invoke something to do with memory reconsolidation as an explanation for their results: that is, they imply (I think) that reconsolidation of the CS1-US memory is disrupted across the 24-hour interval between extinction and testing even though CS1 evokes negligible responding just minutes after extinction.

      (3) The discussion of memory suppression is potentially interesting but, in its present form, raises more questions than it answers. That is, memory suppression is invoked to explain a particular pattern of results but I, as the reader, have no sense of why a fear memory would be better suppressed shortly after the retrieval-extinction protocol compared to the standard extinction protocol; and why this suppression is NOT specific to the cue that had been subjected to the retrieval-extinction protocol.

      3A. Relatedly, how does the retrieval-induced forgetting (which is referred to at various points throughout the paper) relate to the retrieval-extinction effect? The appeal to retrieval-induced forgetting as an apparent justification for aspects of the present study reinforces points 2 and 3 above. It is not uninteresting but needs some clarification/elaboration.

      (4) Given the reports by Chalkia, van Oudenhove & Beckers (2020) and Chalkia et al (2020), some qualification needs to be inserted in relation to reference 6. That is, reference 6 is used to support the statement that "during the reconsolidation window, old fear memory can be updated via extinction training following fear memory retrieval". This needs a qualifying statement like "[but see Chalkia et al (2020a and 2020b) for failures to reproduce the results of 6]."

      https://pubmed.ncbi.nlm.nih.gov/32580869/
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115860/

      CLARIFICATIONS, ELABORATIONS, EDITS

      (5) The Abstract was not easy to follow:

      5A. What does it mean to ask: "whether memory retrieval facilitates update mechanisms other than memory reconsolidation"? That is, in what sense could or would memory retrieval be thought to facilitate a memory update mechanism?

      5B. "First, we demonstrate that memory reactivation prevents the return of fear shortly after extinction training in contrast to the memory reconsolidation effect which takes several hours to emerge and such a short-term amnesia effect is cue independent (Study 1, N = 57 adults)."
      ***The phrasing here could be improved for clarity: "First, we demonstrate that the retrieval-extinction protocol prevents the return of fear shortly after extinction training (i.e., when testing occurs just minutes after the end of extinction)." Also, cue-dependence of the retrieval-extinction effect was assessed in study 2.

      5C. "Furthermore, memory reactivation also triggers fear memory reconsolidation and produces cue-specific amnesia at a longer and separable timescale (Study 2, N = 79 adults)." ***In study 2, the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction. This result is interesting but cannot be easily inferred from the statement that begins "Furthermore..." That is, the results should be described in terms of the combined effects of retrieval and extinction, not in terms of memory reactivation alone; and the statement about memory reconsolidation is unnecessary. One can simply state that the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction.

      5D. "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that both memory retrieval and intact prefrontal cortex functions were necessary for the short-term fear amnesia."
      ***This could be edited to better describe what was shown: E.g., "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that intact prefrontal cortex functions were necessary for the short-term fear amnesia after the retrieval-extinction protocol."

      5E. "The temporal scale and cue-specificity results of the short-term fear amnesia are clearly dissociable from the amnesia related to memory reconsolidation, and suggest that memory retrieval and extinction training trigger distinct underlying memory update mechanisms."
      ***The pattern of results when testing occurred just minutes after the retrieval-extinction protocol was different from that obtained when testing occurred 24 hours after the protocol. Describing this in terms of temporal scale is unnecessary, and suggesting that memory retrieval and extinction trigger different memory update mechanisms is not obviously warranted. The results of interest are due to the combined effects of retrieval+extinction and there is no sense in which different memory update mechanisms should be identified with retrieval (mechanism 1) and extinction (mechanism 2).

      5F. "These findings raise the possibility of concerted memory modulation processes related to memory retrieval..."
      ***What does this mean?

      (6) "...suggesting that the fear memory might be amenable to a more immediate effect, in addition to what the memory reconsolidation theory prescribes..."
      ***What does it mean to say that the fear memory might be amenable to a more immediate effect?

      (7) "Parallel to the behavioral manifestation of long- and short-term memory deficits, concurrent neural evidence supporting memory reconsolidation theory emphasizes the long-term effect of memory retrieval by hypothesizing that synapse degradation and de novo protein synthesis are required for reconsolidation."
      ***This sentence needs to be edited for clarity.

      (8) "previous behavioral manipulations engendering the short-term declarative memory effect..."
      ***What is the declarative memory effect? It should be defined.

      (9) "The declarative amnesia effect emerges much earlier due to the online functional activity modulation..."
      ***Even if the declarative memory amnesia effect had been defined, the reference to online functional activity modulation is not clear.

      (10) "However, it remains unclear whether memory retrieval might also precipitate a short-term amnesia effect for the fear memory, in addition to the long-term prevention orchestrated by memory consolidation."
      ***I found this sentence difficult to understand on my first pass through the paper. I think it is because of the phrasing of memory retrieval. That is, memory retrieval does NOT precipitate any type of short-term amnesia for the fear memory: it is the retrieval-extinction protocol that produces something like short-term amnesia. Perhaps this sentence should also be edited for clarity.

      I will also note that the usage of "short-term" at this point in the paper is quite confusing: Does the retrieval-extinction protocol produce a short-term amnesia effect, which would be evidenced by some recovery of responding to the CS when tested after a sufficiently long delay? I don't believe that this is the intended meaning of "short-term" as used throughout the majority of the paper, right?

      (11) "To fully comprehend the temporal dynamics of the memory retrieval effect..."<br /> ***What memory retrieval effect? This needs some elaboration.

      (12) "We hypothesize that the labile state triggered by the memory retrieval may facilitate different memory update mechanisms following extinction training, and these mechanisms can be further disentangled through the lens of temporal dynamics and cue-specificities."<br /> ***What does this mean? The first part of the sentence is confusing around the usage of the term "facilitate"; and the second part of the sentence that references a "lens of temporal dynamics and cue-specificities" is mysterious. Indeed, as all participants received the same retrieval-extinction exposures in Study 2, it is not clear how or why any differences between the groups are attributed to "different memory update mechanisms following extinction".

      (13) "In the first study, we aimed to test whether there is a short-term amnesia effect of fear memory retrieval following the fear retrieval-extinction paradigm."<br /> ***Again, the language is confusing. The phrase, "a short-term amnesia effect" implies that the amnesia itself is temporary; but I don't think that this implication is intended. The problem is specifically in the use of the phrase "a short-term amnesia effect of fear memory retrieval." To the extent that short-term amnesia is evident in the data, it is not due to retrieval per se but, rather, the retrieval-extinction protocol.

      (14) The authors repeatedly describe the case where there was a 24-hour interval between extinction and testing as consistent with previous research on fear memory reconsolidation. Which research exactly? That is, in studies where a CS re-exposure was combined with a drug injection, responding to the CS was disrupted in a final test of retrieval from long-term memory which typically occurred 24 hours after the treatment. Is that what the authors are referring to as consistent? If so, which aspect of the results are consistent with those previous findings? Perhaps the authors mean to say that, in the case where there was a 24-hour interval between extinction and testing, the results obtained here are consistent with previous research that has used the retrieval-extinction protocol. This would clarify the intended meaning greatly.

      DATA

      (15) Points about data:

      15A. The eight participants who were discontinued after Day 1 in study 1 were all from the no-reminder group. Can the authors please comment on how participants were allocated to the two groups in this experiment so that the reader can better understand why the distribution of non-responders was non-random (as it appears to be)?

      15B. Similarly, in study 2, of the 37 participants that were discontinued after Day 2, 19 were from Group 30 min, and 5 were from Group 6 hours. Can the authors comment on how likely these numbers are to have been by chance alone? I presume that they reflect something about the way that participants were allocated to groups, but I could be wrong.
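      To make the chance question in 15A and 15B concrete, a rough check is a goodness-of-fit test on the dropout counts. The sketch below is illustrative only. It assumes that, absent any allocation artifact, each discontinued participant was equally likely to come from any group, which ignores the actual group sizes (not given here). It also infers the 24-hour count as 37 - 19 - 5 = 13. The helper names are hypothetical. With two degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_equal_allocation(counts):
    """Pearson chi-square statistic against equal expected counts in every group."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)

def prob_all_in_one_group(k, n_groups=2):
    """Probability that all k dropouts fall in one pre-specified group,
    if each dropout is equally likely to come from any group."""
    return (1 / n_groups) ** k

# Study 2: 37 dropouts = 19 (30 min) + 5 (6 h) + 13 (24 h, inferred by subtraction)
stat = chi_square_equal_allocation([19, 5, 13])
print(f"chi2(2) = {stat:.2f}, p = {chi2_sf_df2(stat):.4f}")  # chi2(2) = 8.00, p = 0.0183

# Study 1: all 8 dropouts came from the no-reminder group (one of two groups)
print(f"P(all 8 in a named group) = {prob_all_in_one_group(8):.4f}")  # 0.0039
```

      Under these assumptions, neither pattern looks like chance alone. Unequal group sizes or a non-random allocation scheme would change both numbers, which is why the allocation procedure matters.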

      15C. "Post hoc t-tests showed that fear memories were resilient after regular extinction training, as demonstrated by the significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group (t26 = 7.441, P < 0.001; Fig. 1e), while subjects in the reminder group showed no difference of fear recovery between CS+ and CS- (t29 = 0.797, P = 0.432, Fig. 1e)."<br /> ***Is the fear recovery index shown in Figure 1E based on the results of the 1st test trial only? How can there have been a "significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group" when the difference in responding to the CS+ and CS- is used to calculate the fear recovery index shown in 1E? What are the t-tests comparing exactly, and what correction is used to account for the fact that they are applied post-hoc?

      15D. "Finally, there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (t55 = -2.022, P = 0.048; Fig. 1c, also see Supplemental Material for direct test for the test phase)."<br /> ***Is this statement correct - i.e., that there is no statistically significant difference in fear recovery to the CS+ in the reminder and no reminder groups? I'm sure that the authors would like to claim that there IS such a difference; but if such a difference is claimed, one would be concerned by the fact that it is coming through in an uncorrected t-test, which is the third one of its kind in this paragraph. What correction (for the Type 1 error rate) is used to account for the fact that the t-tests are applied post-hoc? And if no correction, why not?
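      As an illustration of the correction question raised in 15C and 15D, the sketch below applies the Holm-Bonferroni step-down procedure to the three p-values reported in the quoted paragraph. Holm is one common choice and is assumed here for illustration only. The quoted "P < 0.001" is entered as 0.001, an upper bound.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value (0-based) is multiplied by (m - rank)
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted p-values must not decrease as the raw p-values increase
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# The three post hoc t-tests in the quoted paragraph ("P < 0.001" entered as 0.001)
reported = {"no-reminder CS+ vs CS-": 0.001,
            "reminder CS+ vs CS-": 0.432,
            "reminder vs no-reminder CS+": 0.048}
for (label, p), p_adj in zip(reported.items(), holm_adjust(list(reported.values()))):
    print(f"{label}: p = {p:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

      Under Holm, the reminder versus no-reminder comparison at P = 0.048 would rise to 0.096. Under plain Bonferroni (0.048 x 3) it would rise to 0.144. In either case it would no longer fall below 0.05.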

      15E. In study 2, why is responding to the CS- so high on the first test trial in Group 30 min? Is the change in responding to the CS- from the last extinction trial to the first test trial different across the three groups in this study? Inspection of the figure suggests that it is higher in Group 30 min relative to Groups 6 hours and 24 hours. If this is confirmed by the analysis, it has implications for the fear recovery index which is partly based on responses to the CS-. If not for differences in the CS- responses, Groups 30 minutes and 6 hours are otherwise identical.

      15F. Was the 6-hour group tested at a different time of day compared to the 30-minute and 24-hour groups; and could this have influenced the SCRs in this group?

      15G. Why is the range of scores in "thought control ability" different in the 30-minute group compared to the 6-hour and 24-hour groups? I am not just asking about the scale on the x-axis: I am asking why the actual distribution of the scores in thought control ability is wider for the 30-minute group?

      (16) During testing in each experiment, how were the various stimuli presented? That is, was the presentation order for the CS+ and CS- pseudorandom according to some constraint, as it had been in extinction? This information should be added to the method section.

      (17) "These results are consistent with previous research which suggested that people with better capability to resist intrusive thoughts also performed better in motivated dementia in both declarative and associative memories."<br /> ***Which parts of the present results are consistent with such prior results? It is not clear from the descriptions provided here why thought control ability should be related to the present findings or, indeed, past ones in other domains. This should be elaborated to make the connections clear.

    Reviewer #3 (Public Review):

      SUMMARY

      Wang et al. have addressed how acquired fear and extinction memories evolve over time. Using a retrieval-extinction procedure in healthy humans, they have investigated the recovery of fear memories 30-60 minutes, 6 hours, and 24 hours after the retrieval-extinction phase. They have addressed this research question through 3 different experiments, which included manipulations of the reminder cue, the time interval, and brain activity. Together, the studies suggest that early on after retrieval-extinction (30-60 min later), retrieval-extinction may lead to an attenuation of fear recovery (after reinstatement) for all fear cues, including the non-reminded ones. Study 3 moreover suggests that this effect may depend on normal dlPFC function. In addition, the paper contains data in line with prior findings, suggesting that the reminder cue confers no benefit with a 6-hour interval but does with a 24-hour interval, specifically for the reminded fear cue. The latter findings are seen as evidence of fear memory reconsolidation.

      STRENGTHS

      (1) The paper combines three related human fear conditioning studies, each with decent sample sizes. The authors are transparent about the fact that they excluded many participants and about which conditions they belonged to.

      (2) The effect that this paper investigates (short-term fear memory after a retrieval-extinction procedure) has not been studied extensively, thus making it a relevant topic.

      (3) The application of brain stimulation as a means to study causal relationships is interesting and goes beyond the purely behavioral or pharmacological interventions that are often used in human fear conditioning research. Also, the use of an active control stimulation is a strength of the study.

      WEAKNESSES

      (1) The entire study hinges on the idea that there is memory 'suppression' if (1) the CS+ was reminded before extinction and (2) the reinstatement and memory test takes place 30 minutes later (in Studies 1 & 2). However, the evidence supporting this suppression idea is not very strong. In brief, in Study 1, the effect seems to only just reach significance, with a medium effect size at best, and, moreover, it is unclear if this is the correct analysis (which is a bit doubtful, when looking at Figure 1D and E). In Study 2, there was no optimal control condition without reminder and with the same 30-min interval (which is problematic, because we can assume generalization between CS1+ and CS2+, as pointed out by the authors, and because generalization effects are known to be time-dependent). Study 3 is more convincing, but entails additional changes in comparison with Studies 1 and 2, i.e., applications of cTBS and an interval of 1 hour instead of 30 minutes (the reason for this change was not explained). So, although the findings of the 3 studies do not contradict each other and are coherent, they do not all provide strong evidence for the effect of interest on their own.

      Related to the comment above, I encourage the authors to double-check if this statement is correct: "Also, our results remain robust even with the "non-learners" included in the analysis (Fig. S1 in the Supplemental Material)". The critical analysis for Study 1 is a between-group comparison of the CS+ and CS- during the last extinction trial versus the first test trial. This result only just reached significance with the selected sample (p = .048), and Figures 1D and E even seem to suggest otherwise. I doubt that the analysis would reach significance when including the "non-learners" - assuming that this is what is shown in Supplemental Figure 1 (which shows the data from "all responded participants").

      Also related to the comment above, I think that the statement "suggesting a cue-independent short-term amnesia effect" in Study 2 is not correct and should read: "suggesting extinction of fear to the CS1+ and CS2+", given that the response to the CS+'s is similar to the response to the CS-, as was the case at the end of extinction. Also the next statement "This result indicates that the short-term amnesia effect observed in Study 2 is not reminder-cue specific and can generalize to the non-reminded cues" is not fully supported by the data, given the lack of an appropriate control group in this study (a group without reinstatement). The comparison with the effect found in Study 1 is difficult because the effect found there was relatively small (and may have to be double-checked, see remarks above), and it was obtained with a different procedure using a single CS+. The comparison with the 6-h and 24-h groups of Study 2 is not helpful as a control condition for this specific question (i.e., is there reinstatement of fear for any of the CS+'s) because of the large procedural difference with regard to the intervals between extinction and reinstatement (test).

      (2) It is unclear which analysis is presented in Figure 3. According to the main text, it either shows the "differential fear recovery index between CS+ and CS-" or "the fear recovery index of both CS1+ and CS2+". The authors should clarify what they are analyzing and showing, and clarify to which analyses the ** and NS refer in the graphs. I would also prefer the X-axes and particularly the Y-axes of Fig. 3a-b-c to be the same. The image is a bit misleading now. The same remarks apply to Figure 5.

      (3) In general, I think the paper would benefit from being more careful and nuanced in how the literature and findings are represented. First of all, the authors may be more careful when using the term 'reconsolidation'. In the current version, it is put forward as an established and clearly delineated concept, but that is not the case. It would be useful if the authors could change the text in order to make it clear that the reconsolidation framework is a theory, rather than something that is set in stone (see e.g., Elsey et al., 2018 (https://doi.org/10.1037/bul0000152), Schroyens et al., 2022 (https://doi.org/10.3758/s13423-022-02173-2)).

      In addition, the authors may want to reconsider whether to cite Schiller et al., 2010 (https://doi.org/10.1038/nature08637), given that neither the main findings of this paper nor its analyses could be replicated (see Chalkia et al., 2020 (https://doi.org/10.1016/j.cortex.2020.04.017; https://doi.org/10.1016/j.cortex.2020.03.031)).

      Relatedly, it should be clarified that Figure 6 is largely speculative, rather than a proven model as it is currently presented. This is true for all panels, but particularly for panel c, given that the current study does not provide any evidence regarding the proposed reconsolidation mechanism.

      Lastly, throughout the paper, the authors equate skin conductance responses (SCR) with fear memory. It should at least be acknowledged that SCR is just one aspect of a fear response, and that it is unclear whether any of this would translate to verbal or behavioral effects. Such effects would be particularly important for any clinical application, which the authors put forward as the ultimate goal of the research.

      (4) The Discussion quite narrowly focuses on a specific 'mechanism' that the authors have in mind. Although it is good that the Discussion is to the point, it may be worthwhile to entertain other options or (partial) explanations for the findings. For example, have the authors considered that there may be an important role for attention? When testing very soon after the extinction procedure (and thus after the reminder), attentional processes may play an important role (more so than with longer intervals). The retrieval procedure could perhaps induce heightened attention to the reminded CS+ (which could be further enhanced by dlPFC stimulation)?

      (5) There is room for improvement in terms of language, clarity of the writing, and (presentation of the) statistical analyses, for all of which I have provided detailed feedback in the 'Recommendations for the authors' section. The same applies to data availability: the data are currently not publicly available, in contrast with what is stated in the paper. In addition, it would be helpful if the authors would provide additional explanation or justification for some of the methodological choices (e.g., the 18-s interval, why stimulation was applied 8 minutes after the reminder cue, and the choice of stimulation parameters), and comment on the reasons for (and implications of) the large number of excluded participants (>25%).

      Finally, I think several statements made in the paper are overly strong in light of the existing literature (or the evidence obtained here) or imply causal relationships that were not directly tested.

    Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      In this paper, the authors performed molecular dynamics (MD) simulations to investigate the molecular basis of the association of alpha-synuclein chains under molecular crowding and salt conditions. Aggregation of alpha-synuclein is linked to the pathogenesis of Parkinson's disease, and liquid-liquid phase separation (LLPS) is considered to play an important role in the nucleation step of alpha-synuclein aggregation. This paper re-tuned the Martini 3 coarse-grained force field parameters, which allows long-timescale MD simulations of intrinsically disordered proteins with explicit solvent under diverse environmental perturbations. Their MD simulations showed that alpha-synuclein does not have a high LLPS-forming propensity, but that molecular crowding and salt addition enhance the tendency to form droplets and therefore modulate alpha-synuclein aggregation. The MD simulation results also revealed important intra- and inter-molecular conformational features of the alpha-synuclein chains in the formed droplets and the key interactions responsible for the stability of the droplets. These MD simulation data add biophysical insights into the molecular mechanism underlying the association of alpha-synuclein chains, which is important for understanding the pathogenesis of Parkinson's disease.

      Strengths:

      (1) The re-parameterized Martini 3 coarse-grained force field enables the large-scale MD simulations of the intrinsically disordered proteins with explicit solvent, which will be useful for a more realistic description of the molecular basis of LLPS.

      (2) This paper showed that molecular crowding and salt contribute to the modulation of the LLPS through different means. The molecular crowding minimally affects surface tension, but adding salt increases surface tension. It is also interesting to show that the aggregation pathway involves the disruption of the intra-chain interactions arising from C-terminal regions, which potentially facilitates the formation of inter-chain interactions.

      We thank the reviewer for pointing out the strengths of our study.

      Weaknesses:

      (1) Although the authors emphasized the advantage of the Martini 3 force field for its explicit description of solvent, the paper does not discuss the role of water in aggregation and LLPS.

      We thank the reviewer for pointing this out. We agree that we have not explored or discussed the role of water in αS aggregation or LLPS, and we intend to explore it in detail in a separate study. However, we have updated the “Discussion” section with the following lines to convey to readers the important role that water plays in the aggregation and LLPS of αS.

      Page 24: “The significance of the solvent in alpha-synuclein (αS) aggregation remains underexplored. Recent studies [26, 55] underscore the pivotal role of water as a solvent in LLPS. It suggests that comprehending the solvent’s role, particularly water, is essential for attaining a deeper grasp of the thermodynamic and physical aspects of αS LLPS and aggregation. By delving into the solvent’s contribution, researchers can uncover additional factors influencing αS aggregation. Such insights hold the potential to advance our comprehension of protein aggregation phenomena, crucial for devising strategies to address diseases linked to protein misfolding and aggregation, notably Parkinson’s disease. Future investigations focusing on elucidating the interplay between αS, solvent (especially water), and other environmental elements could yield valuable insights into the mechanisms underlying LLPS and aggregation. Ultimately, this could aid in the development of therapeutic interventions or preventive measures for Parkinson’s and related diseases.”

      (2) This paper discussed the effects of crowders and salt on the surface tension of the droplets.

      The calculation of the surface tension relies on the droplet shape. However, the clusters formed in the MD simulations typically contain fewer than 10 chains, which may be too few to rigorously define the droplet shape. As shown in previous work cited by this paper [Benayad et al., J. Chem. Theory Comput. 2021, 17, 525−537], the calculated surface tension becomes stable when the chain number exceeds 100.

      We appreciate the insightful feedback from the reviewer. However, we would like to emphasize that the αS droplets exhibit highly liquid-like behavior, characterized by frequent exchange of chains between the dense and dilute phases, alongside a slow aggregation process. In the study by Benayad et al. (2020, JCTC) [ref. 30], FUS-LCD was the protein of choice, at concentrations in the mM range. FUS-LCD is known to undergo very rapid LLPS at concentrations below 100 μM, whereas the critical concentration for LLPS of αS is 500 μM and αS aggregates more slowly than FUS. Moreover, the diffusion constant of αS inside newly formed droplets (before any liquid-to-solid phase transition has occurred) has been estimated to be 0.23-0.58 μm²/s (Ray et al., 2020, Nat. Comm.), whereas that of FUS-LCD inside LLPS droplets has been estimated to be 0.17 μm²/s (Murthy et al., 2023, Nat. Struct. Mol. Biol.). These values indicate that αS forms droplets that are less viscous than those formed by FUS-LCD. This dynamic nature impedes the formation of large droplets in the simulations. It therefore makes it challenging to rigorously calculate surface tension from the interfacial width, which in turn requires computing g(r) between water and the droplet.

      Furthermore, it's essential to note that our primary aim in calculating surface tension was not to determine its absolute value. Rather, we aimed to compare surface tensions obtained for the three distinct environments explored in this study. Hence, our primary objective is to compare the distributions of surface tensions rather than focusing solely on the mean values obtained. The distributions shown in Figure 4a clearly show a trend which we have stated in the article.

      (3) In this work, the Martini 3 force field was modified by rescaling the LJ parameters \epsilon and \sigma with a common factor \lambda. The manuscript does not clearly describe why these two different parameters can be rescaled by a common factor, or why it is necessary to tune both parameters rather than just the coefficient \epsilon, as was done in previous work [Larsen et al., PLoS Comput Biol 16: e1007870].

      We thank the reviewer for the comment. We think that the distance of the first hydration layer should also affect aggregation/LLPS, which is why we scale both epsilon and sigma. A higher epsilon for water-protein interactions means that more energy is required to remove water molecules (dehydration) when a chain moves from the dilute to the dense phase. A higher sigma, on the other hand, places the hydration shell at a larger distance, making dehydration easier. Moreover, tuning both (whether by the same or different factors) required changing the overall protein-water interaction by only 1%. This is a minimal change to the force field parameters compared with tuning epsilon alone, which required a 6-10% change from its original values. We therefore think that optimizing both epsilon and sigma is one way of tuning water-protein interactions that requires minimal retuning of Martini 3. Whether a single scaling parameter is sufficient in general requires further exploration and is outside the scope of the current study. More importantly, scaling epsilon and sigma separately would introduce another free parameter into the system, and the fewer free parameters, the better. For this study, a single parameter sufficed, as depicted in Figure 9. To inform readers of why we chose to scale both sigma and epsilon, we have added the following to the main text:

      Page 25-26: “Increasing the ϵ value of water-protein interactions results in a higher energy demand for removing water molecules (dehydration) as a chain transitions from the dilute to the dense phase. Conversely, a higher σ value implies that the hydration shell will be at a greater distance, facilitating dehydration if a chain moves into the dilute phase. Therefore, adjusting water-protein interactions based on the protein’s single-chain behavior may not significantly influence the protein’s phase behavior. Furthermore, fine-tuning both ϵ and σ parameters only requires a minimal change in the overall protein-water interaction (1%). As a result, this adjustment minimally alters the force field parameters.”

      (4) Both the sizes and volume fractions of the crowders can affect protein association. It would be interesting to perform MD simulations with crowders of various sizes and volume fractions. In addition, in this work, the crowders were modelled by fullerenes, which contribute to protein aggregation mainly through entropic means, as discussed in the manuscript. It is not very clear how sensitive the crowder effect is to the chemical nature of the crowders (e.g., inert crowders with only excluded-volume effects, or crowders with non-specific attractive interactions with proteins), and therefore to the force field parameters.

      We thank the reviewer for suggesting a potential future direction. In this investigation, our main focus was to simulate inert crowders only, so that only the entropic effect of the crowders is explored. Although this study focuses on the factors that enable αS to form aggregates or undergo LLPS under different environmental conditions, it would be interesting to explore systematically how crowders of varying shapes, sizes and interactions act. We have therefore added the following lines to the “Discussion” section to let readers know that this is another avenue for future investigation.

      Page 22: “Under physiological conditions, crowding effects emerge prominently. While crowders are commonly perceived to be inert, as has been considered in this investigation, the morphology, dimensions, and chemical interactions of crowding agents with αS in both dilute and dense phases may potentially exert considerable influence on its LLPS. Hence, a comprehensive understanding through systematic exploration is another avenue that warrants extensive investigation.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure S1. The title of the figure and the description in the figure caption appear to be inconsistent.

      We thank the reviewer for the comment and we have updated the article with the correct caption.

      (2) Page 14, line 3, the authors may want to provide more descriptions of the "ms1", "ms2", and "ms3" for better understanding.

      We are grateful to the reviewer for pointing this out. We have added a line describing in brief what “ms1”, “ms2” and “ms3” represent. It reads “Subsequent to the investigation, we utilize three representative conformations, each corresponding to one of the macrostates. We designate these macrostates as 1 (ms1), 2 (ms2), and 3 (ms3) (Figure S7)” (Page 28)

      (3) Page 20, the authors may want to briefly explain how the normalized Shannon entropy was calculated.

      We thank the reviewer for pointing this out. This is plain Shannon Entropy and the word “normalized” should not have been there. To avoid confusion we have provided the equation we have used to calculate the Shannon entropy (Eq 8) (Page 21).
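      Eq. 8 itself is not reproduced here, so the following is only a minimal sketch of a plain (unnormalized) Shannon entropy, H = -Σ p_i ln p_i, of a discrete distribution. It assumes natural logarithms and probabilities estimated from counts of discrete labels. The labels in the example are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Plain Shannon entropy H = -sum_i p_i ln p_i (in nats),
    with p_i the observed frequency of each distinct label."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical labels with probabilities 1/2, 1/4, 1/4, so H = 1.5 ln 2
frames = ["A", "A", "B", "C"]
print(f"H = {shannon_entropy(frames):.4f} nats")  # H = 1.0397 nats
```

      A normalized variant would divide H by ln K for K possible states, bounding it in [0, 1]. According to the response, that normalization was not applied.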

      Reviewer #2 (Public Review):

      In the manuscript "Modulation of α-Synuclein Aggregation Amid Diverse Environmental Perturbation", Wasim et al. describe coarse-grained molecular dynamics (cgMD) simulations of α-Synuclein (αS) at several concentrations and in the presence of molecular crowding agents or high salt. They begin by benchmarking their cgMD against all-atom simulations by Shaw. They then carry out 2.4-4.3 µs cgMD simulations under the above-noted conditions and analyze the data in terms of protein structure, interaction network analysis, and extrapolated fluid mechanics properties. This is an interesting study because a molecular-scale understanding of protein droplets is currently lacking, but I have a number of concerns about how it is currently executed and presented.

      We thank the reviewer for finding our study interesting.

      (1) It is not clear whether the simulations have reached a steady state. If they have not, it invalidates many of their analysis methods and conclusions.

      We have used the last 1 μs (1.5-2.5 μs) of each simulation for further analysis in this study. To assess whether the simulations have reached a steady state, we plot the time profile of the protein concentration in the dilute phase for all three cases.

      Author response image 1.

      Except for the scenario with only αS (Figures a and b), the systems show very steady concentrations across different sections of the trajectory (Figures c-f). The larger, sudden fluctuations observed in Figures a and b arise because αS alone undergoes very slow spontaneous aggregation and its dense phase is itself highly fluxional. As a result, the transfer of a few chains between the dense and dilute phases registers as a large fluctuation in the dilute-phase protein concentration. In the other two scenarios (Figures c-f), aggregation is accelerated by the presence of crowders or salt, so larger aggregates form. The addition or removal of one or two chains then does not significantly affect the concentration, and no such sudden large jumps appear. In summary, the large jumps in Figures a and b are due to the slow, fluxional aggregation of pure αS and to finite-size effects. Because these are still only fluctuations, we posit that the systems have reached steady states. This claim is further supported by the following figure, where the time profiles of several useful system-wide macroscopic properties show no change between 1.5 and 2.5 µs.

      We also have added a brief discussion in the Methods section (Page 29-30) with these figures in the Supplementary Information.

      Author response image 2.

      “In this study, we utilized the final 1 µs from each simulation for further analysis. To ascertain whether the simulations have achieved a steady state, we plotted the time profile of protein concentration in the dilute phase for all three cases. Except for minor intermittent fluctuation involving only αS in neat water (Figures S8a and S8b), the remaining cases exhibit notably stable concentrations throughout various segments of the trajectory (Figures S8 c-f). The relatively higher fluctuations observed in Figures S8a and b stem from the slow, spontaneous aggregation of αS alone, compounded by the inherently ambiguous nature of the dense phase.

      Consequently, the addition or removal of a few chains from the dense to the dilute phase results in significant fluctuations in protein concentration within the dilute phase. Conversely, in the other two scenarios (Figures S8c-f), aggregation is expedited by the presence of crowders/salt, leading to the formation of larger aggregates. Consequently, the addition or removal of one or two chains has negligible impact on concentration, thereby mitigating sudden large jumps. In summary, the conspicuous jumps depicted in Figures S8a and b arise from the gradual, fluctuating aggregation of pure αS and finite size effects. However, since these remain within the realm of fluctuations, we assert that the systems have indeed reached steady states. This assertion is bolstered by the subsequent figure, where the time profile of several pertinent system-wide macroscopic properties reveals no discernible change between 1.5-2.5 µs (Figures S9).”
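      One simple way to quantify the steady-state argument above is to split the analyzed window (1.5-2.5 µs) into contiguous blocks, compare the block means, and estimate a linear drift. This is a minimal sketch, not the authors' procedure. The concentration series is synthetic and the function names are hypothetical.

```python
import statistics

def block_means(values, n_blocks):
    """Mean of each of n_blocks contiguous, equal-length blocks
    (trailing frames that do not fill a whole block are dropped)."""
    size = len(values) // n_blocks
    return [statistics.fmean(values[k * size:(k + 1) * size]) for k in range(n_blocks)]

def linear_slope(times, values):
    """Ordinary least-squares slope of values versus times (average drift per unit time)."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Synthetic dilute-phase concentrations sampled every 10 ns over 1.5-2.5 us
# (hypothetical numbers: a flat level with a repeating fluctuation)
times_us = [1.5 + 0.01 * i for i in range(101)]
conc_mM = [5.0 + 0.2 * ((i % 5) - 2) for i in range(101)]

print("block means:", [round(m, 3) for m in block_means(conc_mM, 4)])
print(f"drift: {linear_slope(times_us, conc_mM):.4f} mM per us")
```

      Block means that agree within their fluctuations, together with a drift near zero, are consistent with a steady state. A systematic trend across blocks would instead suggest that the system is still equilibrating.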

      (2) The benchmarking used to validate their cgMD methods is very minimal and fails to utilize a large amount of available all-atom simulation and experimental data.

      We disagree with the reviewer on this point. We have cited multiple previous studies [26, 27] that chose Rg as the benchmarking metric for coarse-grained models and used a reference (experimental or otherwise) to tune Martini force fields. Notable studies in which Rg was used as a benchmark when developing new coarse-grained force fields include Dignon et al. (PLoS Comp. Biol.) [ref. 25], Regy et al. (Protein Science, 2021) [ref. 26], Joseph et al. (Nature Computational Science, 2021) [ref. 27] and Tesei et al. (Open Research Europe, 2022) [ref. 28]. From a polymer physics perspective, tuning water-protein interactions simply changes the solvent quality for the biopolymer, and Rg has generally been considered a suitable metric for coarse-grained models. Moreover, we match the full distribution of Rg rather than only its mean value. This indicates that, at the single-molecule level, cgMD simulations at the optimum strength of water-protein interactions allow the protein to sample the conformations present in the reference ensemble. We use the extensively sampled 70 μs all-atom data from DE Shaw Research to obtain the reference Rg distribution. We also cross-validate the model by comparing the fraction of bound states in all-atom and cgMD dimer simulations, which agree well at the optimum water-protein interaction strength. To explain the rationale behind choosing Rg, we have added a section to the Methods (Page 25) describing why Rg is plausibly a good metric for tuning water-protein interactions in Martini 3, at least for IDPs.
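      Matching the full Rg distribution, rather than only its mean, can be scored in several ways. The minimal sketch below assumes per-frame Rg values are available from the all-atom reference and from each cgMD run, and uses a histogram overlap coefficient as the score. The overlap metric, the bin range and the function names are illustrative assumptions, not the scoring used in the manuscript.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one conformation; coords is a list of (x, y, z)."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3))
             for m, r in zip(masses, coords))
    return math.sqrt(sq / total)


def normalized_histogram(values, lo, hi, n_bins):
    """Bin probabilities over [lo, hi); values outside the range are ignored."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against float rounding pushing a value just below hi into bin n_bins
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = sum(counts)
    return [c / n for c in counts] if n else [0.0] * n_bins


def histogram_overlap(p, q):
    """Overlap coefficient of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))
```

      In use, one would compute Rg per frame for the reference ensemble and for each trial water-protein scaling, bin both on a common grid, and prefer the scaling with the highest overlap.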

      Our optimized model is further supported by the FRET experiments of Ray et al. [6], who found that interchain NAC-NAC interactions drive LLPS. Residue-level contact maps from our simulations likewise show decreased intrachain and increased interchain NAC-NAC interactions inside the droplet. This agrees well with the experimental observations and further validates the metrics we used to optimize the water-protein interactions. Because the comparison with the FRET data of Ray et al. was absent from the original draft, we have added the following lines to the updated draft.

      Page 17: “Thus, we observed that increased inter-chain NAC-NAC interactions facilitate the formation of αS droplets, as has also been seen previously in FRET experiments on αS LLPS droplets [6].”

      (3) They also miss opportunities to compare their simulations to experimental data on aSyn protein droplets.

      We thank the reviewer for pointing this out. We have compared the results of our simulations with existing experimental FRET data on αS; please see our previous response, which describes this comparison.

      (4) Aspects such as network analysis are not contextualized by comparison to other protein condensed phases.

      A proper comparison with other protein condensed phases would require the positional phase space (coordinate trajectories) of such condensates, which is not readily available. We therefore kept the analysis simple, using it to illustrate how αS forms an interconnected network inside the droplet phase.

      (5) Data are not made available, which is an emerging standard in the field.

      We thank the reviewer for mentioning this. We have deposited the 1.5-2.5 μs trajectories used for the analyses presented in the article, together with other relevant simulation files, in a Zenodo repository (https://zenodo.org/records/10926368).

      Firstly, it is not clear that these systems are equilibrated or at a steady state (since protein droplets are not really equilibrium systems). The authors do not present any data showing time courses that indicate the system to be reaching a steady state. This is problematic for several of their data analysis procedures, but particularly in determining free energy of transfer between the condensed and dilute phases based on partitioning.

      We have addressed this concern in our earlier response and have updated the article accordingly.

      Secondly, the benchmarking that they perform against the 73 µs all-atom simulation of aSyn monomer by Shaw and coworkers provides only very crude validation of their cgMD models based on reproducing Rg for the monomer. The authors should make more extensive comparisons to the specific conformations observed in the DE Shaw work. Shaw makes the entire trajectory publicly available. There are also a wealth of experimental data that could be used for validation with more molecular detail. See for example, NMR and FRET data used to benchmark Monte Carlo simulations of aSyn monomer (as well as extensive comparisons to the Shaw MD trajectory) in Ferrie et al.: A Unified De Novo Approach for Predicting the Structures of Ordered and Disordered Proteins, J. Phys. Chem. B 124 5538-5548 (2020)

      DOI:10.1021/acs.jpcb.0c02924

      I note that NMR measurements of aSyn in liquid droplets are available from Vendruscolo: Observation of an α-synuclein liquid droplet state and its maturation into Lewy body-like assemblies, Journal of Molecular Cell Biology, Volume 13, Issue 4, April 2021, Pages 282-294, https://doi.org/10.1093/jmcb/mjaa075.

      In addition, there are FRET studies by Maji: Spectrally Resolved FRET Microscopy of α-Synuclein Phase-Separated Liquid Droplets, Methods Mol Biol 2023:2551:425-447. doi: 10.1007/978-1-0716-2597-2_27.

      So the authors are missing opportunities to better validate the simulations and place their structural understanding in greater context. This is just based on my own quick search, so I am sure that additional and possibly better experimental comparisons can be found.

      We have performed a comparison with existing FRET measurements by Ray et al. (2020), as discussed in a previous response, and have updated the article accordingly. However, the DOI (10.1007/978-1-0716-2597-2_27) provided by the reviewer refers to a book on methods for characterizing protein aggregates and does not report observations from FRET experiments. The other DOI (https://doi.org/10.1093/jmcb/mjaa075), for the article from the Vendruscolo group, does not contain information directly relevant to this study. Moreover, NMR measurements cannot be predicted from cgMD, since full atomic resolution is lost when the protein is coarse-grained. Our earlier literature survey found very few studies characterizing αS LLPS droplets at the molecular level.

      Thirdly, the small-world network analysis is interesting, but hard to contextualize. For instance, the 8 Å cutoff used seems arbitrary. How does changing the cutoff affect the value of S determined? Also, how does the value of S compare to other condensed phases like crystal packing or amyloid forms of aSyn?

      The 8 Å cutoff is indeed somewhat arbitrary, since distance-based clustering always requires an empirically chosen cutoff. However, 8 Å is quite large compared with cutoffs used elsewhere for distance-based clustering; for example, ref. 26 used a 5 Å cutoff to calculate protein clusters. Smaller cutoffs lead to sparser networks and larger cutoffs to denser ones. Importantly, we used the same cutoff for all distance-based clustering, which makes the resulting networks comparable, since our aim was to compare the networks formed by αS under different environmental conditions.
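      The reviewer's question about how the cutoff affects S could be explored by recomputing S over a range of cutoffs. This minimal sketch makes several assumptions that the manuscript does not state. It takes S to be the Humphries–Gurney small-world coefficient σ = (C/C_rand)/(L/L_rand), computed against Erdős–Rényi graphs with matched node and edge counts. Each node is treated as a bead or chain with coordinates in Å, and path lengths are averaged over connected pairs only. The node definition, these conventions and all function names are hypothetical, not the authors' analysis code.

```python
import math
import random
from collections import deque


def contact_graph(coords, cutoff=8.0):
    """Adjacency sets: nodes i and j are linked when their distance (Å) is below cutoff."""
    n = len(coords)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def avg_path_length(adj):
    """Mean shortest-path length over connected node pairs (BFS from every node)."""
    total = pairs = 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else math.inf


def random_graph(n, m, rng):
    """Erdős–Rényi G(n, m): m distinct edges placed uniformly at random."""
    adj = [set() for _ in range(n)]
    edges = 0
    while edges < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            edges += 1
    return adj


def small_world_sigma(adj, n_random=20, seed=0):
    """sigma = (C / C_rand) / (L / L_rand), with the random baseline averaged over
    G(n, m) graphs matching the node and edge counts of adj."""
    rng = random.Random(seed)
    n, m = len(adj), sum(len(nbrs) for nbrs in adj) // 2
    c_rand = l_rand = 0.0
    for _ in range(n_random):
        g = random_graph(n, m, rng)
        c_rand += avg_clustering(g) / n_random
        l_rand += avg_path_length(g) / n_random
    if c_rand == 0.0:
        raise ValueError("random baseline has zero clustering; sigma is undefined")
    return (avg_clustering(adj) / c_rand) / (avg_path_length(adj) / l_rand)


# Toy usage: 60 beads on a line, 3 Å apart; an 8 Å cutoff links first and second neighbours.
beads = [(3.0 * i, 0.0, 0.0) for i in range(60)]
for cutoff in (8.0, 10.0):
    print(cutoff, round(small_world_sigma(contact_graph(beads, cutoff)), 2))
```

      Applied to simulation frames, the same loop over cutoffs would show directly how sensitive S is to the 8 Å choice.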

      Fourthly, I see no statement on data availability. The emerging standard in the computational field is to make all data publicly available through Github or some similar mechanism.

      We thank the reviewer for pointing this out. We have deposited the raw data from 1.5-2.5 μs for each scenario, along with other relevant files, in a Zenodo repository (https://zenodo.org/records/10926368).

      Finally, on page 16, they discuss the interactions of aSyn(95-110), but the sequence that they give is too long (seeming to contain repeated characters, but also not accurate). aSyn(95-110) = VKKDQLGKNEEGAPQE. Presumably this is just a typo, but potentially raises concerns about the simulations (since without available data, one cannot check that the sequence is accurate) and data analysis elsewhere.

      This is indeed a typographical error, and we have updated the article with the correct sequence. The sequences used in the simulations can be verified from the data we have shared via the Zenodo repository (https://zenodo.org/records/10926368).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary:

      In this manuscript, Fister et al. investigate how amputational and burn wounds affect sensory axonal damage and regeneration in a zebrafish model system. The authors discovered that burn injury results in increased peripheral axon damage and impaired regeneration. Convincing experiments show altered axonal morphology and increased Ca2+ fluxes as a result of burn damage. Further experimental proof supports that early removal of the burnt tissue by amputation rescues axonal damage. Burn damage was also shown to markedly increase keratinocyte migration and increase localized ROS production as measured by the dye Pfbsf. These responses could be inhibited by Arp 2/3 inhibition and isotonic treatment. 

      Strengths: 

      The authors use state-of-the-art methods to study and compare transection and burn-induced tissue damage. Multiple experimental approaches (morphology, Ca2+ fluxing, cell membrane labeling) confirm axonal damage and impaired regeneration time. Furthermore, the results are also accompanied by functional response tests of touch sensitivity. This is the first study to extend the role of tissue-damage-related osmotic exposure beyond wound closure and leukocyte migration to a novel layer of pathology: axonal damage and regeneration. 

      Weaknesses: 

      The conclusions of the paper claiming a link between burn-induced epithelial cell migration, spatial redox signaling, and sensory axon regeneration are mainly based on correlative observations. Arp 2/3 inhibition impairs cell migration but has no significant effect on axon regeneration and restoration of touch sensitivity. 

      We agree with the reviewer and have performed many experiments to address this question. The data show that Arp 2/3 inhibition with CK666 effectively inhibits initial keratinocyte migration, although later migration still proceeds. Interestingly, inhibiting only this early migration is sufficient to restore localized ROS production in the wound area during the first hour post-burn, even though it does not prevent ROS accumulation over time. There is also a trend toward improved sensory neuron function at later time points after this early treatment, but it is not statistically significant. We think it is likely that both migration and tissue-scale ROS contribute to the regeneration defect of sensory neurons after burn, a conclusion supported by the data using isotonic solution. We have also tried several other ways to limit keratinocyte migration, including depletion of talin and expression of a dominant-negative Rac in basal epithelial cells, but these treatments were not compatible with survival of the fish after burn.

      Pharmacological or genetic approaches should be used to prove the role of ROS production by directly targeting the known H2O2 source in the system: DUOX. 

      We agree that pharmacologic or genetic approaches to directly manipulate ROS production would provide substantial support for the hypothesis that ROS, along with keratinocyte migration, is a main factor contributing to poor burn outcomes. To address this, we first tried using a morpholino to deplete DUOX; however, the combination of DUOX morpholino and burn injury was lethal to larvae. We also inhibited ROS production pharmacologically using DPI (Diphenyleneiodonium). Because treatment for longer than 1 hour was lethal, ROS could be inhibited only during the first hour post-burn. DPI-treated burned larvae had marginally improved axon density and touch sensitivity, consistent with a role for ROS in burn outcomes, although the improvement was not statistically significant; a larger effect might be observed with longer treatment. We have added a supplemental figure with these new DPI data.

      While the authors provide clear and compelling proof that osmotic responses lie at the heart of the burn-induced axonal damage responses, they did not consider the option of further exploring any biology related to osmotic cell swelling. Could osmotic ATP release maybe play a role through excitotoxicity? Could cPLA2 activation-dependent eicosanoid production relate to the process? Pharmacological tests using purinergic receptor inhibition or blockage of eicosanoid production could answer these questions. 

      We agree that the role of osmotic cell swelling in the burn response is an interesting avenue for future study. However, we make use of isotonic treatment in this study specifically for its effect on keratinocyte migration and broad-scale wound healing. As a result, we feel that pursuing the biology of this swelling phenomenon is outside the scope of this paper.

      The authors provide elegant experiments showing that early removal of the burnt tissue can rescue damage-induced axonal damage, which could also be interpreted in an osmotic manner: tail fin transections could close faster than burn wounds, allowing for lower hypotonic exposure time. Axonal damage and slow regeneration in tail fin burn wounds could be a direct consequence of extended exposure time to hypotonic water. 

      We have done experiments using FM dye to test how long it takes burn and transection wounds to close (shown below). In these experiments, dye entry into wounded tissue is used as a readout of wound closure. Dye is only able to enter wounded tissue when the epithelial barrier is disrupted. Our data reveal that transections take approximately 10 minutes to fully close, while burns take approximately 20 minutes to close.

      Author response image 1.

      To test whether this difference in wound closure time affects axon outcomes, we repeated the dual-wound experiment with a slight modification. We increased the time the burn condition was exposed to hypotonic conditions by 10 minutes (by transecting burned tissue at 15 minutes post-burn, shortly before closure) and compared axon outcomes with those of the 5 mpw control transection. There was no difference in axon regeneration or function when the secondary transection was performed at 5 or 15 minutes post-burn, suggesting that increased exposure to hypotonic solution is not the reason for the defects in axon outcomes after burn injury.

      Author response image 2.

      Reviewer #2 (Public Review): 

      This is an interesting study in which the authors show that a thermal injury leads to extensive sensory axon damage and impaired regrowth compared to a mechanical transection injury. This correlates with increased keratinocyte migration. That migration is inhibited by CK666 drug treatment and isotonic medium. Both restrict ROS signalling to the wound edge. In addition, the isotonic medium also rescues the regrowth of sensory axons and recovery of sensory function. The findings may have implications for understanding non-optimal re-innervation of burn wounds in mammals. 

      The interpretation of results is generally cautious and controls are robust. 

      Here are some suggestions for additional discussion: 

      The study compares burn injury which produces a diffuse injury to a mechanical cut injury which produces focal damage. It would help the reader to give a definition of wound edge in the burn situation. Is the thermally injured tissue completely dead and is resorbed or do axons have to grow into damaged tissue? The two-cut model suggests the latter. Also giving timescales would help, e.g. when do axons grow in relation to keratinocyte movement? An introductory cartoon might help. 

      We thank the reviewer for these insightful comments and questions. The burn wound is defined as the area that is directly damaged as a result of increased heat (labeled by FM dye entry), and the burn wound edge as the first line of healthy cells adjacent to the burned cells. These definitions have been added to the text to clarify the areas referenced. Recent experiments lead us to believe the wound area is composed almost completely of dead cells, but we are currently working to discover the fate of these dead cells as well as the wound adjacent cells that migrate to the wound edge after burn. As a result, we do not know whether axons grow into damaged tissue or if the damaged tissue is extruded, but we do see growth cone formation within a few hours after wounding suggesting the axons are actively trying to regenerate after a burn.

      Could treatment with CK666 or isotonic solution influence sensory axons directly, or through other non-keratinocyte cell types, such as immune cells? 

      We have examined the density of caudal fin innervation in CK666-, isotonic- or DPI-treated fins. Axon density is unchanged under all these treatments compared with control-treated larvae, so we do not believe these treatments affect axon health under homeostatic conditions. These data have been added to supplemental figure 3. Additionally, one of the benefits of the larval zebrafish burn model is the simplicity of the system: the epidermis is primarily composed of sensory axons, mesenchymal cells and keratinocytes. The burn environment is proinflammatory and does promote immune cell recruitment, but we do not believe immune cells interact directly with sensory axons beyond clearing axonal debris. Previous papers from our lab have shown that immune cell recruitment peaks at 6 hpw, and that these cells localize to the damaged tissue in the burn area rather than to the wound edge.

      Reviewer #3 (Public Review): 

      Fister and colleagues use regeneration of the larval zebrafish caudal fin to compare the effects of two modes of tissue damage (transection and burn) on cutaneous sensory axon regeneration. The authors found that restoration of sensory axon density and function is delayed following burn injury compared to transection. 

      The authors hypothesized that thermal injury triggers signals within the wound microenvironment that impair sensory neuron regeneration. The authors identify differences in the responses of epithelial keratinocytes to the two modes of injury: keratinocytes migrate in response to burn but not transection. Inhibiting keratinocyte migration with the small-molecule inhibitor of Arp2/3 (CK666) resulted in decreased production of reactive oxygen species (ROS) at early, but not late, time points. Preventing keratinocyte migration by wounding in isotonic media resulted in increased sensory function 24 hours after burn. 

      Strengths of the study include the beautiful imaging and rigorous statistical approaches used by the authors. The ability to assess both axon density and axon function during regeneration is quite powerful. The touch assay adds a unique component to the paper and strengthens the argument that burns are more damaging to sensory structures and that different treatments help to ameliorate this. 

      A weakness of the study is the lack of genetic and cell-autonomous manipulations. Additional comparisons between transection and burns, in particular with manipulations that specifically modulate ROS generation or cell migration without potentially confounding effects on other cell types or processes would help to strengthen the manuscript.

      We agree that genetic and cell-autonomous approaches would strengthen our study; however, we were unable to use them because they proved lethal. Basal epithelial migration is necessary for embryonic development. We attempted to circumvent this by generating larvae transiently expressing a dominant-negative form of Rac, a protein crucial to the migratory process. However, chimeric expression of dominant-negative Rac was either damaging to the larvae or too mosaic to produce observable effects on the migration phenotype.

      We also attempted a genetic approach to manipulate ROS production, as discussed above. We found that the DUOX morpholino was lethal to burned larvae. Finally, we attempted pharmacological inhibition of ROS production using the inhibitor DPI (Diphenyleneiodonium). With this treatment, burned larvae have marginally improved axon density and touch sensitivity, suggesting that dampening ROS may improve outcome. The DPI data have been added to the manuscript.

      In terms of framing their results, the authors refer to "sensory neurons" and "sensory axons" throughout the text - it should be made clear what type of neuron(s)/axon(s) are being visualized/assayed. Along these lines, a broader discussion of how burn injuries affect sensory function in other systems - and how the authors' results might inform our understanding of these injury responses - would be beneficial to the reader. 

      In summary, the authors have established a tractable vertebrate system to investigate different sensory axon wound healing outcomes in vivo that may ultimately allow for the identification of improved treatment strategies for human burn patients. Although the study implicates differences in keratinocyte migration and associated ROS production in sensory axon wound healing outcomes, the links between these processes could be more rigorously established. 

      The inconsistency between “neuron” and “axon” has been noted and the text has been corrected accordingly. “Neuron” is used when referring to the cell as a whole, while “axon” is used when referring to the sensory processes in the caudal fin. We added information about burn in the introduction as suggested: “While epithelial tissue is well adapted to repair from mechanical damage, burn wounds heal poorly. Thermal injury results in chronic pain and lack of sensation in the affected tissue, suggesting that an abnormal sensory neuron response contributes to burn wound pathophysiology.”

      We thank the reviewers for their comments.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested experiments: 

      (1) ROS measurements with the dye Pfbsf should be validated with more established ROS probes such as HyPer. 

      Pfbsf has been used previously as a readout of ROS production, and its use is documented in zebrafish (Maeda et al., Angew Chem Int Ed Engl, 2004, and Niethammer et al., Nature, 2009). These sources have been added as references when introducing Pfbsf to provide context for its use. The probe was validated and compared with HyPer in Niethammer’s 2009 paper. In our hands, the two probes give similar results after tail transection.

      (2) To better support claims on ROS and H2O2 playing a central role in mediating axonal damage, the authors should consider pharmacological approaches such as rescue experiments with H2O2 and experiments using inhibitors such as DPI or apocynin. 

      While the above reagents and drugs have limitations and non-specific side effects, more convincing proof could result from genetic approaches including experiments on DUOX knockdown or knockout lines. 

      To further dissect the role of ROS in the burn response, we conducted experiments using DPI, a potent ROS inhibitor that is well documented in the literature. We found that treatment with 20 µM DPI (1 hour pretreatment, 1 hour post-burn) marginally improved axon density when quantified at 24 hpw. Any higher dose, in combination with a burn, proved lethal, and longer treatment with DPI was also not tolerated.

      In addition to the DPI experiments, we attempted to burn larvae injected with DUOX morpholino, but the combination of burn and DUOX MO was lethal. We have tempered our conclusions and included the new DPI data in the revised manuscript.

      Minor corrections: 

      (1) A phrase/expression in the abstract is confusing: isotonic treatment does not "induce osmotic regulation". Cells exposed to hypo- or hypertonicity will respond by regulatory volume decrease or increase, respectively. Isotonic treatment maintains homeostasis. 

      We appreciate this point and agree with the distinction. Revisions have been made in the text accordingly.

      (2) Figures 4E and 5E would be better to show as an average of multiple experiments with statistical significance. 

      The purpose of figures 4E and 5E is to demonstrate changes in fluorescence intensity and localization of ROS using the representative time series shown in 4D and 5D. The figure legend has been updated accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 3D How can one distinguish between the two cellular elements that randomly meet or that there is actual coordination? Can the interactions be quantified? It is also unclear what the authors mean by "sensory neuron movement". The authors show that the neuronal cell bodies stay in their position, so only the axons change position. Do they do this by growth, i.e. the neuronal growth cones follow the keratinocytes or do keratinocytes displace the axon shafts? 

      We have included supplemental movies that address this question in the newly uploaded submission. Figure 3D comprises still images taken from supplemental movie 2, a timelapse of keratinocytes and axons moving together after a burn injury. This movie clearly shows keratinocytes and their ensheathed axons moving simultaneously, indicating that keratinocytes mechanically pull the sensory axon shafts with them. We have revised the text to say axon movement, not sensory neuron movement.

      Over the time course of axonal movement (1 hour post-burn), it is not possible that neuronal growth cones contribute to movement, as this is too slow – previous work by other labs has shown that it takes several hours for axons to fully regenerate into amputated tissue, with movement not even noticeable until about 3 hours post-wound (Rieger and Sagasti, PLOS Biology, 2011).

      Regarding the second point, “neuron” vs. “axon” is an inconsistency in the text that has been corrected. “Neuron” is used when referring to the cell as a whole, “axon” is used when referring to the processes that innervate the caudal fin. The axons are physically pulled along with keratinocytes as they migrate after burn application. From our observations, growth cones appear closer to the wound site after the movement has stopped.

      Figure 4G It is surprising that the visual differences in the distribution of values are not statistically significant. 

      The distribution of values in 4G was broad, which is why there is no statistically significant difference; we were also surprised at this result. All statistical analyses were performed with a statistician using rigorous criteria for significance.

      Figure 4H The images seem to show a difference, whereas the quantification does not. I suggest choosing more representative images. 

      Figure 4H has been updated to include a more representative image of axon patterning with CK666 treatment.

      Figure 6A The text states that axon damage in the control and isotonic condition is comparable, yet in the image, it appears that the damage in the isotonic treatment at 0 hpw is more distal. 

      This is a good observation that we consistently see in isotonic-treated fish after burn. Axon damage localizes more proximally in isotonic-treated samples because the keratinocytes distal to the notochord are likely dead, and the axons innervating those cells are likely immediately destroyed upon burn application. As a result, the distal axons are not present to express GCaMP. We believe isotonic treatment allows keratinocytes to live slightly longer, so axon damage is therefore prevented for longer. This is also the focus of continuing work to further understand the burn microenvironment.

      Finally, the materials section could mention bias mitigation measures, e.g. withholding the treatment condition from the experimenter in the touch test. 

      We minimized bias in experiments whenever possible, and the conservative statistical measures that were applied to our data further reduce the likelihood of false significance.

      Reviewer #3 (Recommendations For The Authors): 

      - Line numbers would have facilitated reviewer feedback. 

      - Supplementary movies were missing in the submission. 

      The lack of supplementary movies upon submission was a mistake and the movies have been uploaded along with the revised manuscript.

      Introduction: 

      - Pg. 3: "In response to tissue damage, sensory neurons undergo rapid and localized axonal degeneration 4,5." Not sure reference 4 (Reyes et al) is appropriate here as this study was not in the context of tissue damage. 

      We have revised this section as suggested by the reviewer.

      Results: 

      - The expected expression pattern/localization of several transgenes was unclear. Please clearly state what cell type(s) each should label. For example, pg. 5 - "We next sought to further investigate sensory neuron function in burned tissue. For this, we assessed wound-induced axonal damage using zebrafish larvae that express the calcium probe GCaMP." Where is GCaMP expressed? 

      The manuscript has been updated to include expression patterns for the included transgenes – in this mentioned case, GCaMP is expressed in neurons under the pan-neuronal Elavl3 promoter.

      - Introducing the GCaMP labeling could use some clarification. Pg. 5 - "As shown previously by other groups, GCaMP labels degenerating neurons in real time35." This is confusing. Do the authors mean that GCaMP increases immediately prior to Wallerian degeneration as shown by Vargas et al. (PMID: 26558774)? 

      Sustained elevated calcium levels are associated with axon damage. Previous work from other labs has shown that calcium influx follows axon injury (Ziv and Spira, EJN 1993, Adalbert et al., Neuroscience 2012). In these experiments, GCaMP-positive puncta therefore indicate axon damage. We have revised the manuscript to address this critique.

      The Elavl3-GCaMP5 transgenic line will label when calcium levels increase in neurons. However, given the parameters used for imaging in our study (20x magnification, 100 ms exposure, and collection speed every 30 seconds for timelapses), we believe that only sufficiently large increases in calcium that are indicative of cell damage, and not physiological function, are being visualized.

      - Figure 1E - Are these panels images of the same fish? Please specify in the legend. 

      Figure 1E is comprised of one transected and one burned larva each, live-imaged over the course of six hours. The legend has been updated to include this information.

      - Figure 1F - How was the damage area measured? Consider doing this measurement over time to match Figure 1E. 

      Axon damage area measurements were performed similarly to axon density measurements. Maximum intensity z-projected confocal images of the caudal fin were generated using FIJI. For all experiments, the caudal fin area posterior to the notochord was outlined using the Polygon tool and measured to obtain a total surface area ROI. Axon fragments inside the outlined area were manually thresholded so that all fragments posterior to the notochord were labeled and no saturated pixels were present, and the area of these thresholded pixels was measured. We have added a section describing these measurements to the Methods under “Axon damage quantification.”
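      The measurement described above can be sketched in a few lines. This illustrative version rests on three assumptions. A single fixed intensity threshold stands in for the manual FIJI thresholding. The polygon ROI is given in pixel coordinates, and the image stack is stored as plain nested lists. All function names are hypothetical and do not correspond to the authors' pipeline.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D planes, each a list of rows) to a 2D max projection."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)] for r in range(rows)]


def inside_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def axon_damage_area(stack, roi, threshold, pixel_area_um2=1.0):
    """Return (damaged area, ROI area) in µm²: pixels of the max projection whose centre
    lies inside the ROI, and the subset of those at or above the threshold."""
    projection = max_intensity_projection(stack)
    roi_pixels = damaged_pixels = 0
    for r, row in enumerate(projection):
        for c, value in enumerate(row):
            if inside_polygon(c + 0.5, r + 0.5, roi):
                roi_pixels += 1
                if value >= threshold:
                    damaged_pixels += 1
    return damaged_pixels * pixel_area_um2, roi_pixels * pixel_area_um2
```

      Dividing the first returned value by the second gives the damaged fraction of the fin area, if a normalized measure is preferred.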

      - Pg. 5 - When introducing the ngn1 MO, please state the expected phenotype and cite the appropriate background literature.

      The ngn1 morpholino was cited in the Methods section with the appropriate reference (Cornell and Eisen, Development, 2002), from which the morpholino sequence was taken. We thank the reviewer for pointing out that the main text needed more introduction and clarification; the ngn1 morpholino is now discussed in greater depth and cited in the main text using the same reference.

      - The two-wound model is an elegant approach but could be more clearly described in the main text. 

      An improved explanation of the two-wound experiment has been added to the text.

      - For Figure 3, it would be helpful to have a schematic of the anatomy illustrating the relative positions of axons and epidermal cell types. 

      - Figure 3C - should an additional control here be transected? Given that the krt4:lifeact transgene labels both layers of the epidermis, how were the superficial and basal keratinocytes separated? Interpretation of this section should be carefully worded. The authors state that "...suggesting that the superficial keratinocytes are being pulled by the motile basal keratinocytes" (pg. 7), but isn't another possibility that the superficial cells are stationary? 

      It is correct that the krt4:lifeact transgene labels both layers of keratinocytes, which together span 20-30 microns. These layers were separated from the same z-stack collected by confocal imaging. The first z-slice and last z-slice of the same stack were separated using FIJI and pseudocolored to appear as different colors. This clarification has been added to the Methods.

      Prior observations with the krt4:lifeact and krt4:utrch (figure 3A) transgenic lines reveal that both keratinocyte layers will move distally after burn application.

      - Pg. 7 - "The axons of sensory neurons are ensheathed within actin-rich channels running through basal keratinocytes 50,51." ref 51 is a C. elegans paper which does not have basal keratinocytes.

      This was an error. Reference 51 has been replaced with the correct reference (O’Brien, J Comp Neurol, 2012), in which electron microscopy is used to document the development of two layers of epithelial cells that also ensheath sensory neurons in a protective manner, similar to glial cells in the central nervous system.

      - Figures S1E and F - the authors state that RB and DRG soma don't move. However, it was unclear from the figure panels and legend whether the authors imaged neurons that actually innervate the caudal fin (rather than some other region of the animal). Please clarify. For comparison, Fig S1F needs a pre-injury image to be meaningful. 

      The imaged cell bodies were those in the posterior trunk region, which are responsible for innervating the posterior sections of the fish including the caudal fin. From our observations, there was no movement of neuronal cell bodies after the burn.

      - Figure 5 title - can the authors clarify what aspect of this figure relates to "sustained epidermal damage" 

      The figure 5 title has been updated in response to the reviewer comments.

      - Figure 6 - is touch sensitivity really "restored" as the authors suggest? Alternatively, sensitivity may never be lost in isotonic treatment. Or the loss may be delayed? 

      We have modified the text accordingly by updating our phrasing – “restored” has been replaced with “improved” to indicate benefit over time.

      - Can the authors further disentangle the effects of keratinocyte migration, ROS, and isotonic treatment on axon regeneration? For example, would the addition of CK666 to the Isotonic +1 hpw treatment improve axon regeneration? Can the authors directly manipulate ROS signaling (e.g., through exogenous addition of H2O2 or duox1 MO) to alter regeneration outcomes in their wounding assays? 

      See the comments above.

      - Figure 6 title - consider removing or clarifying the word "excessive" here 

      The title has been revised according to the reviewer suggestion.

      - hpw vs hpb were used inconsistently throughout the text 

      The manuscript has been revised to use “hpw” when referring to the timeframe after injury application.

      Methods: 

      - Zebrafish transgenics are missing allele names 

      References: 

      - Many mistakes were noted in this section e.g., journal names missing, wrong authors, typos, DOIs misformatted 

      The references section has been corrected to use formatting consistent with APA citation and eLife preferred guidelines.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Review, 3D SIM + AO, Wang and coworkers

      In this manuscript, Wang and coworkers report an upright 3D SIM system with adaptive optics (AO) correction. They demonstrate that AO improves imaging into thick 3D samples, including Drosophila larval brain. They also explore the use of remote focusing with their setup. The authors clearly demonstrate a gain with AO, and we are convinced that the microscope they built offers some utility over the existing state of the art, particularly in samples thicker than a single cell. That said, we have concerns with the manuscript that we would like to see addressed before recommending publication:

      • Given the emphasis on super-resolution imaging deep inside a sample, we were surprised to see no mention of other forms of structured illumination that allow super-resolution imaging in samples thicker than a single cell. These include the 'spot-scanning' implementations of SIM that offer better imaging at depth by virtue of pinholes, and include MSIM, iSIM, and rescan confocal technologies. The two-photon / AO implementation of iSIM seems particularly germane, e.g. https://pubmed.ncbi.nlm.nih.gov/28628128/ Please consider citing these works, as they help place the existing work into context.
      • As we're sure the authors appreciate, besides aberrations, a major additional obstacle to 3D SIM in thick tissues is the presence of out-of-focus background. Indeed, this point was mentioned by Gustafsson in his classic 2008 paper on 3D SIM (https://pubmed.ncbi.nlm.nih.gov/18326650/): 'The application area of three-dimensional structured illumination microscopy overlaps with that of confocal microscopy, but the two techniques have different and complementary strengths. Structured illumination microscopy offers higher effective lateral resolution, because it concentrates much of the excitation light at the very highest illumination angles, which are most effective for encoding high-resolution information into the observed data, whereas confocal microscopy spreads out its illumination light more-or-less uniformly over all available angles to form a focused beam. For very thick and compactly fluorescent samples, however, confocal microscopy has an advantage in that its pinhole removes out-of-focus light physically. Structured illumination microscopy is quite effective at removing out-of-focus light computationally, because it is not subject to the missing-cone problem, but computational removal leaves behind the associated shot noise. Therefore confocal microscopy may be preferable on very thick and dense samples, for which the in-focus information in a conventional microscope image would be overwhelmed by out-of-focus light, whereas structured illumination microscopy may be superior in a regime of thinner or sparser samples.' This point is not mentioned at all in the manuscript, yet we are certain it is at least partially responsible for the residual image artifacts the authors mention. Please discuss the problem of out-of-focus light in 3D samples, particularly with an eye to the 'spot-scanning' papers mentioned above.
      • The authors use a water dipping lens, yet they image into samples that are mounted on coverslips, i.e. they use a dipping lens to image through a coverslip: see attached pdf for reference

      This almost certainly introduces spherical aberration, which the authors seem to observe: see attached pdf for reference

      We find this troubling, as it seems that in the process of building their setup, the authors have chosen an objective lens that introduces aberrations which they later correct. At the very least, this point needs to be acknowledged in the manuscript (or please correct us if we're wrong), as it renders the data in Figs. 3-4 somewhat less compelling than if the authors had used an objective lens that allowed correction through a coverglass, e.g. a water dipping lens with a correction collar. In other words, in the process of building their AO setup, the authors have introduced system aberrations that render the comparison with 3D SIM somewhat unfair. Ideally the authors would show a comparison with an objective lens that can image through a glass coverslip.

      • The authors tend to include numbers for resolution without statistics. This renders the comparisons meaningless in my opinion; ideally every number would have a mean and error bar associated with it. We have included specific examples in the minor comments below.
      • In Fig. 5, after the 'multipoint AO SIM', the SNR in some regions seems to decrease after AO: see attached pdf for reference

      Please comment on this issue.

      • Please provide timing costs for the indirect AO methods used in the paper, so the reader understands how this time compares to the time required for taking a 3D SIM stack. In a similar vein, in lines 213-215 the authors mention a 'disproportionate measurement time' when referring to the time required for AO correction at each plane; providing numbers here would be very useful, so readers can judge for themselves what this means. What is the measurement time, why is it so long, and how does it compare to the time for 3D SIM? It would also be useful to compare the time needed for AO correction at each (or two) planes without remote focusing (RF) vs. with RF, so the reader understands the relative temporal contributions of each part of the method. For the data shown in Fig. 5, we suggest reporting a) the time to acquire the whole stack without AO (3D SIM only); b) the time to acquire the data as shown; and c) the time to acquire the AO stack without RF. This would help bolster the case for remote focusing in general; as it stands, we are not convinced that this capability is worth having, at least for the data shown in this paper.
      • Some further discussion on possibly extending the remote focusing range would be helpful. We gather that limitations arose from the use of an older DM model, due to creep effects. We also gather from the SI that edge effects at the periphery of the DM were also problematic. Are these limitations likely to be non-issues with modern DMs, and how much range could one reasonably expect to achieve as a result? We are wondering whether the 10 µm range is a fundamental practical limitation or whether in principle it could be extended with commercial DMs.

      Minor comments

      • The paper mentions Ephys multiple times, even including micromanipulators in Fig. 1, although electrophysiology is not actually used in this paper. If these components are kept in Figure 1, please make it clear that they are aspirational and not used in the paper.
      • The abstract mentions '3D SIM microscopes'; 'microscopes' is redundant, as the 'M' in 'SIM' stands for 'microscopy'.
      • 'fast optical sectioning', line 42: how can optical sectioning be 'fast'? Do the authors mean rapid imaging with optical sectioning?
      • line 59, 'effective imaging depth may be increased to some extent using silicone immersion objectives', what about water immersion objectives? We would guess these could also be used.
      • line 65 - evidence for 'water-dipping objectives are more sensitive to aberrations' ? Please provide citation or remove. They are certainly more prone to aberrations if used with a coverslip as done here.
      • 'fast z stacks' is mentioned in line 103. How fast is fast?
      • line 116 'we imaged 100 nm diameter green fluorescent beads'. Deposited on glass? Given that this paper is about imaging deep this detail seems worth specifying in the main text.
      • lines 127-130, when describing changes in the bead shape with numbers for the FWHM, please provide statistics - quoting single numbers for comparison is almost useless and we cannot conclude that there is a meaningful improvement without statistics.
      • In the same vein, how should we understand that remote focusing actually improves the axial FWHM of the widefield bead? Is this result repeatable, or is it just noise?
      • line 155, 'Because of the high spatial information...' -> 'Because of the high resolution spatial information...'
      • When quoting estimated resolution values from microtubules (lines 158-163), please similarly provide statistics, as for beads.
      • It seems worth mentioning the mechanism of AO correction (i.e. indirect sensing) in the main body of the text, not just the methods.
      • How long do the AO corrections take for the datasets in the paper?
      • Were the datasets in Fig. 2-4 acquired with remote focusing, or in conventional z stack mode? Please clarify this point in the main text and the figure captions.
      • It would be helpful when showing z projections in Figs. 3-5 to indicate the direction of increasing depth (we assume this is 'down' due to the upright setup, but this would be good to clarify)
      • line 174, 'showed significant improvements in both intensity and contrast after reconstruction' - we see the improvements in contrast and resolution, it is harder to appreciate improvements in intensity. Perhaps if the authors showed some line profiles or otherwise quantified intensity this would be easier to appreciate.
      • line 195, 'reduced artefacts' due to AO. We agree with this statement: the benefit from AO is obvious, and yet there are still artefacts. If the authors could clarify what these residual artefacts are, and their cause (out-of-focus light, uncorrected residual aberrations, etc.), this would be helpful for readers who are not used to looking at 3D SIM images.
      • Line 197, 'expected overall structure', please clarify what is expected about the structure and why.
      • Line 199, what is a 'pseudo structure'?
      • Fig. 4B, 'a resolution of ~200 nm is retained at depth', please clarify how this estimate was obtained, ideally with statistics.
      • Fig. 4D, please comment on the unphysical negative valued intensities in Fig. 4D, ideally explaining their presence in the caption. It would also be helpful to highlight where in the figure these plots arise, so the reader can visually follow along.
      • Line 245, 'rapid mitosis'. What does rapid mean, i.e. please provide the expected timescale for mitosis.
      • For the data in Fig. 6, was remote refocusing necessary?
      • What is the evidence for 'reduced residual aberrations', was a comparative stack taken without AO? In general we feel that the results shown in Fig. 6 would be stronger if there were comparative results shown without AO (or remote focusing).
      • Line 350, 'incorporation of denoising algorithms' - citations would be helpful here.
      • Line 411, 'All three were further developed and improved' - vague, how so?
      • Sensorless AO description; how many Zernike modes were corrected?
      • Multi-position aberration correction. Was the assumption of linearity in the Zernike correction verified or met? Why is this a reasonable assumption?
      • Fig. S1B is not useful; if the idea is to give a visual impression of the setup, we would recommend providing more photos with approximate distances indicated so that the reader has a sense of the scale of the setup. As is - it looks like a photograph of some generic optical setup.
      • SI pattern generation: 'the maximum achievable reconstruction resolution was only slightly reduced to about 95% of the theoretical maximum'. We don't understand this sentence, as the resolution obtained on the 100 nm beads is considerably worse than 95% of the theoretical maximum. Or do the authors mean 95% of the theoretical maximum given their pitch size of 317 nm for green and 367 nm for red?

      SI Deformable mirror calibration

      'spanning the range [0.1, 0.9]' - what are the units here?

      What are the units in Fig. S5C, S5D?

      It would be useful to define 'warmup' also in the caption of SI Fig. S6A.

      SI Remote Focusing

      'four offsets, {-5 mm, -2.5 mm, 2.5 mm, 5 mm}...': are the units mm or µm? '...whereas that of the 10 beads was...': here, do the authors mean the position of the beads derived from the movement of the piezo stage, as opposed to the remote focusing? The authors refer to the 'results from Chapter 3.2'. What are they referring to? Do they mean a supplementary figure, or earlier supplementary results? In general, we found the discussion in this paragraph difficult to follow. Supplementary Fig. 9 does not seem to be referred to anywhere in the text.

      • Since the paper emphasizes 3D SIM, OTFs along the axial direction would also be useful to show, in addition to the lateral OTFs shown in Fig. 2D.
      • When the sample is moved by the piezo, the axial phase of the 3D-SIM illumination pattern is stable as the sample is scanned through the illumination pattern. When remote focusing is performed, the sample is always stationary, so the axial phase of the 3D-SIM illumination pattern presumably changes with remote focusing. Can the authors clarify whether the 3D SIM illumination pattern is scanned when remote focusing is applied, or whether the intensity pattern is stable in z?
      • In Supplementary Fig. 9, primary spherical is referred to twice, both at index 11 and 22. The latter is presumably secondary spherical?
      • We do not understand the x axis label in Fig. S4D; is it really [0, 50, 50, 50] as written? see attached pdf for reference

      Referee Cross-Commenting

      I don't have much to add; the other reviewers raise good points and I think it would be good if the authors could respond to their feedback in a revised manuscript.

      Significance

      Nearly all fluorescence images deteriorate as a function of depth. Methods to ameliorate this depth-dependent degradation are thus of great practical value, as they improve the information content of images and thus (hopefully) biological insight. In this work, the authors develop a method to improve super-resolution imaging (3D SIM) at depth, by combining it with adaptive optics.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1:

      This study provides negative in vivo evidence for the use of two PERK inhibitors and of TUDCA for the treatment of Sil1-related Marinesco-Sjögren syndrome (MSS).

      Overall, the manuscript reports a substantial amount of work and the study could be published in its present format. The experiments are well described in terms of methodology and appropriate analysis has been applied. Claims are proportionate and not overstated.

      I have only minor comments, related to some clarifications that the authors could make in the present manuscript, and a suggestion for experiments that could improve the manuscript.

      First, although this is not my expertise, the in vitro analysis of CHOP luciferase assays suggests that very high concentrations, in particular of TUDCA, are needed to observe an effect. The authors may wish to clarify their opinion and whether this could be the reason why in vivo they have been unable to obtain any inhibition of the PERK pathway.

      The reviewer is correct in pointing out that high concentrations of trazodone, DBM and TUDCA were required to inhibit the PERK pathway in the CHOP::luciferase reporter cell lines. However, as we state in the Discussion, we do not think that their lack of effect in vivo was due to insufficient drug levels, since woozy mice were treated with trazodone, DBM or TUDCA according to dose regimens and administration routes that have proved effective in other neurodegenerative disease mouse models. Moreover, our analysis did not find major differences in drug bioavailability between mice with the woozy genetic background (CXB5/ByJ) and C57BL/6J mice in which these drugs had shown neuroprotective effects (see also the response to the next point).

      Second, it seems to me that when measuring trazodone metabolism there is a difference between acute and chronic treatment. It would be worth discussing what the authors make of that and which is more relevant (I assume chronic) to the disease model outcome.

      We realized that the nomenclature used in Figures 6 and 7 was confusing, leading the reader to think there were differences in trazodone levels between chronically and acutely treated mice.

      The experiment shown in Figure 6 was designed to test whether there were differences in trazodone pharmacokinetics and metabolism between mice of the woozy strain, which have the CXB5/ByJ genetic background, and C57BL/6J mice in which trazodone had shown neuroprotective effects in previous studies. In contrast, Figure 7 illustrates the levels of trazodone and m-CPP in control and woozy mice (both of which have the CXB5/ByJ genetic background) that had been chronically treated with trazodone for 5 weeks. These are the same animals as in Figure 3, as we state in Figure 7 legend. Therefore one should compare the levels of trazodone and m-CPP in Figure 7 with those of the "woozy" group (CXB5/ByJ genetic background) in Figure 6. This comparison shows that trazodone and m-CPP levels are comparable after chronic and acute (6h) treatment.

      To avoid confusion, we have changed the mouse nomenclature. We have renamed the control group of mice as "CT" (previously "WT") throughout the text and figures. In Figure 6, we have used CXB5/ByJ instead of "woozy" to emphasize the comparison between the different genetic backgrounds (CXB5/ByJ vs C57BL/6J). Finally, we have replaced the colors of symbols in Figure 7 in order to match those of Figure 3. We have also made the description and discussion of these results clearer in the revised manuscript.

      With respect to the experiments a simple and informative addition would be the evaluation of the PERK pathway in mice treated with TUDCA, as this is missing.

      The effect of TUDCA treatment on the PERK pathway is shown in Figure 5, where we measured CHOP mRNA levels in Purkinje cells microdissected from mice treated with 0.4% TUDCA in the chow, and in Figure 9C and D, where we measured the percentage of CHOP-immunopositive Purkinje cells in the cerebellum of the same groups of mice by immunohistochemistry.

      Figure 10 illustrates the results of an additional experiment in which woozy mice were treated with 500 mg/kg TUDCA intraperitoneally (ip), to test whether this alternative dosing regimen was any better. Like the treatment per os, TUDCA ip had no beneficial effect on motor dysfunction. Therefore we deemed it unnecessary to check the effect on PERK pathway inhibition in this group of mice.

      A more difficult but potentially more interesting line of investigation is that of searching for potential actions of Trazodone that are PERK independent and might be responsible for the partial rescue observed in the beam walking test, which is much more sensitive and specific than rotarod, so worth considering. Assuming authors want to go down this route and add significance to their study my suggestion would be an unbiased RNA seq from the brain samples they already have. However, this is a suggestion to steer the study towards a more positive outcome and it is not necessary to support their current conclusions.

      We agree with the reviewer that it would be interesting to investigate the mechanism by which trazodone slightly ameliorated the motor performance of woozy mice in the beam walking test. In the Discussion, we speculated that this could be due to an effect of trazodone on cerebellar serotonergic neurotransmission, which would require electrophysiological investigations to demonstrate. Of course, other mechanisms may also be operative, which RNA seq may help identify, as the reviewer suggests. However, this would be a complex and lengthy investigation, the results of which would not change the main conclusions of the present paper. We plan to explore this line of investigation in a future study.

      Reviewer #2:

      Lavigna et al. described the effect of Trazodone in Marinesco-Sjögren syndrome model mice. Although the results are somewhat disappointing, this research has provided fundamental evidence for the future development of MSS therapeutics. There are a few minor comments to further improve the manuscript.

      Major comment

      P14

      "Trazodone metabolism to m-CPP was slightly impaired in woozy mice compared to C57BL/6J mice. This was evident from the m-CPP/trazodone ratio, calculated on the AUC0-t in the plasma, which was 0.34 in woozy and 0.67 in C57BL/6J mice."

      Why was the concentration different between WT and woozy mice? Which organ mainly contributes to the metabolism of trazodone? Is the function of this target organ different between WT and woozy mice?

      Similar to trazodone, m-CPP clearance from plasma was slightly faster in woozy than in C57BL/6J mice.

      Is m-CPP eliminated via the kidney or the liver? Why is there a difference? Does SIL1 function in the liver or kidney? This needs discussion. The same applies to brain m-CPP levels.

      As explained in the response to the second comment of reviewer #1, "woozy" in Figure 6 referred to mice with the CXB5/ByJ genetic background, and in this experiment we compared trazodone pharmacokinetics and metabolism between CXB5/ByJ and C57BL/6J mice. We have modified the nomenclature of Figure 6 and the Results to make this clear.

      Trazodone undergoes extensive hepatic metabolism, and only a small percentage is excreted unchanged in the urine. Metabolism involves hydroxylation, oxidation and dealkylation reactions, forming in particular the 5HT-active metabolite m-CPP (by CYP3A4). This and other metabolites are mainly excreted in urine, as conjugates [1-3]. The slight differences in trazodone pharmacokinetics and metabolism between the CXB5/ByJ and C57BL/6J mice shown in Figure 6 are not attributable to loss of SIL1 function, since both groups of mice carried wild-type Sil1 alleles, but are most likely due to subtle differences between the two strains, for example in binding to plasma proteins, metabolic enzymes, transporters and/or excretion processes. The available data do not allow us to clarify this issue.

      The main point, however, is that no major differences were found in the plasma and brain concentrations of trazodone between these two strains of mice, which could have explained the lack of efficacy of trazodone in woozy mice, as we now further stress in the revised Discussion.

      Minor comments

      P3 L5 mutation should be variant.

      This has been changed.

      P4 L1 eIF2a-P should be phosphorylated eIF2α (p-eIF2α). The reviewer prefers (p-eIF2α) than (eIF2α-p) throughout the manuscript.

      There is no standard rule for indicating phosphorylated proteins, and phosphorylated eIF2α is referred to in various ways in different papers, with the "p" in capital or lowercase, preceding or following the protein name, separated by a dash or not. We would prefer to maintain the current nomenclature for consistency with our previous publications, unless the Editor deems otherwise.

      P9 L11 M-CPP should be fully spelled out the first time it appears. m-Chlorophenylpiperazine (m-CPP)

      m-CPP is spelled out the first time it appears, in the Materials and Methods, subheading Drug treatments and bioanalysis.

      Please explain the difference between the expected function of trazodone and that of its metabolite m-CPP. Why is m-CPP not effective?

      Based on the observation that mice of the woozy strain had lower brain levels of m-CPP than C57BL/6J mice (Figure 6), we hypothesized that the lack of effect of trazodone in woozy mice could be due to m-CPP mediating the PERK signaling inhibitory activity of trazodone. However, experiments in CHOP::luciferase reporter cells demonstrated that m-CPP does not inhibit PERK signaling (Figure 2D). The precise mechanism by which trazodone inhibits PERK signaling is not known [4], which makes it difficult to speculate why its main metabolite, m-CPP, does not exhibit this activity.

      P11 L3 Fig. 3 Fig. 3A and B?

      Yes, we specifically refer to panels A and B of Figure 3. We have indicated this in the revised manuscript.

      P11 L6 at 7 weeks of age?

      We have re-done the statistical analysis by two-way ANOVA and reported the results in the legend to Figure 3. There is a significant difference between vehicle- and trazodone-treated woozy mice in the number of missteps when the two groups are compared globally. No statistically significant difference in the number of missteps is detected at specific time points by post-hoc analysis. There is no statistically significant difference between vehicle- and trazodone-treated woozy mice in the time to traverse the beam. The Results section has been revised accordingly.

      P12 L17 ~4 times, 4 times? Please state the exact value.

      Done.

      Figure 7 Why are brain m-CPP levels higher than plasma levels? Is trazodone metabolized in brain tissue?

      Trazodone is extensively metabolized in the liver by cytochrome P450 enzymes [1]. It is well documented that m-CPP readily crosses the blood-brain barrier, much better than the parent compound, explaining its high brain levels [2].

      P19 L7 ISRIB; please fully spell out the first time it appears.

      Done.

      References

      1. Rotzinger S, Bourin M, Akimoto Y, Coutts RT, Baker GB (1999) Metabolism of some “second”- and “fourth”-generation antidepressants: iprindole, viloxazine, bupropion, mianserin, maprotiline, trazodone, nefazodone, and venlafaxine. Cell Mol Neurobiol 19:427–442. https://doi.org/10.1023/a:1006953923305
      2. Caccia S, Ballabio M, Samanin R, Zanini MG, Garattini S (1981) (--)-m-Chlorophenyl-piperazine, a central 5-hydroxytryptamine agonist, is a metabolite of trazodone. J Pharm Pharmacol 33:477–478. https://doi.org/10.1111/j.2042-7158.1981.tb13841.x
      3. DeVane CL, Boulton DW, Miller LF, Miller RL (1999) Pharmacokinetics of trazodone and its major metabolite m-chlorophenylpiperazine in plasma and brain of rats. Int J Neuropsychopharm 2:17–23. https://doi.org/10.1017/S1461145799001303
      4. Halliday M, Radford H, Zents KAM, Molloy C, Moreno JA, Verity NC, Smith E, Ortori CA, Barrett DA, Bushell M, Mallucci GR (2017) Repurposed drugs targeting eIF2alpha-P-mediated translational repression prevent neurodegeneration in mice. Brain 140:1768–1783. https://doi.org/10.1093/brain/awx074
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Author responses


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Dutta and colleagues compared the meiotic recombination landscapes between five budding yeast species. In the first part of the work, the authors constructed a high-resolution map of meiotic recombination events in Kluyveromyces lactis supported by high-quality genome assemblies for two strains of this yeast. Then, partially repeating their CO and NCO mapping strategy, they compared a number of meiotic recombination parameters between the five species (sometimes three, depending on the quality of the data for each species). They particularly focused on key parameters for meiotic recombination, such as crossover interference and homeostasis and obligate crossover. Although the analysis is interesting, it is underdeveloped in many places and lacks the general conclusions regarding the evolution of recombination and the broader perspective that would be expected from a comparison of these phenomena in budding yeasts.

      [R] Tackling the evolution of recombination is ambitious. With a dataset of five species, it is hard to draw strong evolutionary conclusions beyond the variation we observed in the crossover (CO) landscapes and in the control of CO formation, which is already significant. The multiple losses of CO interference that we describe here may constitute our strongest evolutionary conclusion: they potentially underscore the minor evolutionary advantage associated with CO interference, at least in budding yeasts. In this context, we changed the title to be more factual and updated the text to better highlight the significance and implications of our findings.

      Major comments:

      The authors indicate that the distribution of hotspots and coldspots is not preserved between species, but this finding is not properly documented. I think it would be useful to include recombination maps in a main figure for all species (or at least for S. cerevisiae, K. lactis and L. waltii) with these elements highlighted. This would visually illustrate the variability in the recombination landscape between the studied species.

      [R] The genomes of these species share blocks of synteny but are not collinear overall, so a direct comparison of the recombination maps is not possible. In our previous work, we highlighted the non-conservation of CO hotspots between S. cerevisiae, L. kluyveri and L. waltii (Brion et al. 2017; Dutreux et al. 2023). Briefly, we retrieved conserved syntenic blocks in the L. kluyveri and L. waltii genomes containing at least two S. cerevisiae orthologs associated with one hotspot. L. waltii shares only five of the 92 S. cerevisiae crossover hotspots (RHO5, SLS1, GYP6, OLE1 and MRPL8), while L. kluyveri shares only one, and L. waltii and L. kluyveri share none. In addition, our current study shows that none of the K. lactis hotspots is conserved in any of the four other species (Response Figure 1 and new supplementary Figure S11).

      Response Figure 1. Density of crossovers along the S. cerevisiae genome in 5 kb windows (combined dataset from Mancera et al. 2008; Oke et al. 2014; Krishnaprasad et al. 2015). The horizontal dotted green line represents the crossover hotspot significance threshold. Solid spheres represent CO hotspots conserved with either L. kluyveri (red) or L. waltii (blue). None of the 92 S. cerevisiae crossover hotspots is conserved in K. lactis.

      Although analyses analogous to those presented in Fig. S5 have already been published in other comparisons of the recombination landscape in yeast (e.g. Dutreux et al., 2023), I think that Figs. S5A and S5B are worth presenting in the main figures (not as supplementary data). In many eukaryotic species, the detection of NCOs is practically impossible, so only results for COs are presented. It is therefore perhaps also worth discussing that the relationship applies to all recombination events and not only to COs, and is therefore related to the regulation of DSB frequency rather than to individual DSB repair pathways.

      [R] Figures S5A-B are now included in the main figures, as Figure 2B. The association also holds for total recombination (CO+NCO) events (new supplementary Figure S6A).

      The authors find that CO coldspots were associated with DNA repair genes. Unfortunately, an equivalent analysis was not performed for all recombination events (CO + NCO). I presume this approach is based on the belief that COs are more mutagenic than NCOs. However, recent studies in humans suggest that, at least in mammals, meiotic DSBs themselves are mutagenic, regardless of the pathway used for their repair (Hinch et al., Science 2023). Therefore, I would suggest repeating the analysis also considering NCOs (although I am aware that the picture of NCOs may be incomplete). I would also like to see some graphical representation of the analysis. Is it possible to perform a classic analysis of coldspots/hotspot enrichment in relation to gene ontology?

      [R] As suggested, we performed the analysis to independently detect coldspots for all recombination events (CO+NCO). Based on a threshold of

      In relation to the previous point - it may be worth repeating this type of analysis also for other yeasts used in this study, or at least for S. cerevisiae, to be able to consider the extent to which this relationship is universal and dependent on the meiotic DSB repair pathway.

      [R] The CO coldspot analysis has also been performed in the other species. As mentioned in the main text, although some overlap between CO coldspots and DNA repair genes is also observed in the other species, we observed a significant enrichment only in K. lactis, possibly because its dataset is larger than those of the other species.

      In Fig. S7, the point where WGD occurred is marked in the wrong place, or at least that is what the sentence in the text says ("The Lachancea and Kluyveromyces species branched from the Saccharomyces lineage more than 100 million years ago, before to the ancestral whole-genome duplication (WGD) event specific of the S. cerevisiae lineage").

      [R] We regret the oversight and have corrected the figure.

      The result presented in Fig. S8 is interesting and should be shown in the main figures. Perhaps it would be worth adding an illustration of simple versus complex COs.

      [R] The old Figure S8 is now part of main Figure 2C-D, together with illustrations describing the CO types.

      The last part of the results includes an analysis of the evolutionary rates of the ZMM genes. In the discussion, the authors should also relate the results of this analysis to the previous analysis of the overrepresentation of DNA repair genes in recombination coldspots. I understand that ZMM are not DNA repair proteins in the strict sense, but I think it is worth familiarizing readers with the authors' view on this matter. Moreover, I would suggest showing where MLH1 and MLH3 are located on the plot in Fig. 6 (especially the meiosis-specific MLH3), to see whether selection pressure acts on them as on ZMM proteins or rather as on DNA repair proteins. Showing SLX4 and MUS81 would also be interesting.

      [R] Figure 6 has been updated as suggested and now shows the Mlh1, Mlh3, Slx4 and Mus81 dN/dS values for the three species.

      I feel like the discussion is underdeveloped. I missed a deeper summary of the comparison between meiotic recombination among the tested budding yeasts in the context of the presence and absence of functional ZMM. Even the title of the work is not properly developed in the manuscript text. The analysis shows that it is not the presence of a functional ZMM pathway or its lack that introduces differences between the individual recombination landscapes, although ZMM determines the presence of proper CO interference. With the caveat that for L. kluyveri it is basically unknown whether it has a functional ZMM or not. Maybe confirming the lack of expression of some ZMM genes in meiosis of this species would answer the question of how it should be treated?

      [R] We agree with the reviewer that our original title was imprecise, so we changed it to be more factual, emphasizing the multiple losses of crossover interference in budding yeasts. As stated above, these losses potentially underscore the minor/negligible evolutionary advantage associated with CO interference, at least in budding yeasts. Beyond that, it is hard to draw deeper conclusions, since the actual roles/functions of CO interference are still under debate, notably in yeasts where the CO frequency tends to be high. We improved the discussion to better highlight these points.

      We also agree that a deeper characterization of the ZMM factors persisting in the non-Saccharomyces yeasts would be informative, but we believe it is beyond the scope of the current manuscript and more suitable for follow-up work. However, our recent publication on L. kluyveri (Legrand et al 2024) shows that Zip3 is properly expressed in meiosis and behaves as in S. cerevisiae, since it localizes to DSB sites. Furthermore, we have unpublished transcriptomic data (Response Figure 2) showing that all the ZMM genes from L. kluyveri are specifically induced in meiosis (more than 16-fold relative to pre-sporulation conditions). Therefore, although the level of CO interference in L. kluyveri is minimal, there is so far no indication that the ZMM genes are misregulated.

      Response Figure 2. Transcriptomic data showing that all the ZMM genes from L. kluyveri are specifically induced in meiosis (Unpublished data from Llorente Lab, CRCM, Marseille).






      Minor comments:

      In general, figure captions are imprecise; many of them lack clear information explaining what is depicted. Authors should remember that figure legends should be self-sufficient.

      [R] The figure legends have been updated and are now self-sufficient.

      In the revised manuscript, I would suggest placing figure numbers on the figures and using line numbering, which would facilitate the reception of the work and possible reference to its individual elements in the review.

      [R] We regret the omission. Figure numbers, Line numbers and Page numbers have been added.

      Reviewer #1 (Significance (Required)):

      The study provides new insight into the variation in the recombination landscape within budding yeast species, with a special emphasis on crossover control. This also includes de novo assemblies of the Kluyveromyces lactis genome and high-resolution tetrad-based maps of meiotic recombination events. Recombination maps of different yeast species have been compared previously; however, this study focuses on budding yeasts, some of which have lost the ZMM pathway and differ in some crossover parameters, such as interference and homeostasis. Although the analysis is interesting, it lacks the general conclusions regarding the evolution of recombination and the broader perspective that would be expected from a comparison of these phenomena in budding yeasts.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper describes the genome-wide mapping of meiotic recombination in a non-Saccharomyces yeast, Kluyveromyces lactis. By using heterologous parental strains, the authors mapped crossovers (COs) and noncrossovers (NCOs) on the genome of K. lactis, which lacks some of the proteins necessary for CO formation in organisms such as S. cerevisiae, mammals and plants. This is an extension of previous work by the authors' group, which mapped COs and NCOs in other yeasts, Lachancea kluyveri and L. waltii, using a similar approach. The authors found that CO frequencies in K. lactis are much lower than those in S. cerevisiae and that COs showed weaker interference, which facilitates the non-random distribution of COs along a chromosome. Overall, the experiments and informatic analyses have been done with good quality and the results are convincing. The paper provides additional new information on the landscape of meiotic recombination in different yeast species. These results are of great interest to researchers in the field of meiotic recombination and evolution of meiosis. There are some issues that the authors may be able to address before publication.

      Major points:

      While the authors noted that K. lactis has lost two pro-CO factors (ZMM proteins), Spo16 and Msh5 (due to the introduction of an in-frame stop codon), it still possesses other proteins such as Zip1, Zip2, Zip3, Zip4/Spo22, Mer3, and Msh4. It is still likely that these pro-CO factors control CO formation (and interference) in this yeast. It would be nice for the authors to test, using knockouts, whether these genes are dispensable for CO formation and interference in meiosis. A similar analysis should be done for L. kluyveri, which retains all ZMM genes, but this is clearly out of the scope of this paper.

      [R] The question of the functions of the remaining ZMM factors is indeed interesting and related to point #8 from Reviewer 1 (please see above). Although this is beyond the scope of our work, we would like to refer here to work from Amy MacQueen's lab using K. lactis Zip1 in S. cerevisiae (Voelkel-Meiman 2015). This study shows that K. lactis Zip1 does not allow synaptonemal complex assembly in S. cerevisiae but does allow CO formation that is independent of the Msh4/5 complex yet depends on Zip2/4/Spo16 and Mlh1/3 for resolution. Overall, these results suggest that K. lactis Zip1 has retained at least some ancestral functions shared with S. cerevisiae Zip1. However, it is not possible to conclude whether the lack of full complementation by K. lactis Zip1 in S. cerevisiae reflects functional divergence or simply the inability of K. lactis Zip1 to function properly in a heterologous context.

      Minor points:

      No page numbers, no main figure numbers. It is hard to review this paper.

      [R] We regret the oversight. Figure numbers, Line numbers and Page numbers have been added.

      References: In some cases, in the Introduction, the authors referred to review papers such as Pyatnitskaya et al. (2019) for ZMM proteins while in the other parts, they referred to original papers; for example, three papers for Mlh1-Mlh3. If the number of references is not limited, original papers should be cited in the text.

      [R] We regret this omission. Original papers have now been included in the citations.

      Figure 3A, page 9, second paragraph: When the authors compared CO and NCO densities, it would be nice to show P-values for the comparison.

      [R] p-values have now been added to the updated figure.

      Please show a ratio of CO to NCO in each yeast in Figure 3B in the second paragraph of page 9 in the main text.

      [R] Both the CO:NCO and the CO:corrected NCO ratios are now reported in the figure, the main text and the figure legends.

      Figure S5 and page 7, the first paragraph and page 9, third paragraph: CO/NCO densities (negative correlation to chromosome sizes) in S. cerevisiae should be checked with or without short chromosomes (I, III, and VI), which show very unique regulation of meiotic DSB formation (see Murakami et al. Nature 2020).

      [R] Even excluding the small chromosomes, the size dependent trend persists for S. cerevisiae and S. paradoxus.

      Table S7: Please add the S. cerevisiae gene name, such as ZIP1, next to S. cerevisiae orthologs such as YDR285W. Moreover, please explain the column in detail or clarify the data. What does "meiosis" mean here? For example, YJL074C is SMC3, which is expressed in mitosis as well as in meiosis. The same is true for YGL163C, which is RAD54; it plays a minor role in meiosis but a critical role in mitotic DSB repair.

      [R] We corrected Table S7 as requested by systematically including the standardized gene names.

      A Gene Ontology (GO) annotation is a statement about the function of a particular gene. The Gene Ontology offers a structured framework and a comprehensive set of concepts to describe the functions of gene products across all organisms, and it is specifically crafted to support the computational representation of biological systems. In our specific case, we only considered genes with the GO annotation "meiosis". Together, these annotations constitute a "snapshot" of current biological knowledge and are by no means absolute. This is now detailed in supplementary Table S7.

      Reviewer #2 (Significance (Required)):

      This study provides the landscape of meiotic recombination in non-Saccharomyces yeast, Kluyveromyces lactis. The genome-wide recombination map in K. lactis shows lower crossover frequencies with weaker crossover interference than those in S. cerevisiae. Overall, the experiments and informatic analyses have been done in good quality and the results are convincing. The paper provides additional new information on the landscape of meiotic recombination in different yeast species, particularly in terms of the evolution of meiotic recombination. These results are of great interest to researchers in the field of meiotic recombination and evolution of meiosis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Dutta et al. have compiled a genome-wide meiotic recombination map for Kluyveromyces lactis and compared it to a compilation of meiotic recombination maps for four other species, two of which (Lachancea kluyveri and Lachancea waltii), like K. lactis, predate the genome duplication event that produced the other two (Saccharomyces cerevisiae and S. paradoxus). Meiosis in many species studied (including metazoans and plants) shows control over the number and distribution of crossovers, which are critical for faithful chromosome segregation during meiosis. This takes the form of crossover interference, where crossovers are spaced more evenly than expected by chance, and crossover homeostasis, where many fewer chromosomes lack a crossover than expected by chance. While both of the post-duplication species show both crossover interference and homeostasis, none of the pre-duplication species shows crossover homeostasis, and crossover interference is very weak. In two cases (K. lactis and L. waltii), this can be explained by mutational loss of a few of the genes (called the ZMM genes) that promote meiotic crossovers in many species. However, L. kluyveri behavior cannot be explained in this way. Recombination hotspots are present but are not shared between the pre-duplication species or between the pre- and post-duplication species, which is perhaps not surprising for species that diverged more than 100 million years ago. Overall, this work will be a useful contribution to our understanding of the different possible flavors of meiotic recombination mechanisms and control (and, one might add, of those that promote long-term species viability).

      A) Evaluation, reproducibility and clarity

      The work presented in this paper is straightforward and unimpeachable and will largely be of interest to those studying meiotic recombination, be it mechanistic studies or studies of the implications for population genetics. The analysis is technically correct, although there are some aspects where a slightly different emphasis should be considered (see comments below). However, the data and the analysis could stand as they currently are, without further revision.

      Suggestions are below.

      1. (Trivial) It would have been useful if pages and lines were numbered.

      [R] We regret the oversight. Figure numbers, Line numbers and Page numbers have been added.

      "Across the 205 meioses...". In general, it would be desirable to apply compensation for the fact that NCOs and COs are differently detected. Since, in K. lactis, 35% of COs are not accompanied by detectable gene conversion, it seems reasonable to apply a correction to measured NCOs here and throughout the paper, regardless of the species. For example, if one assumes that 35% of NCOs are not detected, how does this affect estimates of chromosomes that do not appear to have undergone interhomolog recombination? Estimates of CO/NCO bias? In a similar vein, if the CO event is not considered (just the conversion events associated with it), how does this affect measures of conversion tract lengths in COs and NCOs?

      [R] We thank the reviewer for this suggestion. We corrected the NCO estimates as described in Mancera et al. 2008, on a per-tetrad basis, across all species. The fractions of missed NCOs were 7%, 34%, 30%, 23% and 25% for S. paradoxus, S. cerevisiae, K. lactis, L. waltii and L. kluyveri, respectively. The fraction of missed NCOs depends on the parental marker density. In addition, we performed the CO:NCO bias analysis with both the detected and the corrected NCO frequencies, and the trends remain unchanged (now included in Figure 3). Finally, we refrain from using the corrected NCO frequencies when reporting NCO frequencies (Table 1, main text), to maintain uniformity with our previous work and because these corrections do not alter any results.

      It might be useful to report recombination event frequencies in terms of events/chromosome, as this, rather than event/unit distance, is functionally more relevant. In the same vein, it might be useful to consider total event homeostasis, in addition to just crossover homeostasis.

      [R] This has been updated as suggested.

      An interesting observation is that two of the three pre-duplication species clearly at one time had a full complement of ZMM genes but lost some due to mutation. Have there ever been attempts to detect either synaptonemal complex or axial elements in these species?

      [R] This is related to point #8 from reviewer 1 and to the major point of reviewer 2 (please see above).

      To our knowledge, the only cytological observations of the synaptonemal complex (SC) or axial elements in these species are our own, in L. kluyveri, where the SC is clearly visible (Legrand et al 2024).

      However, it is worth recalling here that the K. lactis axis protein-encoding genes HOP1 and RED1 were cloned by the Roeder lab by functional complementation of the corresponding S. cerevisiae mutants, supporting the functional conservation of these genes (Smith and Roeder 2000). Finally, as mentioned above, K. lactis Zip1 has retained at least some functions of the ancestral Zip1 protein that are also shared by the S. cerevisiae protein (Voelkel-Meiman 2015).

      The observation of elevated evolutionary rates in ZMM genes is also intriguing, but it would help if "dN/dS ratio" was defined.

      [R] It is now defined in the text.
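      For reference, the ratio follows the standard definition used in molecular evolution; the notation below is ours and may differ from that adopted in the revised manuscript:

      $$
      \omega = \frac{d_N}{d_S}, \qquad
      d_N = \frac{\text{nonsynonymous substitutions}}{\text{nonsynonymous sites}}, \qquad
      d_S = \frac{\text{synonymous substitutions}}{\text{synonymous sites}}
      $$

      Values of $\omega < 1$ indicate purifying selection, $\omega \approx 1$ neutral evolution, and $\omega > 1$ positive selection; in practice both rates are estimated with a substitution model that corrects for multiple substitutions at the same site.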

      The observation of frequent E0 chromosomes is taken to suggest efficient achiasmate segregation; has the "corrected" NCO frequency been considered? Do the different frequencies of E0 chromosomes predict the different spore viabilities seen between species?

      [R] The E0 frequency is not at all predictive of spore viability, as we have shown in previous studies (L. kluyveri, Brion et al. 2017; L. waltii, Dutreux et al. 2023). This has also been shown in S. cerevisiae (Nishant et al. 2009).

      Figure 3A-what would this look like if it were plotted as "Events per chromosome" rather than per megabase?

      [R] We changed the figure (now Figure 2A) to plot events per chromosome, which shows the variability of events at the chromosome level.

      Figure legends tend to be unreasonably terse, which makes figures more difficult to interpret.

      [R] This has been updated as suggested.

    1. Author response:

      The following is the authors’ response to the current reviews. 

      eLife assessment:

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because of insufficient grounding in prior experimental results and insufficient consideration of alternative explanations. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      We disagree with the overall assessment of our paper. The current reviews published below focus on two kinds of perceived inadequacies. Reviewer 1 (R1) was concerned that the fear conditioning paradigm used in the model is not compatible with some of the experiments we are modeling. In the Recommendations for the Authors, R1 helpfully suggested some papers that, in R1's view, exposed this incompatibility. In our reading, those data are in fact compatible with our hypotheses, as we will explain in our reply. Furthermore, the point raised by R1 is an issue for the entire field. We will suggest a solution to that issue based on published data.

      Reviewer 2 (R2) said that there is no evidence that the BLA is capable of producing, by itself, the rhythms that have been observed during fear conditioning in BLA and, furthermore, that the paper we cited to support such evidence, in fact, refutes our argument. We believe that the reasoning used by reviewer 2 is wrong and that the framework of R2 for what counts as evidence is inadequate. We spell out our arguments below in the reply to the reviewers.

      Finally, we believe this work is of interest far beyond investigators studying fear conditioning. The work shows how rhythms can create the timing necessary for spike-timing-dependent plasticity using multiple time scales that come from multiple different kinds of interneurons found both in BLA and, more broadly, in cortex. Thus, the work is relevant for all kinds of associative learning, not just fear conditioning. Furthermore, it is one of the first papers to show how rhythms can be central in mechanisms of higher-order cognition.

      Reviewer #1

      We thank Reviewer 1 for their kind remarks about our first set of responses and for their understanding of the importance of the work. Only one point remained to be addressed:

      Deficient in this study is the construction of the afferent drive to the network, which does elicit activities that are consistent with those observed to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.

      It is true that some fear conditioning protocols involve non-overlapping US and CS, raising the question of how plasticity happens or whether behavioral effects may happen without plasticity. This is an issue for the entire field (Sun et al., F1000Research, 2020). Several papers (Quirk, Repa and LeDoux, 1995; Herry et al., 2007; Bordi and LeDoux, 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate remains higher than baseline activity. The question remains whether the spiking is sustained long enough, and at a high enough rate, for STDP to take place when the US is presented some time after the end of the CS.

      Experimental recordings cannot speak to the spiking rate of BLA neurons during the US because of recording interference from the shock. However, evidence seems to suggest that ECS activity should increase during the US owing to the release of acetylcholine (ACh) from basal forebrain (BF) neurons (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021), and M1 receptors target spines receiving glutamatergic input (McDonald et al., 2019). Thus, ACh from the BF should elicit a long-lasting depolarization in pyramidal cells. Indeed, pairing ACh with even low levels of BLA spiking results in a membrane depolarization that can last 7–10 s (Unal et al., 2015). This implies that ACh release can affect the consequences of the CS in successive trials. These consequences should include higher spiking rates and more sustained activity in the ECS neurons after the first presentation of the US, thus ensuring the concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Hence, we suggest that the problem raised by R1 may be solved by considering the role of ACh release by the BF. To the best of our knowledge, nothing in the literature contradicts this potential solution. Our model may be considered a “minimal” model that puts in by hand the higher frequency due to the cholinergic drive without explicitly modeling it. As R1 says, it is important for us to motivate that higher frequency; in the next revision, we will be explicit about how the required firing rate can arise without an overlap of CS and US in any given trial.

      Reviewer #2

      The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule to the network aiming to understand the mechanisms underlying fear learning in the BLA.

      After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The author added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 uM Gabazine to the BLA slices observed rhythmic oscillations at theta frequencies. 10 uM Gabazine does not reduce the GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic populations burst driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extra-hippocampal inputs, including, but not limited to the entorhinal afferents and medial septal projections (see Buzsaki, 2002). Similarly, respiratory related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled.

      Reviewer 2 (R2) says “the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered.” In our revision, we cited (Antonoudiou et al., 2022), who showed that BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings. R2 pointed out that this paper produces such theta under conditions in which the inhibition is totally removed. R2 then states that the resulting rhythmic populations burst at theta “are driven solely by excitatory cells. Thus, the results by (Antonoudiou et al., 2022) contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices.”

      This reasoning of R2 is faulty. With all GABAergic currents omitted, the LFP is composed of excitatory currents and intrinsic currents. Our model of the LFP includes all synaptic and membrane currents. In our model, the high theta comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. We are including a new simulation that models the activity of the slice in the presence of kainate (as done in Antonoudiou et al., 2022), which provides additional excitation to the network. If the BLA starts in a high-excitation state, our model produces an ongoing gamma in the VIP cells that suppresses the SOM cells and allows a PING gamma to form between PV and F cells; with Gabazine (modeled as the removal of all GABAergic synapses), this PING is no longer possible, so the gamma rhythm disappears. As expected, the simulation shows that the model produces theta with Gabazine; it also shows that a PING rhythm is produced without Gabazine, and that this rhythm goes away with Gabazine because PING requires feedback inhibition (see Author response image 1). Thus, the theta increase with Gabazine in the (Antonoudiou et al., 2022) paper can be reproduced in our model, so that paper does support the model.

      Author response image 1.

      Spectral properties of the BLA network without (black) versus with Gabazine (magenta). Power spectra of the LFP proxy, which is the linear sum of AMPA, GABA (only present in the absence of Gabazine), D-, NaP-, and H-currents. Both power spectra are represented as mean and standard deviation across 10 network realizations. Bottom: inset between 35 and 50 Hz.

      Nevertheless, we agree that this paper alone is not sufficient evidence that the BLA can produce a low theta. We have recently learned of a new paper (Bratsch-Prince et al., 2024) that is directly related to the issue of whether the BLA by itself can produce low theta, and in what circumstances. In this study, intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input) which, in vivo, would be produced by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low-theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003).

      We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. We do not explicitly include ACh modulation of BLA in our paper, but in current work with experimentalists, we aim to show that ACh is essential to the theta by activating the BLA VIP cells. In our re-revised version, we will discuss Bratsch-Prince et al., 2024 and its connection to our hypothesis that the theta oscillations can be produced within the BLA.

      Note that we have already included a paragraph stating explicitly that our hypothesis in no way contradicts the idea that inputs to the BLA may include theta oscillations. Indeed, the following paragraphs in the revised paper describe the complexity of trying to understand the origin of brain rhythms in vivo. R2 did not appear to take this complexity, and the possible involvement of neuromodulation, into account in their current position that the theta rhythms cannot be produced intrinsically in the BLA.

      From revised paper: “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms.

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through the BLA's extensive communication with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper.”

      We believe our current paper is important to show how detailed biophysical modeling can unearth the functional implications of physiological details (such as the biophysical bases of rhythms), which are often (indeed, usually) ignored in models, and why rhythms may be essential to some cognitive processes (including STDP). Indeed, for evaluating our paper it is necessary to go back to the purpose of a model, especially one such as ours, which is “hypothesis/data driven”. The hypotheses of the model serve to illuminate the functional roles of the physiological details, giving meaning to the data. Of course, the hypotheses must be plausible, and we think that the discussion above easily clears that bar. Hypotheses should also be checked experimentally, and a model that explains the implications of a hypothesis, such as ours, provides motivation for doing the hard work of experimental testing. We think that R1 understands this and has been very helpful.

      —————

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala. 

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. 

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA. We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. Details are below in the answer to reviewers.

      Reviewer #1 (Public Comments):  

      (1) … the weakness is that their attempt to align with the experimental literature (specifically Krabbe et al. 2019) is performed inconsistently. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+). 

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Krabbe et al., 2019, Supp. Fig. 4, panel t). We also omitted PV to SOM, PV to VIP, SOM to VIP, VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. 

      We reply with more details below to the Recommendations for the Authors, including new text.

      (2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.  

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations.  

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show that the number of responding cells decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.

      Our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Although paradigms involving both overlapping (delay conditioning, where US coterminates with CS (Lindquist et al., 2004), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs exist in the literature, we hypothesized that concomitant activity in CS- and US-encoding neurons should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature. 

      We reply with more details below to the Recommendations for the Authors, including new text.

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) This paper draws extensively from Krabbe et al. 2019, but it does not do so consistently. The paper would be strengthened if it tried to better match the circuit properties and activations.

      Specifically: 

      a. Krabbe found that PV interneurons were comparably activated by the US (see Supp Fig 1). Your model does not include that. The basis for the Krabbe 2019 claim that PV US responses are weaker is that they have a slightly larger proportion of cells inhibited by the US, but this is not especially compelling. In addition, their Fig 2 showed that VIP and SOM cells receive afferents from the same set of upstream regions. 

      b. The model excluded PV-SOM connections, but this does not agree with Krabbe et al. 2019, Table 2. PV cells % connectivity and IPSC amplitudes were comparable to those from VIP interneurons. 

      c. ECS to PV synapses are not included. This seems unlikely given the dense connectivity between PV interneurons and principal neurons in cortical circuits and the BLA (Woodruff and Sah 2007 give 38% connection probability in BLA). 

      We thank the Reviewer for raising these points, which allow us to clarify how we constrained our model and to do more simulations. Specifically: 

      a. (Wolff et al., Nature, 2014), cited by (Krabbe et al. 2018), reported that PV and SOM interneurons are on average inhibited by the US during the fear conditioning. However, we agree that (Krabbe et al., 2019) added to this by specifying that PV interneurons respond to both CS+ and US, although the fraction of US-inhibited PV interneurons is larger. As noted by the Reviewer, in the model we initially considered the PV interneurons responding only to CS+ (identified as “CS” in our manuscript). For the current revision, we ran new simulations in which the PV interneuron receives the US input, instead of CS+. It turned out that this did not affect the results, as shown in the figure below: all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful learning. 

      As for afferents of VIP and SOM from upstream regions, it is reported in (Krabbe et al., 2019) that “[…] BLA SOM interneurons receive a different array of afferent innervation compared to that of VIP and PV interneurons, which might contribute to the differential activity patterns observed during fear learning.” Thus, in the model, we are agnostic about inputs to SOM interneurons; we modeled them to fire spontaneously at high theta.

      To address these points in the manuscript, we added some new text in what follows:

      (1) New Section “An alternative network configuration characterized by US input to PV, instead of CS, also learns the association between CS and fear” in the Supplementary information:

      “We constrained the BLA network in Fig. 2 with CS input to the PV interneuron, as reported in (Krabbe et al., 2018). However, (Krabbe et al., 2019) notes that a class of PV interneurons may be responding to US rather than CS. Fig. S3 presents the results obtained with this variation in the model (see Fig. 3 A,B for comparison) and shows that all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful fear learning.

      We model the VIP interneuron as affected by US; in addition, (Krabbe et al. 2019) reports that a substantial proportion of them is mildly activated by CS. Replacing the US by CS does not change the input to VIP cells, which is modeled by the same constant applied current. Thus, the VIP CS-induced activity is a bursting activity at low theta, similar to the one elicited by US in Fig. 2.”

      (2) Section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning” in Results: “Finally, since (Krabbe et al., 2019) reported that a fraction of PV interneurons are affected by US, we have also run the simulations for the single-neuron network with the PV interneuron affected by US instead of CS. In this case as well, all the network realizations are learners (see Fig. S3).”

      (3) Section “Conditioned and unconditioned stimuli” in Materials and Methods: “To make Fig. S3, we also considered a variation of the model with PV interneurons affected by US, instead of CS, as reported in (Krabbe et al. 2019).”

      b. Re the SOM to PV connection: As reported in the reply to the public reviews, we considered the prominent functional connections reported in (Krabbe et al., 2019), instead of structural connections. That is, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Supp. Fig. 4, panel t, in (Krabbe et al., 2019)). We also omitted PV to SOM, PV to VIP, SOM to VIP, and VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning.

      In order to clarify this point, in Section “Network connectivity and synaptic currents” in Materials and Methods, we now say:

      “We modeled the network connectivity as presented in Fig. 2B, derived from the prominent functional, instead of structural, connections reported in (Krabbe et al., 2019).”

      c. Re the ECS to PV synapses: We thank the Reviewer for the reference provided; as the Reviewer says, the ECS to PV synapses are not included. Upon adding this connection in our network, we found that, unlike the connection suggested in part a above, introducing these synapses would, in fact, change the outcome. Thus, the omission of this connection must be considered an implied hypothesis. Including those synapses with a significant strength would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. Thanks very much for showing us that this needs to be said. Our hypothesis does not contradict the dense connections mentioned by the Reviewer; such dense connectivity does not mean that all pyramidal cells connect to all interneurons. This hypothesis may be taken as a prediction of the model.

      The absence of this connection is now discussed at the end of a new Section of the Discussion entitled “Assumptions and predictions of the model”, which reads as follows:

      “Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. We note that in (Woodruff and Sah, 2007) only 38% of the pyramidal cells are connected to PV cells. The functional identity of the connected pyramidal cells is unknown. Our model suggests that successful fear conditioning requires F to PV connections and that ECS to PV must be weak or absent.”

      (2) Krabbe et al. 2019 and Davis et al. 2017 were referenced for the construction of the conditioned and unconditioned stimulus pairing protocol. The Davis citation is not applicable here because that study was a contextual, not cued, fear conditioning paradigm. Regarding Krabbe, the pairing protocol was radically different from what the authors used. Their conditioned stimulus was a train of tone pips presented at 0.9 Hz, which lasted 30 s, after which the unconditioned stimulus was presented after tone offset. The authors should determine how their network behaves when this protocol is used. Also, note that basolateral amygdala responses to tone stimuli are primarily brief onset responses (e.g. Quirk, Armony, and LeDoux 1997), and not the tonic activation used in the model.  

      We replied to this point in our responses to the Reviewer’s Public Comments as follows:

      “We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.”

      Current answer to the Reviewer:

      There are several distinct issues raised by the Reviewer in the more detailed critique. We respectfully disagree that the model is not applicable to context-dependent fear learning where the context acts as a CS, though we should have been more explicit. Specifically, our CS input can describe both the cue and the context. We included the following text in the Results section “Interneuron rhythms provide the fine timing needed for depression-dominated STDP to make the association between CS and fear”:

      “In our simulations, the CS input describes either the context or the cue in contextual and cued fear conditioning, respectively. For the context, the input may come from the hippocampus or other non-sensory regions, but this does not affect its role as input in the model.”

      The second major issue is whether the specific training protocols used in the cited papers need to be exactly reproduced in the signals received by the elements of our model; we note that there are many transformations that can occur between the sensory input and the signals received by the BLA. In the case of auditory fear conditioning, a series of pips, rather than individual pips, is considered the CS (e.g., (Stujenske et al., 2014; Krabbe et al. 2019)). Our understanding is that a single pip does not elicit a fear response; a series of pips is required for fear learning. This indicates that it is not the neural code of a single pip that matters, but rather the signal entering the amygdala that incorporates any history-dependent signaling that could lead to spiking throughout the sequence of pips. Also, as mentioned above, intense inputs at frequencies around 6 kHz and 12 kHz can lead to metabotropic effects that last much longer than each brief pip (~200 ms), thus possibly producing continuous activity in neurons encoding the input. Thus, we believe that our use of the Poisson spike train is reasonable. 

      However, we are aware that the activity of neurons encoding CS can be modulated by the pips: neurons encoding auditory CS display a higher firing rate when each pip is presented and a Poisson-like spike train between pips (Herry et al., Journal of Neuroscience, 2007). Here we confirm that potentiation is present even in the presence of the fast transient response elicited by the pips. We said in the original manuscript that there is learning for a Poisson spike train CS input at ~50 Hz; this describes the neuronal activity in between pips. For the revision, we asked whether learning is preserved when CS is characterized by higher frequencies, which would describe the CS during and right after each pip. We show in the new Fig. S4 that potentiation is ensured for a range of CS frequencies. The figure shows the learning speed as a function of CS and US frequencies. For all the CS frequencies considered, i) there is learning, ii) learning speed increases with CS frequency. Thus, potentiation is present even when pips elicit a faster transient response.

      To better specify this in the manuscript, we added the following sentences in the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”: 

      “We note that the CS and US inputs modeled as independent Poisson spike trains represent stimuli with no structure. Although we have not explicitly modeled pulsating pips, as common in auditory fear conditioning (e.g., (Stujenske 2014; Krabbe 2019)), we show in Fig. S4 that potentiation can be achieved over a relatively wide range of gamma frequencies. This indicates that overall potentiation is ensured if the gamma frequency transiently increases after the pip.”

      We added the section “The full network potentiates for a range of CS frequencies” and Fig. S4 in the Supplementary Information.

      We included in Materials and Methods “Conditioned and unconditioned stimuli” the following sentences:

      “Finally, for Fig. S4, we considered a range of frequencies for the CS stimulus. To generate the three Poisson spike trains with average frequencies from 48 to 64 Hz in Fig. S4, we set 𝜆 = 800, 1000, 1200.”
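      For readers unfamiliar with Poisson inputs, a homogeneous Poisson spike train at a given mean rate can be generated by drawing exponentially distributed inter-spike intervals. The sketch below is a generic illustration, not the model's code: it takes the rate directly in Hz and times in ms, the function name poisson_spike_train is hypothetical, and how the 𝜆 values above map onto firing rate depends on the model implementation, which is not reproduced here.

```python
import random
from typing import List, Optional

def poisson_spike_train(rate_hz: float, duration_ms: float,
                        seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: spike times (ms) of a homogeneous Poisson process."""
    if rate_hz <= 0:
        return []
    rng = random.Random(seed)
    rate_per_ms = rate_hz / 1000.0
    spikes: List[float] = []
    t = rng.expovariate(rate_per_ms)  # inter-spike intervals are exponential
    while t < duration_ms:
        spikes.append(t)
        t += rng.expovariate(rate_per_ms)
    return spikes

# Illustrative usage: mean rates spanning the 48-64 Hz range mentioned above.
for rate in (48.0, 56.0, 64.0):
    train = poisson_spike_train(rate, duration_ms=10_000.0, seed=0)
    print(rate, len(train) / 10.0)  # empirical rate in Hz over 10 s
```

      Because successive intervals are independent, such a train carries no temporal structure beyond its mean rate, which is the property the CS input is meant to have.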

      Finally, to address the comment about the need for CS and US overlapping in time to instantiate fear association, we added the following text in the Discussion section “Assumptions and predictions of the model”:

      “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Although paradigms involving both overlapping (delay conditioning, where US co-terminates with CS (e.g., (Lindquist et al., 2004)), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs exist, we hypothesized that concomitant activity in CS- and US-encoding neurons should be crucial in both cases. This may be mediated by a memory effect due to metabotropic effects (Whittington et al., Nature, 1995), as suggested above, or by the contribution from other brain regions (see section “Involvement of other brain structures” in the Discussion). The requirement that plasticity occurs with a memory trace of US is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature.”

      (3) As best as I could tell, only a single training trial was used in this study. Fair enough, especially given that fear learning can occur with a single trial. However, most studies of amygdala fear conditioning have multiple trials (~5 or more). How does the model perform when multiple trials are given?  

      The association between CS and fear acquired after one trial, i.e., through a potentiated ECS to F connection, is preserved in the presence of multiple trials.  Indeed, the association would be weakened or erased (through depression of the ECS to F connection) only if ECS and F did not display good fine timing, i.e., F does not fire right after ECS most of the time. However, the implemented circuit supports the role of interneurons in providing the correct fine timing, thus preventing the association acquired from being erased.  
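      The timing argument can be illustrated with a generic pair-based STDP rule whose depression lobe has a larger area than its potentiation lobe. This is a sketch under stated assumptions, not the plasticity rule of our model: the exponential window shape, the amplitudes and time constants, the all-to-all pairing, and the function names are all hypothetical, chosen only so that the rule is depression-dominated. With such a rule, spikes in which F reliably fires a few ms after ECS give net potentiation, while the reverse order gives net depression.

```python
import math
from typing import Sequence

# Hypothetical window parameters (not the model's). Depression-dominated:
# A_MINUS * TAU_MINUS (= 18) exceeds A_PLUS * TAU_PLUS (= 10).
A_PLUS, TAU_PLUS = 1.0, 10.0    # potentiation amplitude, time constant (ms)
A_MINUS, TAU_MINUS = 0.6, 30.0  # depression amplitude, time constant (ms)

def stdp_kernel(dt_ms: float) -> float:
    """Weight change for one spike pair, with dt = t_post - t_pre (ms)."""
    if dt_ms > 0:  # postsynaptic spike after presynaptic spike: potentiation
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    return -A_MINUS * math.exp(dt_ms / TAU_MINUS)  # post at or before pre: depression

def net_weight_change(pre_times: Sequence[float], post_times: Sequence[float]) -> float:
    """Sum the kernel over all pre/post spike pairs (all-to-all pairing)."""
    return sum(stdp_kernel(t_post - t_pre) for t_pre in pre_times for t_post in post_times)

pre = [100.0 * k for k in range(10)]   # stand-in for ECS spikes every 100 ms
post_after = [t + 5.0 for t in pre]    # stand-in for F firing 5 ms after each ECS spike
post_before = [t - 5.0 for t in pre]   # F firing 5 ms before each ECS spike
print(net_weight_change(pre, post_after), net_weight_change(pre, post_before))
```

      For uncorrelated presynaptic and postsynaptic Poisson trains, the expected weight change under such a rule is proportional to the net area of the window and is therefore negative, which is why, with a depression-dominated rule, precise pre-then-post timing is needed to maintain the potentiated synapse.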

      In the second paragraph of the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”, we made the above point by adding the following text:

      “We note that once the association between CS and fear is acquired, subsequent presentations of CS and US do not weaken or erase it: the interneurons ensure the correct timing and pauses in ECS and F activity, which are conducive for potentiation.”

      (4) The LFP calculations are problematic. First, it is unclear how they were done. Did the authors just take the transmembrane currents they included and sum them, or were they scaled by distance from the 'electrode' and extracellular conductivity (as one would derive from the Laplace equation)? Presumably, the spatial arrangement of model neurons was neglected so distance was not a factor. 

      Second, if this is the case, then the argument for excluding GABAergic conductances seems flawed. If the spatial arrangement of neurons is relevant to whether to include or exclude GABAergic conductances, then wouldn't a simulation without any spatial structure not be subject to the concern of laminar vs. nuclear arrangement? 

      Moreover, to the best I can tell, the literature the authors use to justify the exclusion of GABAergic currents does not make the case for a lack of GABAergic contribution in non-laminar structures. Instead, those studies only argue that in a non-laminar structure, AMPA currents are detectable, not that GABA cannot be detected. Thus, the authors should either include the GABAergic currents when calculating their simulated LFP, or provide a substantially better argument or citation for their exclusion. 

      We thank the Reviewer for pointing this out; this comment helped us rethink how to model the LFP. The origin of the LFP signal in BLA has not been fully determined, but factors thought to be important include differences in the spatial extension of the arborization in excitatory and inhibitory neurons, in the number of synaptic boutons, and in the spatial distributions of somata and synapses (Lindén et al 2011; Łęski 2013; Mazzoni et al. 2015). In the first version of the manuscript, we excluded the GABAergic currents because it is typically assumed that they add very little to the extracellular field, as the inhibitory reversal potential is close to the resting membrane potential. For the revision, we re-ran the simulations before and after fear conditioning and modeled the LFP as the sum of the AMPA, GABA and NaP-/H-/D- currents. With this new version of the LFP, we added a new Fig. 6 showing that there is a significant increase in the low theta power, but not in the high theta power, with fear learning (Fig. 6 C, D, E). This increase in the low theta power was mainly due to the AMPA currents created by the newly established connection from ECS to F, which allowed F to be active after fear conditioning in response to CS. 

      However, as the Reviewer mentioned, our network has no spatial extent: neurons are modeled as point cells. Thus, our current model does not include the features necessary to model some central aspects of the LFP. Despite that, our model does clearly demonstrate how rhythmic activity in the spike timing of neurons within the network changes due to fear learning (Fig. 6B). The spiking outputs of the network are key components of the inputs to the LFP, and thus we expect the rhythms in the spiking to be reflected in more complex descriptions of the LFP. But we also discovered that different LFP proxies provide different changes in rhythmic activity when comparing pre- and post-fear learning; although we have no principled way to choose an LFP proxy, we believe that the rhythmic firing is the essential finding of the model.

      We have added the following to the manuscript:

      (1) In the new version of Fig. 6, we present the power spectra of the network spiking activity (panel B), along with the power spectra of the LFP proxy that includes the GABA, AMPA, and NaP-/H-/D- currents (panels C, D, E). 

      (2) We modified the conclusion of the Results section entitled “Increased low-theta frequency is a biomarker of fear learning” by saying:

      “In this section, we explore how plasticity in the fear circuit affects the network dynamics, comparing after fear conditioning to before. We first show that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also show that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase and no significant variation in the high theta power (Fig. 6 C,D,E). These results reproduce the experimental findings in (Davis et al., 2017), and Fig. 6F,G show that the low theta increase is due to added excitation provided by the new learned pathway. The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation. Nevertheless, although both the AMPA and GABA currents contribute to the power increase in the low theta frequency range (Fig. 6F), the AMPA currents show a dramatic power increase relative to the baseline (the average power ratio of AMPA and GABA post- vs pre-conditioning across 20 network realizations is 3 × 10³ and 4.6, respectively). This points to the AMPA currents as the major contributor to the low theta power increase. Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6G). Finally, the increase in power is in the low theta range because ECS and F are allowed to spike only during the active phase of the low theta spiking VIP neurons. We have also explored another proxy for the LFP (see Supplementary Information and Fig. S6).”

      In the Supplementary Information, we included a figure and some text in the new section entitled “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”:

      “Given that our BLA network comprises a few neurons described as single-compartment cells with no spatial extension and location, the LFP cannot be computed directly from our model’s read-outs. In the main text, we chose as an LFP proxy the linear sum of the AMPA, GABA, and NaP-/H-/D-currents. We note that if the LFP is modeled as the sum of the absolute value of the currents, as suggested by (Mazzoni et al. 2008; Mazzoni et al. 2015), an even higher low theta power increase arises after fear conditioning compared to the linear sum. Differences in the power spectra also arise if other LFP proxies (e.g., only AMPA currents, only GABA currents) are considered. A principled description of an LFP proxy would require modeling the three-dimensional BLA anatomy, including that of the interneurons VIP and SOM; this is outside the scope of the current paper. (See (Feng et al. 2019) for a related project in the BLA.)”
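      The two proxies differ because inward and outward currents can cancel in a signed sum but not in a sum of magnitudes. A minimal sketch, assuming each row of an array holds one current type already summed over cells, and assuming a sign convention in which inward (excitatory) currents are negative; the numbers are hypothetical.

```python
import numpy as np


def lfp_linear(currents):
    """LFP proxy as the signed (linear) sum of currents.

    `currents` has shape (n_currents, n_timesteps); each row is one current
    type (e.g. AMPA, GABA, NaP, D, H) summed over the cells that carry it.
    """
    return np.sum(currents, axis=0)


def lfp_abs(currents):
    """Alternative proxy: sum of the absolute values of the currents."""
    return np.sum(np.abs(currents), axis=0)


# Hypothetical values and sign convention: inward (excitatory) currents
# negative, outward (inhibitory) currents positive.
i_ampa = np.array([-1.0, -2.0, 0.0])
i_gaba = np.array([1.0, 0.5, 0.0])
currents = np.vstack([i_ampa, i_gaba])

print(lfp_linear(currents))   # values 0, -1.5, 0: opposite currents cancel
print(lfp_abs(currents))      # values 2, 2.5, 0: magnitudes add
```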

      (3) We updated the Materials and Methods section “Local field potentials and spectral analysis” to explain how we compute the LFP in the revised manuscript: 

      “We considered as an LFP proxy the linear sum of all the AMPA, GABA, NaP, D, and H currents in the network. The D-current is in the VIP interneurons, and the NaP- and H-currents are in the SOM interneurons.”

      Although it is beyond the scope of the current work, an exploration of the most accurate proxy of the LFP in the amygdala is warranted. Such a study could be accomplished by adopting a similar approach as in (Mazzoni et al., 2015), where several LFP proxies based on point-neuron leaky integrate-and-fire networks were compared with a “ground-truth” LFP obtained in an analogous realistic three-dimensional network model. 

      To explicitly mention this issue in the paper, we add a paragraph in the “Limitations and caveats” section in the Discussion, which reads as follows:

      “LFPs recorded in the experiments are thought to be mainly created by transmembrane currents in neurons located around the electrode and depend on several factors, including the morphology of the arborization of contributing neurons and the location of AMPA and GABA boutons (Katzner et al. 2009; Lindén et al. 2011; Łęski 2013; Mazzoni et al. 2015). Since our model has no spatial extension, we used an LFP proxy; this proxy was shown to reflect the rhythmic output of the network, which we believe to be the essential result (for more details see Results “Increased low-theta frequency is a biomarker of fear learning”, and Supplementary Information “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”).”

      (4) We have removed the section “Plasticity between fear neuron and VIP slows down overall potentiation” in Results and sections “Plasticity between the fear neuron (F) and VIP slows down overall potentiation” and “Plastic F to VIP connections further increase low-theta frequency power after fear conditioning” in the Supplementary Information. This material is extraneous since we are using a new proxy for LFP.

      Minor points: 

      (1) In Figure 3C, the y-axis tick label for 0.037 is written as "0.37."

      We thank the reviewer for finding this typo; we fixed it.

      (2) Figure 5B is unclear. It seems to suggest that the added ECS and F neurons did not respond to either the CS or UCS. Is this true? If so, why include them in the model? How would their inclusion change the model behavior? 

      It is correct that the added ECS and F neurons did not respond to the CS or US (UCS); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. These cells were included to be part of our computation of the LFP. Specifically, adding in those cells would make the LFP take inhibition into account more, and we wanted to make sure that we were not biasing our computation away from the effects of inhibition. As shown in the paper (Fig. 6B), even with inhibition onto these non-responsive cells, the LFP has the properties claimed in the paper concerning the changes in the low theta and high-theta power, because the LFP is dominated by new excitation rather than the inhibition. 

      First, in the Results section “Network with multiple heterogeneous neurons can establish the association between CS and fear”, we commented on the added ECS and F neurons that do not respond to either CS or US by saying the following:

      “The ECS cells not receiving CS are inhibited by ongoing PV activity during the disinhibition window (Fig. 5B); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. The lack of activity in those cells during fear conditioning implies that there is no plasticity from those ECS cells to the active F. Those cells are included for the calculation of the LFP (see below in “Increased low-theta frequency is a biomarker of fear learning”.)”

      Furthermore, we add the following sentence in the Results section “Increased low-theta frequency is a biomarker of fear learning”: 

      “The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation.”

      (3) Applied currents are given as current densities, but these are difficult to compare with current levels observed from whole-cell patch clamp recordings. Can the currents be given as absolute levels, in pA/nA? 

      In principle, it is possible to connect current densities with absolute levels, as requested. However, we note that the number of cells in the model is orders of magnitude smaller than the number of cells in the structure being modeled. It is common in modeling to adjust physiological parameters to achieve the qualitative properties that are important to the model, rather than trying to exactly match particular recordings.

      We added to the Methods description why we chose units per unit area, rather than absolute units. 

      “All the currents are expressed in units per area, rather than absolute units, to avoid making assumptions about the size of the neuron surface.”
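      Converting a density to an absolute current requires choosing a membrane area, which is exactly the assumption the quoted sentence avoids. The arithmetic can be sketched as follows, assuming densities in µA/cm² (common in Hodgkin-Huxley-type models, but not stated in the text) and a hypothetical spherical soma; the 20 µm diameter is illustrative only.

```python
import math


def sphere_area_cm2(diameter_um):
    """Surface area (cm^2) of a hypothetical spherical soma."""
    radius_cm = (diameter_um / 2.0) * 1e-4          # 1 um = 1e-4 cm
    return 4.0 * math.pi * radius_cm ** 2


def density_to_pA(density_uA_per_cm2, area_cm2):
    """Convert a current density (uA/cm^2) into an absolute current (pA)."""
    current_uA = density_uA_per_cm2 * area_cm2
    return current_uA * 1e6                          # 1 uA = 1e6 pA


area = sphere_area_cm2(20.0)                         # about 1.26e-5 cm^2
print(density_to_pA(1.0, area))                      # about 12.6 pA
```

      Doubling the assumed diameter quadruples the area and hence the absolute current for the same density, which is why the answer depends entirely on the assumed cell size.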

      (4) Regarding: "We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. However, the high theta rhythm they produce is not crucial to the plasticity: in our model, high theta or higher frequency rhythms in SOM cells are all conducive to associative fear learning. This opens the possibility that the high theta rhythm in the BLA mostly originates in the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022)." The chain of reasoning in the above statement is unclear. The second sentence seems to be saying contradictory things. 

      We agree that the sentence was confusing; thank you for pointing it out. We have revised the paragraph to make our point clearer. The central points are: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.

      We deleted from the discussion the text reported by the Reviewer, and we added the following one to make this point clearer:

      “We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. The BLA SOM cells do not necessarily have to be the only source of the high theta observed in the BLA during fear learning; the high theta detected in the LFP of the BLA also originates from the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022).”

      (5) Regarding: "This suggests low theta power change is not just an epiphenomenon but rather a biomarker of successful fear conditioning." Not sure this is the right framing for the above statement. The power of the theta signal in the LFP reflects the strengthening of connections, but it itself does not have an impact on network activity. Moreover, whether something is epiphenomenal is not relevant to the question of whether it can serve as a successful biomarker. A biomarker just needs to be indicative, not causal. 

      We intended to say why the low theta power change is a biomarker in the sense of the Reviewer. That is: experiments have shown that, with learning, the low theta power increases. The modeling shows in addition that, when learning does not take place, the low theta power does not increase. That means that the low theta power increases if and only if there is learning, i.e., the change in low theta power is a biomarker. To make our meaning clearer, we have changed the quoted sentences to read: 

      “This suggests that the low theta power change is a biomarker of successful fear conditioning: it occurs when there is learning and does not occur when there is no learning.”

      Reviewer #2 (Public Comments): 

      We thank the Reviewer for raising these interesting points. Below are our public replies and the changes we made to the manuscript to address the Reviewer’s objections.

      (1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations.

      Moreover, to generate the classical theta, a laminar structural arrangement is needed (where neurons form layers like in the hippocampus and cortex) (Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume-conducted theta rhythm generated, e.g., in the hippocampus or other layered cortical structures. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Although there can be BLA neurons whose firing shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA, as the structure of the BLA does not support the generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function, since it does not necessarily reflect the population activity of local principal neurons, in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009, Kispersky et al., 2010, Chartove et al. 2020). We are not aware of any model description of the mechanisms of theta that do require layers.

      We added the following text in the introduction of the manuscript to make this point clearer:  “A recent rodent experimental study (Antonoudiou et al. 2022) suggests that BLA can intrinsically generate theta oscillations (3-12 Hz).”

      (2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that every rhythm in the same frequency band is driven by respiration. Indeed, in the response to question 1 above, we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP neurons. Using intrinsic currents known to exist in VIP neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. 

      To elaborate more on this in the manuscript, we added the following new section in the discussion:

      “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms. 

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through BLA extensive communications with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper.”

      We also note that the presence of D-currents in the BLA VIP interneurons should be confirmed experimentally, and that the ability of VIP interneurons to generate the BLA low theta rhythm constitutes a prediction of our computational model. These points are specified in the first paragraph in the Discussion entitled “Assumptions and predictions of the model”:

      “The interneuron descriptions in the model were constrained by the electrophysiological properties reported in response to hyperpolarizing currents (Sosulina et al., 2010). Specifically, we modeled the three subtypes of VIP, SOM, and PV interneurons displaying bursting behavior, regular spiking with early spike-frequency adaptation, and regular spiking without spike-frequency adaptation, respectively. Focusing on VIP interneurons, we were able to model the bursting behavior by including the D-type potassium current. This current is thought to exist in the VIP interneurons in the cortex (Porter et al., 1998), but whether this current is also found in the VIP interneurons of the BLA is still unknown. Similarly, we endowed the SOM interneurons with NaP- and H-currents, as in the OLM cells of the hippocampus. Due to these currents, the VIP and SOM cells are able to show low- and high-theta oscillations, respectively. The presence of these currents and the neurons’ ability to exhibit oscillations in the theta range during fear conditioning and at baseline in BLA, which are assumptions of our model, should be tested experimentally.”

      (3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a more specific description of connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g., cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      We elaborate more on this in a new section in the Discussion entitled “Assumptions and predictions of the model”. The paragraph related to this point reads as follows:

      “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute the majority of biologically detailed models. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model is believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions, and the omitted features may be critical for other questions. Our results hold when there is some degree of heterogeneity of cells of the same type, showing that homogeneity is not a necessary condition.”

      (4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption.

      In section “Network connectivity and synaptic currents” in “Materials and Methods” we provided references to motivate our choice of considering a GABA-A reversal potential around -80 mV:

      “The GABAa current reversal potential is set to −80 mV, as common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020).”
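      Whether a GABA-A current pushes the membrane up or down depends on the sign of the driving force V − E_rev at the current membrane potential. A minimal sketch of that sign check, using the standard conductance-based form I = g(V − E_rev); the membrane potential of −60 mV is a hypothetical value, and the sketch ignores spike threshold and shunting, which also determine the net effect on firing.

```python
def synaptic_current(g, v_mV, e_rev_mV):
    """Conductance-based synaptic current I = g * (V - E_rev).

    Positive I is outward (pushes V down); negative I is inward (pushes V up).
    """
    return g * (v_mV - e_rev_mV)


def gabaa_effect(v_mV, e_rev_mV, g=1.0):
    """Classify the direction in which the synaptic current moves V."""
    i = synaptic_current(g, v_mV, e_rev_mV)
    if i > 0:
        return "hyperpolarizing"
    if i < 0:
        return "depolarizing"
    return "no net current"


V_MEMBRANE = -60.0  # hypothetical membrane potential (mV)
print(gabaa_effect(V_MEMBRANE, -80.0))  # hyperpolarizing
print(gabaa_effect(V_MEMBRANE, -72.0))  # hyperpolarizing
print(gabaa_effect(V_MEMBRANE, -54.0))  # depolarizing
```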

      (5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      Other peptides seem to be important in overall modulation of fear, but VIP is especially important in the first part of fear learning, the subject of our paper. Re SST: we hypothesize that SST interneurons are critical in fear extinction and preventing fear generalization, but not to initial fear learning. The peptide of the CCK neurons, which overlap with VIP cells, has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al., 2018). Thus, these other peptides are likely more important for other aspects of fear learning.  

      In the Discussion, we have added:

      “We hypothesize that the SST peptide is critical in fear extinction and preventing fear generalization, but not to initial fear learning. Also, the CCK peptide has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al., 2018).”

      Reviewer #2 (Recommendations For The Authors): 

      We note that Reviewer #2’s Recommendations For The Authors have the same content as the Public Comments. Thus, the changes to the manuscript we implemented above also address the private critiques listed below.

      (1) As the breathing-driven rhythm is a global phenomenon accompanying fear state, one might restrict the analysis to this oscillation. The rationale behind this restriction is that the 'high' theta in the BLA has an unknown origin (since it can originate from the ventral hippocampus, piriform cortex etc.). 

      In response to point 4 made by Reviewer 1 (Recommendations for the Authors), referring to high theta in the BLA, we previously wrote: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.

      In the Public Critiques, Reviewer 2 relates the respiratory rhythm to the low theta. We answered this point in point 2 of the Reviewer’s Public Comments.

      (2) I would include more interneurons in the network model incorporating recent findings. 

      This point was answered in our response to point 3 of the Reviewer’s Public Comments.

      (3) The reversal potential for GABA-A receptor-mediated currents would be good to set to measured values. In addition, I would use AMPA conductance values that have been measured in the BLA. 

      We addressed this objection in our response to point 4 of the Reviewer’s Public Comments.

      Reviewer #3 (Public comments):

      Weaknesses: 

      (1) The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. 

      (2) Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing.

      (3) Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      We are repeating here the answers we gave in response to the public comments, adding further relevant points.

      (1) Our neurons were constrained by electrophysiology properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We can reproduce these electrophysiological properties by using specific membrane currents known to be present in similar neurons in other brain regions (D-current in VIP interneurons in the cortex, and NaP- and H-currents in OLM/SOM cells in the hippocampus). Also, though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      (2) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. Furthermore, it is correct that the model relies on a strong inhibition from SOM and PV to silence the excitatory projection neurons. We agree that the placement of the SOM inhibition on the pyramidal neurons can make a difference to some aspects of the circuit behavior. We are assuming that the inhibition from the SOM cells can suppress pyramidal cell firing, which can be seen as a hypothesis of our model. It is well known that VIP cells disinhibit pyramidal cells through inhibition of SOM and PV cells (Krabbe et al. 2019); hence, this hypothesis is generally believed. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      Regarding points (1) and (2), in a new paragraph (“Assumptions and predictions of the model”) in the Discussion reported in response to Reviewer #2 (public comments)’s point 3, we stated that modeling requires the omission of many details to bring out the significance of other details.

      (3) 40 seconds is the temporal interval we decided to use to present the results. In the Results, we also showed that there is learning over a shorter interval of time (15 seconds) where CS and US/memory of US should both be present. Thus, our model requires 15 seconds over a single or multiple trials for associative learning to be established. We included references to additional experimental papers to support our reasoning in the last paragraph of section “Assumptions and predictions of the model” in the Discussion, also reported in response to Reviewer #1 point 2 (Recommendations for the Authors). We said there that some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      Our model accounts for the multiple rhythms observed in the context of fear learning, as well as the known involvement of multiple kinds of interneurons. We did not say explicitly enough why our complicated model may be functionally important in ways that cannot be fulfilled with a simpler model with the non-depression-dominated Hebbian rule. To explain this, we have added the following in the manuscript discussion: 

      “Although fear learning can occur without the depression-dominated rule, we hypothesize that it is necessary for other aspects of fear learning and regulation. That is, in pathological cases, there can be overgeneralization of learning. We hypothesize that the modulation created by the involvement of these interneurons is normally used to prevent such overgeneralization. However, this is beyond the scope of the present paper.”

      We have also written an extra paragraph about generalization in the Discussion “Synaptic plasticity in our model”:

      “With the classical Hebbian plasticity rule, we show that learning can occur without the involvement of the VIP and SOM cells. Although fear learning can occur without the depression-dominated rule, we hypothesize that the latter is necessary for other aspects of fear learning and regulation. Generalization of learning can be pathological, and we hypothesize that the modulation created by the involvement of VIP and SOM interneurons is normally used to prevent such overgeneralization. However, in some circumstances, it may be desirable to account for many possible threats, and then a classical Hebbian plasticity rule could be useful. We note that the involvement or not of the VIP-SOM circuit has been implicated when there are multiple strategies for solving a task (Piet et al., 2024). In our situation, the nature of the task (including reward structure) may determine whether the learning rule is depression-dominated and therefore whether the VIP-SOM circuit plays an important role.”
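      One way to make "depression-dominated" concrete is the area under a spike-timing-dependent plasticity window: if that area is negative, uncorrelated pre- and postsynaptic firing weakens the synapse on average, so potentiation requires pre-before-post pairings to predominate. The sketch below uses a generic pair-based exponential window with hypothetical parameters; it is not the plasticity rule used in the model, and the balanced window is included only as a non-depression-dominated contrast.

```python
import math


def stdp_dw(dt_ms, a_plus, a_minus, tau_plus, tau_minus):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre."""
    if dt_ms > 0:                                      # pre before post
        return a_plus * math.exp(-dt_ms / tau_plus)    # potentiation
    if dt_ms < 0:                                      # post before pre
        return -a_minus * math.exp(dt_ms / tau_minus)  # depression
    return 0.0


def window_integral(a_plus, a_minus, tau_plus, tau_minus):
    """Area under the exponential STDP window (negative = depression-dominated)."""
    return a_plus * tau_plus - a_minus * tau_minus


# Hypothetical parameters: a balanced window vs a depression-dominated one.
balanced = dict(a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0)
depression_dominated = dict(a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=40.0)

print(window_integral(**balanced))              # 0.0
print(window_integral(**depression_dominated))  # -20.0
```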

      Reviewer #3 (Recommendations For The Authors): 

      We thank the Reviewer for all the recommendations. We reply to each of them below.

      In general, there are some inconsistencies in the naming (e.g. sometimes you write PV sometimes PV+,...), please use consistent abbreviations throughout the manuscript. You also introduce some of the abbreviations multiple times. 

      We modified the manuscript to remove all the inconsistencies in the naming. 

      Introduction: 

      - In the last section you speak about one recent study but actually cite two articles. 

      We removed the reference to (Perrenoud and Cardin, 2023), which is a commentary on the Veit et al. article.

      Results: 

      - 'Brain rhythms are thought to be encoded and propagated largely by interneurons' What do you mean by encoded here? 

      We agree with the Reviewer that the verb “to encode” is not accurate. We modified the sentence as follows:

      “Brain rhythms are thought to be generated and propagated largely by interneurons”.

      - The section 'Interneurons interact to modulate fear neuron output' could be clearer. Start with describing the elements of the circuit, then the rhythms in the baseline. 

      We reorganized the section as follows:

      “Interneurons interact to modulate fear neuron output. Our BLA network consists of interneurons, detailed in the previous section, and excitatory projection neurons (Fig. 2A). Both the fear-encoding neuron (F), an excitatory projection neuron, and the VIP interneuron are activated by the noxious stimulus US (Krabbe et al., 2019). As shown in Fig. 2A (top, right), VIP disinhibits F by inhibiting both SOM and PV, as suggested in (Krabbe et al., 2019). We do not include connections from PV to SOM and VIP, nor connections from SOM to PV and VIP, since those connections have been shown to be significantly weaker than the ones included (Krabbe et al., 2019). The simplest network we consider is made of one neuron for each cell type. We introduce a larger network with some heterogeneity in the last two sections of the Results.

      Fig. 2A (bottom) shows typical network dynamics before and after the onset of the US input, with the US modeled as a Poisson spike train at ~50 Hz; the network produces all the rhythms originating from the interneurons alone or through their interactions with the excitatory projection neurons (shown in Fig. 1). Specifically, since VIP is active at low theta both at rest and during the US, it modulates F on each low theta cycle via SOM and PV. In the baseline condition, the VIP interneuron has short gamma bursts nested in a low theta rhythm. With US onset, VIP increases its burst duration and the frequency of the low theta rhythm. These longer bursts keep the SOM cell silent for long periods of each low theta cycle, providing F with windows of disinhibition and contributing to the abrupt increase in activity right after US onset. Finally, in Fig. 2A, PV lacks any external input and fires only when excited by F. Through their reciprocal interactions, PV and F form a PING rhythm, as depicted in Fig. 1C.”

      - Figure 3C: The lower dashed line has the tick label '0.37' which should read '0.037'. 

      We fixed it.

      - The section describing the network with multiple neurons could be clearer, especially, it is not really clear how these different ECS and F neurons receive their input. 

      We answered the same objection in the reply to Reviewer #1 in point 2 under “minor issues.”

      Discussion: 

      - The paragraph 'It has also been suggested that the ventral tegmental area has a role in fear expression (Lesas et al., 2023). Furthermore, it has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022).' merely states facts, and I don't see how they relate to the presented work. 

      We thank the Reviewer for pointing out that this was confusing. What we meant to emphasize was that later stages of fear conditioning and extinction appear to require more than the BLA. We specifically mention the discrimination of non-threatening cues at the end of the paragraph, which now reads as follows:

      “Other brain structures may be involved in later stages of fear responsiveness, such as fear extinction and prevention of generalization. It has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022). Brain structures such as the prefrontal cortex and hippocampus have also been documented to play a crucial role in fear extinction, the paradigm following fear conditioning that aims to reduce the conditioned fear response through repeated presentations of the CS alone. As reported by several studies, fear extinction suppresses the fear memory through the acquisition of a distinct memory, rather than through the erasure of the fear memory itself (Harris et al., 2000; Bouton, 2002; Trouche et al., 2013; Thompson et al., 2018). Davis et al., 2017 found a high theta rhythm following fear extinction that was associated with the suppression of threat in rodents. Our model can be extended to include structures in the prefrontal cortex and the hippocampus to further investigate the role of rhythms in the discrimination of non-threatening cues and in extinction. We hypothesize that a different population of PV interneurons plays a crucial role in mediating competition between fearful memories, associated with a low theta rhythm, and safety memories, associated with a high theta rhythm; supporting experimental evidence is in (Lucas et al., 2016; Davis et al., 2017; Chen et al., 2022).”

      - The comparison to other BLA models is quite short and seems a bit superficial. A more in-depth comparison seems warranted. 

      We thank the reviewer for suggesting that a more in-depth comparison between our model and other models in the literature would improve the manuscript. We entirely rewrote the first paragraph of that section. The new content reads as follows:

      “Comparison with other models. Many computational models of fear conditioning have been proposed in recent years; the list includes biophysically detailed models (e.g., (Li 2009; Kim et al., 2013a)), firing rate models (e.g., Krasne 2011; Ball 2012; Vlachos 2011), and connectionist models (e.g., Moustafa 2013; Armony 1997; Edeline 1992) (for a review see (Nair et al., 2016)). Both firing rate models and connectionist models use an abstract description of the interacting neurons or regions. The omission of biophysical details prevents such models from addressing questions concerning the roles of dynamics and biophysical details in fear conditioning, which is the aim of our model. There are also biophysically detailed models (Li 2009; Kim 2013; Kim 2016; Feng 2019), which differ from ours both in the physiology included and in the description of how plastic changes take place. One main difference in the physiology is that we differentiated among types of interneurons, since the fine timing produced by these interneurons was key to our use of rhythms to produce spike-timing-dependent plasticity. The origin of the gamma rhythm (but not of the other rhythms) was investigated in Feng et al., 2019, but none of these papers connected the rhythms to plasticity.

      The most interesting difference between our work and that in (Li 2009; Kim 2013; Kim 2016) is the modeling of plasticity. We use spike-timing-dependent plasticity (STDP) rules. The models in (Li 2009; Kim 2013; Kim 2016) were more mechanistic about how plasticity takes place, starting from the known involvement of calcium in plasticity. Using a hypothesis about back-propagation of spikes, these papers together arrive at a theory that is consistent with STDP and other instantiations of plasticity (Shouval 2002a; Shouval 2002b). For the purposes of our paper, this level of detail, though very interesting, was not necessary for our conclusions. By contrast, for the rhythms and the interneurons to play their dynamic roles in the model, we needed to restrict our STDP rule to depression-dominated ones. Our reading of (Shouval 2002) suggests that such subrules are possible outcomes of the general theory. Thus, there is no contradiction between the models, only a difference in focus; our focus was on the importance of the much-documented rhythms (Seidenbecher et al., 2003; Courtin et al., 2014b; Stujenske et al., 2014; Davis et al., 2017) in providing the correct spike timing. We showed in the Supplementary Information (“Classical Hebbian plasticity rule, unlike the depression-dominated one, shows potentiation even with no strict pre and postsynaptic spike timing”) that if the STDP rule is not depression-dominated, the rhythms are not necessary. We hypothesize that the strict timing enforced by the depression-dominated rule may foster the most appropriate association with fear at the expense of less relevant associations.”

      - The paragraph 'This could happen among some cells responding to weaker sensory inputs that do not lead to pre-post timing with fear neurons. This timing could be modified by the "triconditional rule", as suggested in (Grewe et al., 2017).' is not very clear. What exactly is 'this' in the first sentence referring to? If you mention the 'tri-conditional rule' here, please briefly explain it and how it would solve the issue at hand here.  

      We apologize that the sentence reported was not sufficiently clear. “This” refers to “depression”. We meant that, in our model, depression during fear conditioning happens every time there is no pre-post timing between neurons encoding the neutral stimuli and fear cells; poor pre-post timing can characterize the activity of neurons responding to weaker sensory inputs and does not lead to associative learning. We modified that paragraph as follows:

      “The study in (Grewe et al., 2017) suggests that associative learning resulting from fear conditioning induces both potentiation and depression among coactive excitatory neurons; coactivity was determined by calcium signaling and thus did not allow measurements of fine timing between spikes. In our model, we show how potentiation between coactive cells occurs when strict pre-post spike timing and appropriate pauses in the spiking activity arise. Depression happens when one or both of these components are not present. Thus, in our model, depression represents the absence of successful fear association and does not take part in the reshaping of the ensemble encoding the association, as suggested instead in (Grewe et al., 2017). A possible follow-up to our work would investigate how fear ensembles form and change through fear conditioning and later stages. This follow-up work may involve using a tri-conditional rule, as suggested in (Grewe et al., 2017), in which the potential role of neuromodulators is taken into account in addition to the pre- and postsynaptic neuron activity; this may lead to both potentiation and depression in establishing an associative memory.”

      - In the limitations and caveats section you mention that the small size of the network implies that they represent a synchronous population. What are the potential implications for the proposed rhythm-dependent mechanism? What are your expectations for larger networks? 

      We apologize if we were not adequately clear. We suspect the Reviewer understood us to mean that the entire population was synchronous, which it is not. We meant that, when we use a single cell to represent a subpopulation of cells of that type, that subpopulation is effectively synchronous. For larger networks in which each subtype is represented by many cells, there can be heterogeneity within each subtype. We have shown in the paper that the basic results still hold under some heterogeneity; however, they may fail if the heterogeneity is too large.

      We address this point in a new section named “Assumptions and predictions of the model”, in response to point 3 made by Reviewer #2.

      - The discussion is also missing a section on predictions/new experiments that can be derived from the model. How can the model be confirmed, what experiments/results would break the model? 

      To answer this question, we added a new section to the Discussion entitled “Assumptions and predictions of the model”. The first paragraph of this section is in the reply to Reviewer #2 point 2; the second paragraph is in the reply to Reviewer #2 point 3; the last paragraph is in the reply to Reviewer #1 point c; the rest of the section reads as follows:

      “Our study suggests that all the interneurons are necessary for associative learning provided that the STDP rule is depression-dominated. This prediction could be tested experimentally by selectively silencing each interneuron subtype in the BLA: if associative learning is hampered by silencing any of the interneuron subtypes, this would support our model. Alternatively, the model prediction could be tested indirectly by acquiring more information about the plasticity rule involved in the BLA during associative learning. We found that all the interneurons are necessary to establish fear learning only in the case of a depression-dominated rule. This rule ensures that fine timing and pauses are always required for potentiation: interneurons provide both fine timing and pauses to pyramidal cells, making them crucial components of the fear circuit. 

      The modeling of the interneurons assumes the involvement of various intrinsic currents; the inclusion of those currents can be considered hypotheses of the model. Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”

    1. We would like to thank you and the reviewers for your thoughtful comments, which helped us improve the manuscript. We carefully followed the reviewers’ recommendations and provide a detailed point-by-point account of our responses to the comments. 

      Please find below the important changes in the updated manuscript.

      (1) We changed the title according to the comments provided by reviewer #1.

      (2) We edited the introduction, results, and discussion to improve the link between the objectives of the study, the findings, and their discussion, as reviewer #2 recommended.

      (3) We clarified the link between camouflage and fitness, which is now presented as a hypothesis, as reviewer #1 suggested.

      (4) We added new analyses and figures in the main text and in the supplementary materials to better emphasize sex differences in landing force, foraging strategies and hunting success, following reviewer #1 suggestion.

      (5) Following reviewer #2's comments, we edited the results, adding key information about the methods to help the reader understand the findings without reading the Methods section.

      (6) We added important details about the model selection approach along with a discussion of the low R-square values reported in our analyses on hunting success, as reviewer #2 suggested.

      eLife assessment 

      This fundamental work substantially advances our understanding of animals' foraging behaviour, by monitoring the movement and body posture of barn owls in high resolution, in addition to assessing their foraging success. With a large dataset, the evidence supporting the main conclusions is convincing. This work provides new evidence for motion-induced sound camouflage and has broad implications for understanding predator-prey interactions. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this paper, Schalcher et al. examined how barn owls' landing force affects their hunting success during two hunting strategies: strike hunting and sit-and-wait hunting. They tracked tens of barn owls that raised their nestlings in nest boxes and utilized high-resolution GPS and acceleration loggers to monitor their movements. In addition, camcorders were placed near their nest boxes and used to record the prey they brought to the nest, thus measuring their foraging success. 

      This study generated a unique dataset and provided new insights into the foraging behavior of barn owls. The researchers discovered that the landing force during hunting strikes was significantly higher compared to the sit-and-wait strategy. Additionally, they found a positive relationship between landing force and foraging success during hunting strikes, whereas, during the sit-and-wait strategy, there was a negative relationship between the two. This suggests that barn owls avoid detection by generating a lower landing force and producing less noise. Furthermore, the researchers observed that environmental characteristics affect barn owls' landing force during sit-and-wait hunting. They found a greater landing force when landing on buildings, a lower landing force when landing on trees, and the lowest landing force when landing on poles. The landing force also decreased as the time to the next hunting attempt decreased. These findings collectively suggest that barn owls reduce their landing force as an acoustic camouflage to avoid detection by their prey. 

      The main strength of this work is the researchers' comprehensive approach, examining different aspects of foraging behavior, including high-resolution movement, foraging success, and the influence of the environment on this behavior, supported by impressive data collection. The weakness of this study is that the results only present a partial biological story contained within the data. The focus is on acoustic camouflage without addressing other aspects of barn owls' foraging strategy, leaving the reader with many unanswered questions. These include individual differences, direct measurements of owls' fitness, a detailed analysis of the foraging strategy of males and females, and the collective effort per nest box. However, it is possible that these data will be published in a separate paper. 

      We greatly appreciate your recognition of the comprehensive approach and extensive data collection. Our primary objective was to study the role of acoustic camouflage. Nonetheless, the manuscript now includes a detailed analysis of the foraging strategy and hunting success of males and females (lines 164-225).

      The results presented support the authors' conclusion that lower landing force during sit-and-wait hunting increases hunting success, likely due to a decreased probability of detection by their prey, resulting in acoustic camouflage. The authors also argue that hunting success is crucial for survival, and thus, acoustic camouflage has a direct link to fitness. While this statement is reasonable, it should be presented as a hypothesis, as no direct evidence has been provided here.

      Thank you for the comment. We agree and thus have edited the language accordingly.  

      However, since nestling survival is typically monitored when studying behavior during the breeding period, the authors can probably provide information on the effect of acoustic camouflage on owls' fitness. Furthermore, it would be interesting to examine the foraging strategies used by different individuals, the joint foraging success of males and females within each nest box, and the link between landing force and foraging success, if the data are available.

      We are currently writing a manuscript on these topics. We are aware that several scientific questions regarding the foraging ecology of the barn owl still need our attention. Regarding the link between landing force and foraging success, we believe that our revised manuscript addresses this specific topic, please see specific responses below.

      However, even without this additional analysis on survival, this paper provides an unprecedented dataset and the first measurement of landing force during hunting in the wild. It is likely to inspire many other researchers currently studying animal foraging behavior to explore how animals' movements affect foraging success.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors provide new evidence for motion-induced sound camouflage and can link the hunting approach to hunting success (detailing the adaptation and inferring a fitness consequence). 

      Strengths: 

      Strong evidence by combining high-resolution accelerometer data with a ground-truthed data set on prey provisioning at nest boxes. A good set of co-variates to control for some of the noise in the data provides some additional insights into owl hunting attempts. 

      Weaknesses: 

      There is a disconnect between the hypotheses tested and the results presented, and insufficient detail is provided on the statistical approach. R2 values of the presented models are very small compared to the significance of the effect presented. Without more detail, it is impossible to assess the strength of the evidence.

      In the revised manuscript, we changed the way results are presented and we improved the link between the hypotheses and the results. The R2 values are indeed small. It is however important to keep in mind that we are assessing the outcome of one specific behavior (i.e. landing force during sit-and-wait hunts) on hunting success in a wild environment, where many complex ecological interactions likely influence hunting success. Nonetheless, the coefficients (as reported in the results) show that for every 1 N increase in landing force, there is a 15% reduction in hunting success, which is substantial. In the discussion we also note that 50 Hz is a relatively low sampling frequency for estimating the peak ground reaction force. We have gone back over the presentation of our results and made our discussion more nuanced to acknowledge this aspect. 
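      To make the size of the per-newton effect more concrete, the sketch below assumes (this is our illustrative assumption, not a statement from the analysis) that the reported 15% reduction corresponds to a constant odds ratio of 0.85 per newton in a logistic (binomial) model of hunting success. Under that assumption the effect compounds multiplicatively on the odds scale, and its size on the probability scale depends on the baseline success rate. The function names and the 50% baseline are hypothetical.

```python
def odds_multiplier(odds_ratio_per_newton, delta_newtons):
    """Multiplicative change in the odds of success for a change of
    delta_newtons in landing force, assuming a constant per-newton odds
    ratio (logistic model)."""
    return odds_ratio_per_newton ** delta_newtons

def shifted_probability(p_baseline, multiplier):
    """Success probability after scaling the baseline odds by `multiplier`."""
    odds = p_baseline / (1.0 - p_baseline) * multiplier
    return odds / (1.0 + odds)

if __name__ == "__main__":
    # Hypothetical baseline success probability of 0.5.
    for delta in (1, 2, 3):
        m = odds_multiplier(0.85, delta)
        print(f"+{delta} N: odds x{m:.3f}, success {shifted_probability(0.5, m):.3f}")
```

      With a hypothetical 50% baseline, a 1 N increase lowers the predicted success to about 46% (odds 0.85), and the reduction grows with each additional newton.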

      We have also added a detailed description about our model selection process in the methods section and provide a model selection table for each analysis in the supplementary materials.

      The authors seem to overcome persisting challenges associated with the validation and calibration of accelerometer data by ground-truthing on-board measures with direct observations in captivity, but here the methods are not described any further and sample sizes (2 owls - how many different loggers were deployed?) might be too small to achieve robust behavioural classifications.

      Thank you for the comment. Details of our methods of behavioural identification are provided in lines 385 – 429. There are two reasons why our results should not be limited by the sample size. First, we used the temporal sequence of changes in acceleration, and rates of change in acceleration, which make the methods robust to individual differences in acceleration values. Second, our methods for behavioural identification were not based on machine learning. Instead, we used a Boolean-based approach (as described in Wilson et al., 2018, Methods in Ecology and Evolution), which is more robust to small differences in absolute values that might occur, e.g., in relation to slight changes in device position. 

      Recommendation for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Comment 1. This study provides new insights into animals' foraging behavior and will probably inspire other researchers to examine foraging behavior in such high resolution.

      We hope so, thank you.

      Comment 2. However, it is necessary to describe better the measured landing force and the hunting strike and perching behavior so the readers can understand these methods when reading the results (and without reading the Methods).

      We have now changed the text in the “Results” to help the reader understand the key methods while reading the results.

      Comment 3. In addition, make sure you use the same terminology for hunting strategies during the entire paper and especially in all figures and corresponding result descriptions.

      We now use consistent terminology throughout the text and figures. We hope that this is now clear in the revised manuscript.

      Comment 4. In addition, although I find your statement about the link between acoustic camouflage and fitness reasonable, it should be described as a hypothesis or examined if you want to keep the direct link statement. I believe showing a direct link can add an additional outstanding aspect to this paper, but I also understand that it can be addressed in a separate paper.

      We agree that the relationship between hunting success and barn owl fitness is an important topic, but it necessitates a consideration of both hunting strategies, including hunting on the wing, which extends beyond the limits of our current study. Indeed, our primary objective was to conduct a detailed examination of the interplay between acoustic camouflage and the success of the sit-and-wait technique.

      However, we have edited the manuscript to explicitly describe the link between acoustic camouflage and fitness as a hypothesis. We believe this adjustment provides a more accurate representation of our approach. We hope this clarifies the specific emphasis of our work and its contribution to the understanding of barn owl hunting behavior.

      Here are my detailed comments about the paper: 

      Comment 5. Title: Consider changing the title to "Acoustic camouflage predicts hunting success in a wild predator." 

      We would like to thank you for your nice proposition. However, we opted for a different title, which is now “Landing force reveals new form of motion-induced sound camouflage in a wild predator”.

      Comment 6. Line 91-93: Please provide additional information about the collected dataset, including: 

      Description of the total period of observations, an average and standard deviation of perching and hunting attempt events per individual per night, number of foraging trips per individual per night, details about the geographic location and characteristics of the habitat, season, and reproductive state. 

      The revised manuscript now includes detailed information about the collected dataset (e.g. study area, reproductive state). “We used GPS loggers and accelerometers to record high resolution movement data during two consecutive breeding seasons (May to August in 2019 and 2020) from 163 wild barn owls (79 males and 84 females) breeding in nest boxes across a 1,000 km² intensive agricultural landscape in the western Swiss plateau.” Results section, lines 79 – 82

      Details about the number of foraging trips per individual and per night are now presented in the results: “Sexual dimorphism in body mass was marked among our sampled individuals. Males were lighter than females (84 females, average body mass: 322 ± 22.6 g; 79 males, average body mass 281 ± 16.5 g, Fig. S6) and provided almost three times more prey per night than females (males: 8 ± 5 prey per night; females: 3 ± 3 prey per night; Fig. S7). Males also displayed higher nightly hunting effort than females (Males: 46 ± 16 hunting attempts per night, n = 79; Females: 25 ± 11 hunting attempts per night, n = 84; Fig. 3A, Fig. S8). However, females were more likely to use a sit-and-wait strategy than males (females: 24% ± 15%, males: 13% ± 10%, Fig. S9). As a result, the number of perching events per night was similar between males and females (Females: 76 ± 23 perching events per night; Males: 69 ± 20 perching events per night; Fig. S8).” (lines 165 – 174) 

      Comment 7. In addition, state if the information describes breeding pairs of males and females and provides statistics on the number of tracked pairs and the number of nest boxes.

      The revised manuscript now includes a description of the number of tracked breeding pairs and the number of nest boxes. “Of these individuals, 142 belonged to pairs for which data were recovered from both partners (71 pairs in total, 40 in 2019, 31 in 2020). The remaining 21 individuals belonged to pairs with data from one partner (11 females and 1 male in 2019; 4 females and 5 males in 2020).” (lines 82 – 85.)

      Comment 8. Line 93: Briefly define the term "landing force" and explain how it was measured (and let the reader know that there is a detailed description in the Methods).

      We now include a brief definition of the “landing force” along with a brief explanation of how it was measured in the results section. “We extracted the peak vectoral sum of the raw acceleration during each landing and converted this to ground reaction force (hereafter “landing force”, in Newtons) using measurements of individual body mass (see methods for detailed description).” (lines 92 – 95).
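      The conversion described in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's processing code: we assume the logger records tri-axial raw acceleration in units of g and that body mass is given in grams, and the function name and example values are hypothetical. The sketch omits landing detection and any filtering. At a 50 Hz sampling rate the sampled maximum may fall between samples and so underestimate the true peak.

```python
import math

G = 9.81  # standard gravity in m/s^2 (assumes acceleration is logged in g)

def landing_force(ax, ay, az, body_mass_g):
    """Peak ground reaction force (N) over one landing window.

    ax, ay, az: equal-length sequences of raw acceleration in g (assumed units).
    body_mass_g: body mass of the individual in grams.
    """
    # Vectorial sum (magnitude) of the three axes at each sample.
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    peak_g = max(magnitudes)
    # F = m * a, with mass converted to kg and acceleration to m/s^2.
    return (body_mass_g / 1000.0) * peak_g * G

if __name__ == "__main__":
    # Hypothetical landing window for a 322 g owl with a 3 g peak.
    force = landing_force([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 3.0, 1.0], 322)
    print(round(force, 2), "N;", round(force / 0.322, 2), "N/kg")
```

      For this hypothetical window the force is about 9.48 N. Dividing by body mass (in kg) returns the peak acceleration in m/s² (about 29.43 N/kg), which is why a mass-specific force removes the direct effect of body mass when comparing individuals of different size.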

      Comment 9. Line 94: All definitions, including "pre-hunting force," need to be better described in the Results section.

      Thank you for this suggestion. We now provide a better description of these key definitions directly in the Results section: 

      Measurement of landing force: “Barn owls employing a sit-and-wait strategy land on multiple perches before initiating an attack, with successive landings reducing the distance to the target prey (Fig. 2C). 

      We used the acceleration data to identify 84,855 landings. These were further categorized into perching events (n = 56,874) and hunting strikes (n = 27,981), depending on whether barn owls were landing on a perch or attempting to strike prey on the ground (Fig. 1A and B, see methods for specific details on behavioral classification).” (lines 88 – 95)

      Pre-hunt perching force predicts hunting success: “Finally, we analyzed whether the landing force in the last perching event before each hunting attempt (i.e. pre-hunt perching force) predicted variation in hunting success” (lines 229 – 230)

      Comment 10. Line 102: Remove "Our analysis of 27,981 hunting strikes showed that" and add "n = 27,981" after the statistics. You have already stated your sample size earlier. There is no need to emphasize it again, although your sample size is impressive.

      We modified the text in the results section as suggested.

      Comment 11. Line 104: The results so far suggest that the difference in landing force between males and females is an outcome of their different body masses. However, it is not clear what is the reason for the difference in the number of hunting strike attempts between males and females (Lines 104-106). Can you compare the difference in landing force between males and females with similar body mass (females from the lower part of the distribution and males from the upper part)? Is there still a difference?

      Thank you. Following your comment, we performed new analyses that clarify the landing force involved in perching and hunting strike events in the two sexes. First, however, we would like to clarify why the number of hunting attempts differs between males and females. During the breeding season, females typically perform most of the incubation, brooding, and feeding of nestlings in the nest, while the male primarily hunts food for the female and chicks. The female contributes to food provisioning only irregularly, and this varies from pair to pair (paper in prep.). The difference in the number of hunting attempts between males and females reflects this asymmetry in food provisioning between the sexes during this specific period. We specified this in the revised version of the manuscript (lines 164 – 174). 

      We also performed a new analysis of sex differences in mass-specific landing force (force/body mass). Males and females produce similar force per unit of body mass during perching events, which indicates that the higher overall perching force of females (see Fig. 4C in the manuscript) is driven by their greater body mass (lines 194 – 199).

      Comment 12. Line 154: I believe Boonman et al. (2018) is relevant to this part of the discussion. Boonman et al. found that the noise barn owls make during landing and take-off is worth considering [Boonman, A., et al. "The sounds of silence: barn owl noise in landing and taking off." Behavioural Processes 157 (2018): 484-488].

      We now cite this paper in the discussion.

      Comment 13. Line 164: Your results do not directly demonstrate a link to fitness, although they potentially serve as a proxy for fitness (add a reference). However, you might have information regarding nestlings' survival - that will provide a direct link for fitness. Change your statement or add the relevant data.

      We appreciate your feedback and have adjusted the language accordingly.

      Comment 14. Line 213: If the poles are closer to the ground - is it possible that the higher trees and buildings serve for resting and gathering environmental information over greater distances? For example, identifying prey at farther distances or navigating to the next pole?

      Yes, this is indeed the most likely explanation for why owls land more often on buildings and trees than on poles until the final period (about 6 minutes) before hunting. In these last minutes, barn owls preferentially use poles, as shown in Fig. 2B. The revised manuscript now includes this explanation in the discussion (lines 269 – 284).

      Comment 15. Line 250: The product "AXY-Trek loggers" does not appear on the Technosmart website (there are similar names, but not an exact match). Are you sure this is the correct name of the tracking device you used? 

      Thank you for pointing out this detail that we missed. The device we used is now called "AXY-Trek Mini" (https://www.technosmart.eu/axy-trek-mini/). We have corrected this error directly in the revised manuscript.

      Comment 16. Line 256: Please explain how the devices were recovered. Did you recapture the animals? If so, how? Additionally, replace "after approximately 15 days" with the exact average and standard deviation. Furthermore, since you have these data, please state the difference in body mass between the two measurements before and after tagging.

      The birds were recaptured to recover the devices. Adult barn owls were recaptured at their nest sites, again using automatic sliding traps that are activated when birds enter the nest box. We replaced "after approximately 15 days" with the exact mean and standard deviation, 10.47 ± 2.27 days. These values exclude five of the 163 individuals included in this study, which could not be recaptured within the intended time window but were re-encountered when they initiated a second clutch later in the season (4 individuals) or a new clutch the following year (1 individual).

      We added this previously missing information to the revised manuscript (lines 370 – 372).

      Comment 17. Line 259: What was the resolution of the camera? What were the recording methods and schedule? How did you analyze these data? 

      The resolution was set to 3.1 megapixels. Motion-sensitive camera traps were installed at the entrance of each nest box throughout the period when the barn owls were wearing data loggers, and each detected movement triggered a burst of three photos. The photos were not analyzed as such for this study but were used to confirm each prey delivery previously detected from the accelerometer data. We added these details to the revised manuscript (lines 377 – 380).

      Comment 18_1. Figure 1: 

      Panel A) Include the sex of the described individual. 

      The sex of the described individual is now included in the figure caption.

      Comment 18_2. It would be interesting to show these data for both males and females from the same nest box (choose another example if you don't have the data for this specific nest box). 

      Although we agree that showing tracks of a male and a female from the same nest would be very interesting, the purpose of this figure is to illustrate our data annotation process, and we believe that adding more detail would clutter it. However, the revised manuscript now includes a new figure (Fig. 3A) that shows simultaneous GPS tracks of a male and a female during a complete night, with detailed information about perching and hunting behaviors.

      Comment 18_3. Add the symbol of the nest box to the legend. 

      Done

      Comment 18_4. Provide information about the total time of the foraging trip in the text below. 

      The duration of the illustrated foraging trip has been included in the figure caption.

      Comment 18_5. To enhance the figure’s information on foraging behavior, consider color coding the trajectory based on time and adding a background representing the landscape. Since this paper may be of interest to researchers unfamiliar with barn owl foraging behavior, it could answer some common questions. 

      For the reasons explained in our answer to Comment 18_2, we would rather keep this figure as clean as possible. However, we followed your recommendations in the new Figure 3 described above, in which GPS tracks are color-coded by foraging trip number and shown over a background representing the landscape. To provide even more detail about the landscape, we added a supplementary figure (Fig. S2) illustrating barn owl foraging grounds and nest sites, which we think may interest readers unfamiliar with barn owls.

      Comment 18_6. Inset panels) Provide a detailed description of the acceleration inset panels.

      Done

      Comment 18_7. Color code the acceleration data with different colors for each axis, add x and y axes with labels, and ensure the time frame on the x-axis is clear. How was the self-feeding behavior verified (should be described in the methods section)? 

      We kept both inset panels as simple as possible since they serve here as examples, but a complete representation of these behaviors (with time frame, different colors and labels) is provided in the supplementary materials (figure S3). We included this statement in the figure caption and added a reference to the full representations from the supplementary materials: 

      In the figure caption: “Inset panels show an example of the pattern of tri-axial acceleration corresponding to both nest-box return and self-feeding behaviors (see Fig. S3 for a detailed representation of the acceleration pattern corresponding to each behavior).”

      In the Methods section: “Self-feeding was evident from multiple, regular acceleration peaks in the surge and heave axes (resulting in VeDBA peaks > 0.2 g and < 0.9 g, Fig. S3D), with each peak corresponding to the movement of the head as the prey was swallowed whole.”
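      For readers less familiar with the VeDBA criterion quoted above, the Python sketch below shows one common way to compute vectorial dynamic body acceleration (VeDBA) and flag peaks within the 0.2 to 0.9 g band. It assumes the static (gravitational) component is estimated with a running mean on each axis; the window length, the simple local-maximum peak rule, and the function names are illustrative assumptions, not the authors' processing pipeline.

```python
import numpy as np

def vedba(acc, window=25):
    """Vectorial dynamic body acceleration (g) for an (n, 3) array of raw
    tri-axial acceleration (surge, sway, heave).

    Assumption: the static component is a centred running mean over
    `window` samples (hypothetical default). Near the ends the window is
    zero-padded, so values within window // 2 samples of either end are
    unreliable.
    """
    acc = np.asarray(acc, dtype=float)
    kernel = np.ones(window) / window
    # Running mean per axis; mode="same" keeps the original length
    static = np.column_stack(
        [np.convolve(acc[:, i], kernel, mode="same") for i in range(3)]
    )
    dynamic = acc - static
    # VeDBA = Euclidean norm of the three dynamic components
    return np.sqrt((dynamic ** 2).sum(axis=1))

def feeding_peaks(v, low=0.2, high=0.9):
    """Indices of local VeDBA maxima whose height lies strictly between
    `low` and `high` g (the band quoted in the Methods text)."""
    v = np.asarray(v, dtype=float)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:])
    idx = np.where(is_peak)[0] + 1
    return idx[(v[idx] > low) & (v[idx] < high)]
```

A stationary tag reading only gravity yields VeDBA near zero away from the edges, whereas rhythmic head movements add dynamic acceleration that shows up as peaks; in practice the window length would be chosen to match the sampling rate and movement frequency.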

      Comment 18_8. Panel B) Note in the caption that you refer to the acceleration z-axis.

      We believe that keeping the statement “the heave acceleration…” in the figure caption is more informative than referring to the “z-axis” as it describes the real dimension to which we are referring. The use of the x, y and z axes can be misleading as they can be interchanged depending on the type and setting of recorders used.

      Comment 18_9. Present the same time scale for both hunting strategies to facilitate comparison. You can achieve this by showing only part of the flight phase before perching. 

      Done

      Comment 18_10. Panel C) Presenting the data for both hunting strategy and sex would provide more comprehensive information about the results and would be relatively easy to implement. 

      We agree with your comment. We present the differences in landing force for both landing contexts and sexes in the new Figure 3 as well as in the supplementary materials (Figure S10) of this revised manuscript.

      Comment 19. Figure 2: Please provide an explanation of the meaning of the circles in the figure caption.  

      Done

      Comment 20. Figure 3: 

      Panel A) It is unclear how the owl illustration is relevant to this specific figure, unlike in the previous figures, where it is clear. Also, I suggest removing the upper black line from the edge of the figure or adding a line on the right side.

      Done (now in Figure 2).

      Panel B) "Density" should be capitalized. 

      Done

      Panel C) Add a scale in meters, and it would be helpful to include an indication of time before hunting for each data point. 

      Done

      Comment 21. Figure S1: Mark the locations of the nest boxes and ensure that trajectories of different individuals and sexes can be identified. 

      The purpose of this figure was to show the spatial distribution of the data. We think that adding nest locations and coloring the paths according to individuals and/or sex will make the figure less clear. However, the new Figure 3 highlights those details.

      Comment 22. Figure S2: Show the pitch angle similarly to how you showed the acceleration axes, and explain what "VeDBA" stands for. Provide a description of the perching behavior, clearly indicating it on the figure. Add axes (x, y, z) to the illustration of the acceleration explanation. 

      We edited this figure (now Figure S3) to show the pitch angle, and the figure caption now explains what “VeDBA” stands for and gives a better description of the perching behavior. For the axes (i.e., X, Y, Z), we prefer the terms heave, surge, and sway, as these are more informative and are the terms usually reported in studies using tri-axial accelerometers.

      Comment 23. Table S1: Improve the explanation in the caption and titles of the table. 

      Done

      Reviewer #2 (Recommendations For The Authors): 

      Comment 1. From the public review and my assessment there, the authors can be assured that I thoroughly enjoyed the read and am looking forward to seeing a revised and improved version of this paper. 

      We thank the reviewer for this comment. We revised the manuscript according to their comments.

      Comment 2. In addition to my major points stated above, I would like to add the following recommendations: 

      The manuscript is overall well written, but it uses very pictorial language (a little as if we were in a David Attenborough documentary) that I find inappropriate for a research paper, especially in the abstract and introduction ("remarkable" (2x), "sophisticated" (are there any unsophisticated adaptations? We are referring to something under selection after all), etc.).

      We are glad that you found the paper well written overall, and we understand the comment about pictorial language. We therefore revised the text to ensure that the adjectives used to describe adaptive strategies are not overemphasized.

      Comment 3. Abstract 

      "While the theoretical benefits of predator camouflage are well established, no study has yet been able to quantify its consequences for hunting success." - This claim is actually not fully true: 

      Nebel, C., Sumasgutner, P., Pajot, A., & Amar, A. (2019). Response time of an avian prey to a simulated hawk attack is slower in darker conditions, but is independent of hawk colour morph. Royal Society Open Science 6: 190677.

      We edited our claim to specify that the consequences of predator camouflage for hunting success have never been quantified under natural conditions, and we cite this reference in the introduction.

      Comment 4. Line 23. Rephrase to: "We used high-resolution movement data to quantify how barn owls (Tyto alba) conceal their approach when using a sit-and-wait strategy, as well as the power exerted during strikes." 

      We edited this sentence in the abstract, as suggested.

      Comment 5. Results 

      There is a disconnect between the objectives outlined at the end of the introduction and the following results that should be improved. 

      The authors state: "Using high-frequency GPS and accelerometer data from wild barn owls (Tyto alba), we quantify the landing dynamics of this sit-and-wait strategy to (i) examine how birds adjust their landing force with the behavioral and environmental context and (ii) test the extent to which the magnitude of the predator cue affects hunting success." But one of the first results presented concerns sex differences.

      This is a fair point. We have now changed our statement at the end of the introduction, as well as the order of the results, to improve the link between the objectives outlined in the introduction and the way the results are presented.

      Comment 6. At this stage, the reader does not even know yet that we are presented with a size-dimorphic species that also has very different parental roles during the breeding season. This should be better streamlined, with an extra paragraph in the introduction. And these sex differences are then not even discussed, so why bring them up in the first place (and not just state "sex has been fitted as additional co-variate to account for the size-dimorphism in the species" without further details). 

      We edited the way the objectives are outlined in the introduction to cover the size dimorphism (lines 70 – 76). We also completely changed the way sex differences are presented in the results, including a new analysis that we believe provides a more comprehensive understanding of barn owl foraging behavior (lines 164 – 206). Finally, we added a new paragraph in the discussion to consider these results (lines 319 – 339).

      Comment 7. It is not clear to me where and how high-resolution GPS data were used. The results seem to concentrate on ACC; why GPS was used and how it features should be foreshadowed in a few lines in the introduction. I definitely prefer having the methods at the end of a manuscript, but with this structure, it is crucial to give the reader some help to understand the storyline.

      GPS data were used to validate some behavioral classifications (e.g., prey provisioning), but most importantly to link each landing event to a perch type. We edited the text in the results section to clarify where GPS and/or ACC data were used.

      Comment 8. Discussion 

      Move the orca example further down, where more detail can be provided to understand the evidence. 

      After our extensive edits in the discussion, we felt this example was interrupting the flow. We now cite this study in the introduction. 

      Comment 9. Size dimorphism and evident sex differences are not discussed. 

      The revised manuscript now includes a new paragraph in the discussion in which sex differences are discussed (lines 319 – 339).

      Comment 10. Be more precise in the terminology used (for example, land use seems to be interchangeable with habitat characteristics?). 

      We replaced “land use” with “habitat data” in the revised manuscript.

      Comment 11. Methods 

      Please provide a justification for the very high weight limit (5%; line 256). This limit is outdated and does not fulfill the international standard of 3% body weight. I assume the ethics clearance went through because of the short nature of the study (i.e., the birds were not burdened for life with the excess weight? But a line is needed here or under the ethics considerations to clarify this). 

      The 5% weight limit was considered acceptable because of the short deployment period, and we have edited the ethics statement to emphasize this point. However, there is no real international standard: both 3% and 5% weight limits are commonly used. Both limits are arbitrary, and the impact of a given mass on a bird varies with species and flight style. All tagged owls survived and bred similarly to non-tagged individuals in the population (lines 373 – 376 & lines 558 – 561).

      EDITORIAL COMMENT: We strongly encourage you to provide further context and clarification on this issue, as suggested by the Reviewer. On a related point, the ethics statement refers to GPS loggers, rather than GPS and ACC devices; we encourage you to clarify wording here.

      Thank you for highlighting this point that indeed needed some clarifications.

      Although we used the term "GPS recorders", the authorization granted by the Swiss authorities for this study covers the entire tracking system, which combines GPS and ACC recorders in the same device. We have therefore changed the wording of the ethics statement to avoid any misunderstanding (lines 373 – 376 & lines 558 – 561).

      Comment 12. Please provide more information on the model selection approach. What does "Non-significant terms were dropped via model simplification by comparing model AIC with and without terms." mean? Did the authors use a stepwise backward elimination procedure (drop1 function)? Or did they apply a complete comparison of several candidate models? I think a model comparison approach rather than stepwise selection would be more informative, as several models rather than only one could be equally probable. This might also improve model weights or might require a model averaging procedure; the currently reported R2 values are very small and do not seem to support the results well.

      We apologize for the lack of detail on this important aspect of the statistical analysis. We applied automated model selection using the dredge function from the R package “MuMIn”, which compares all candidate models (all subsets of the global model) rather than eliminating terms stepwise. The best model was retained as the final model in each analysis because the number of candidate models within ∆AIC < 2 was relatively low, so model averaging was not appropriate here. We edited the methods section to make this clear and added model selection tables for each analysis, ranked by AICc, to the supplementary materials (lines 532 – 552).

      In addition, we agree that the R-squared values reported in our analyses are quite low, particularly for the influence of pre-hunt perching force on hunting success (conditional R2 = 0.04). Nonetheless, landing impact still has a notable effect size (an increase of 1 N reduces hunting success by 15%). The low values reflect the inherent complexity of studying hunting behavior in the wild, where numerous variables come into play. We specifically tested the hypothesis that the force involved in pre-hunt landings, and consequently the noise emitted, influences the success of the next hunting attempt in wild barn owls. Factors such as prey behavior and the micro-habitat surrounding prey (e.g., substrate type and vegetation height) are likely influential but difficult, if not impossible, to model. We now discuss this in a more nuanced way (lines 266 – 268).
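      To make the selection rule described above concrete, the Python sketch below ranks candidate models by AICc, computes Akaike weights, and lists the models within ΔAICc < 2 of the best. It is not the authors' R/MuMIn code: the model names, log-likelihoods, parameter counts, and sample size are hypothetical placeholders chosen only to illustrate the calculation.

```python
import math

def aicc(log_lik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def rank_models(models, n, threshold=2.0):
    """Rank candidate models by AICc.

    models: dict mapping model name -> (log-likelihood, number of parameters).
    Returns (table, top): table rows are (name, AICc, delta AICc, Akaike weight)
    sorted best first; top lists the models with delta AICc < threshold.
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    deltas = {name: s - best for name, s in scores.items()}
    # Akaike weights: exp(-delta / 2), normalised to sum to 1
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    table = sorted(
        ((name, scores[name], deltas[name], rel[name] / total) for name in scores),
        key=lambda row: row[1],
    )
    top = [row[0] for row in table if row[2] < threshold]
    return table, top

# Hypothetical candidate set (placeholder values, not results from the study)
candidates = {
    "force + sex": (-1200.0, 4),
    "force": (-1201.5, 3),
    "sex": (-1210.0, 3),
    "null": (-1212.0, 2),
}
table, top = rank_models(candidates, n=5000)
for name, score, delta, weight in table:
    print(f"{name:12s} AICc={score:9.2f}  dAICc={delta:6.2f}  w={weight:.3f}")
print("Models within dAICc < 2:", top)
```

With these placeholder values, two models fall within the ΔAICc < 2 window; when only one or very few models do, retaining the top model rather than averaging is a common choice.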

      Comment 13. Please explain why BirdID was nested in NightID - this is not clear to me.

      There may be a misunderstanding here: we nested NightID within BirdID (i.e., recording nights grouped within individuals), not BirdID within NightID.

      Comment 14. I hope the final graphs and legends will be larger, they are almost impossible to read. 

      We enlarged the graphs and legends as much as possible to improve readability. In the published version, the graphs appear clear and readable.

      Comment 15. Figure S1: Does "representation" mean the tracks don't show all of the 163 owls? If so, be precise and tell us how many are illustrated in the figure. 

      Figure S1 shows the tracks of all 163 barn owls used in the study. We changed the wording of the figure caption to avoid any misunderstanding.

      Comment 16. Figure S4: Please adjust the y-axis to a readable format. 

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1 comments:

      (1) SY1 aggregation increases (in terms of the number of aggregates) when sphingolipid biosynthesis is blocked.

      a. Line no 132-133: I agree that there is circumstantial evidence that the maturation pathway of SY1 IB is perturbed by knocking down sphingolipid biosynthesis. However, to prove this formally, a time course of IB maturation needs to be reported in the knock-down strains.

      Please see Figure 2-figure supplement 1 for the time course of SY1 IB maturation in the knock-down strains. We have added this result to the manuscript; please see lines 129-131 on page 5 in the revised version.

      b. It will be good to have formal evidence that sphingolipids are indeed downregulated when these genes are downregulated (knocked down).

      This has been clearly demonstrated in previous reports, and we have added the appropriate references to the main text. For example, Huang et al. (https://doi.org/10.1371/journal.pgen.1002493) showed that downregulation of LCB1 or SPT in yeast decreased sphingolipid levels. Tafesse FG, et al. (https://doi.org/10.1371/journal.ppat.1005188) reported that in mammalian cells in which Sptlc2 was disrupted by CRISPR/Cas9, sphingolipid and glucosylceramide production was almost completely blocked, and the levels of sphingosine, sphingomyelin, and ceramide were significantly lower than in control cells. Please see lines 143-144 on page 6 and lines 232-233 on page 9 in the revised version.

      (2) In a normal cell (where sphingolipid biosynthesis is not hampered), the aggregate of SY1 (primarily the Class I aggregate) is localized only on the mitochondrial endomembrane system. These results have been published for other aggregation-prone proteins and are partly explained in the literature. However, their role in the context of maturation is relatively unclear. The authors however provide no strong evidence to show if mitochondria are preferentially involved in any of the stages of IB maturation. Specifically:

      a. Line 166-167: It is not clear from Figure 4B that this is indeed the case. Only the large IB seems to colocalize in all three panels (Class I, 2, 3) with Mitotracker. The smaller IBs in 2 and 3 do not show any obvious co-localization. It is also possible that they do co-localize, but it is not clear from the images. I would appreciate it if the authors either provide stronger evidence (better image) or revise this statement. This point is crucial in some claims made later in the manuscript. (pls see comment #5A).

      Following the reviewer's suggestion, we replaced the images in Figure 4B. In addition, to further show their relationship, we added a 3D reconstruction of Class 3 aggregates and MitoTracker-labeled mitochondria in Figure 4-figure supplement 1B.

      (3) The localization is due to the association of SY1 (aggregates) with mitochondrial proteins like Tom70, Tim44 etc. There are some critical points (that can strengthen the manuscript) that are not addressed here. Primarily, the important role of mitochondria in the context of toxicity is neglected. Although the authors have mentioned in the discussion that it was not their main focus, I believe that this is the novel part of the manuscript and this part is potentially a beautiful addition to literature. The questions I found unanswered are:

      a. Is the localization completely lost upon deleting these genes? I see only a partial loss in shape/localization. This is not properly explained in the manuscript. The shape of the IB seems to remain intact while the localization is slightly altered. This indicates that even when sphingolipid is present, SY1 localization is dictated by the (lipid-raft embedded) proteins. Interestingly, it shows that even in the absence of mitochondrial localization the shape of the aggregates is not altered in these deletion strains! How do the authors explain this if mitochondrial surface sphingolipids are important for IB maturation? (the primary screen found that sphingolipid biosynthesis promotes the formation of Class I IBs).

      We agree that mutation of a single mitochondrial binding protein causes only a partial loss of shape/localization, and we have replaced “association” with “surrounding” in the manuscript (lines 163-166 on page 6 in the revised version). In deletion mutants of proteins that interact with SY1, the proportion of SY1 Class 3 aggregates increased compared with controls, indicating that partial loss of the SY1-mitochondria interaction affects aggregate shape (line 184 on page 7 and Figure 4-figure supplement 1D). We propose that SY1 interactions with mitochondrial proteins are important for localizing SY1 IBs to mitochondria, whereas sphingolipids play an important role in facilitating the formation of Class 1 IBs from Class 3 aggregates.

      b. What happens to the toxicity when the aggregates are not localized on mitochondria?

      We thank the reviewer for this comment. Because a single mutant only partially affects the phenotype, addressing this question may require combining mutations in several genes, which we will pursue in future studies. What we aim to show in this work is that SY1 binds to mitochondria by interacting with these mitochondrial proteins.

      c. It is important to note that sphingolipids may affect the whole process indirectly by altering pathways involved in protein quality control or UPR. UPR may regulate the maturation of IBs. It is therefore important to test if any of the effects seen could be of direct consequence.

      We agree with the reviewer. However, our genome-wide screen showed no significant enrichment for protein quality control or UPR-related pathways, so it is unlikely that sphingolipids promote IB maturation indirectly through these two pathways. We now address this point in the Discussion (lines 325-328 on page 12 in the revised version).

      d. In Figure 4D, the authors find SY1 when they pull down Tom70, Tom37 or Tim44. Tim44 is a protein found in the mitochondrial matrix, how do the authors explain that this protein is interacting with a protein outside the mitochondrial outer membrane?

      This interaction may arise because some soluble SY1 enters the mitochondrial matrix and interacts with Tim44.

      e. Is it possible that the authors are immunoprecipitating SY1 since IBs have some amount of unimported mitochondrial proteins in aggregates formed during proteotoxic stress (https://doi.org/10.1073/pnas.2300475120) (Liu et al. 2023).

      Our Co-IP experiments were performed on the soluble supernatant fraction, so mitochondrial proteins sequestered in aggregates would not be detected.

      f. Line 261 (Discussion): Does deletion of Tom70 or one of the anchors increase Class III aggregation and increase toxicity? Without this, it is hard to say if mitochondria are involved in detoxification.

      We thank the reviewer for this comment; please see our response to comment 3b.

      (4) This fuels the loss of mitochondrial function.

      a. Line 218-219: Although the change is significant, the percentage change is very slight. Is this difference enough to be of physiological relevance in mitochondrial function? In our hands, the DCF fluorescence is much more variable.

      We agree with the reviewer that the difference, although significant, is small. Whether a difference of this size is physiologically relevant to mitochondrial function needs further investigation.

      b. Is SY1-induced loss of mitochondrial function less in knockouts of Tom70 or the other ones found to be important for localizing the SY1 aggregate to mitochondria?

      We examined mitochondrial membrane potential (indicated by rhodamine 123 fluorescence intensity) in tom70Δ, tom37Δ, and control his3Δ strains and found that deletion of Tom70 or Tom37 reduced the mitochondrial toxicity caused by SY1 expression. Please see lines 212-214 on page 8 in the revised version, and Figure 5-figure supplement 2.

      (5) Mitochondrial function is further abrogated when there is a block in sphingolipid biosynthesis.

      a. Myriocin acted like the deletion strains, which showed less structured aggregates. There were more (Class 3) aggregates, but visually they seemed to be spread apart. The first comment (#2A) on aggregate classes and their interaction with mitochondria may become relevant here.

      According to a recent review (https://doi.org/10.3389/fcell.2023.1302472), sphingolipids are present in mitochondrial membranes, bind to many mitochondrial proteins, and have emerged as key regulators of mitochondrial morphology, distribution, and function. Dysregulation of mitochondrial sphingolipid metabolism disrupts many mitochondrial processes, leading to mitochondrial fragmentation, impaired bioenergetics, and impaired cellular function. Myriocin treatment, which inhibits sphingolipid biosynthesis, causes mitochondria to become more fragmented, which may explain why the aggregates appear spread apart. Regarding the interaction with mitochondria, we counted the proportion of SY1 aggregates surrounded by mitochondria after myriocin treatment and found no significant difference from the control. Please see lines 168-169 on page 6 in the revised version, and Figure 4-figure supplement 1C.

      (6) A similar phenomenon is conserved in mammalian cell lines.

      a. Line 225-226: Did the authors confirm that this was the only alteration in the genome? Or did they complement the phenotype, genetically?

      We performed SPTLC2 complementation experiments in the knockout cell lines and found that re-expressing SPTLC2 reduced the number of cells forming IBs and the percentage of dispersed irregular IBs compared with controls. Please see lines 240-242 on page 9 in the revised version, and Figure 6-figure supplement 2B.

      b. Line 241-245: One of the significant phenotypes observed upon downregulating sphingolipid biosynthesis in yeast and mammalian cells was an increase in the number of aggregates. This is not shown for myriocin treatment in mammalian cells. It needs to be shown to establish concordance with the original screen and with the data from the KO mammalian cell line.

      Please see Figure 7-figure supplement 1A for the proportion of cells forming SY1 IBs after myriocin treatment in mammalian cells; the effect of myriocin treatment was the same as that observed in the KO mammalian cell line.

      Minor Comments:

      Line 273-275: How is this statement connected to the previous statement? Was it observed that aggregate fusion was advantageous to the cells?

      Yes, aggregate/oligomer fusion is advantageous to the cells, and we have modified the previous statement. Please see line 280 on page 10 in the revised version.

      Line 293-294: I am not sure I understand this statement.

      We have modified this statement. Please see lines 302-303 on page 11 in the revised version.

      Line 295-296: But the authors have commented at multiple places that mitochondria detoxify the cell from SY1 aggregates. I find this link fascinating and worth investigating. Most of the current work has some known links in literature (not everything). The mitochondrial connection being the most fascinating one.

      We have removed this sentence. We have added a validation experiment for the role of mitochondrial activity in SY1 IB maturation in the revised version.

      Line 318: Do the authors mean: The open question is...

      We thank the reviewer; we have corrected it.

      Response to Reviewer #2 comments:

      I recommend considering live cell microscopy to analyze whether sphingolipid-dependent formation of SY1 IB takes place at the mitochondrial outer membrane. The IBs could also be produced at other membranes and then transported to the mitochondrial outer membrane for storage.

      As shown in Figure 4A, SY1 IBs primarily interact with mitochondria.

      I recommend analyzing whether mitochondrial activity is needed for sphingolipid-dependent SY1 IB formation. Are these IBs localized to the mitochondrial membrane solely as a scaffold, or are these organelles needed to provide the energy for driving IB formation in concert with sphingolipids? This point could be addressed with rho0 strains lacking mitochondrial DNA.

      We thank the reviewer for this recommendation. We expressed SY1 protein in the BY4741 rho0 strain as suggested and found that neither the maturation of SY1 IBs nor the extent to which they were surrounded by mitochondria depended on mitochondrial activity. Please see lines 185-187 on page 7 in the revised version, and Figure 4-figure supplement 1E and 1F.

      The authors should be more precise in the statistical methods used in their study (method, pre-/post-tests, number of replicates...).

      We thank the reviewer for the comment and we have provided a more precise description of the statistical methods. Please see lines 531-534 on page 19 and figure legends in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript aims at a quantitative model of how visual stimuli, given as time-dependent light intensity signals, are transduced into electrical currents in photoreceptors of macaque and mouse retina. Based on prior knowledge of the fundamental biophysical steps of the transduction cascade and a relatively small number of free parameters, the resulting model is found to fairly accurately capture measured photoreceptor currents under a range of diverse visual stimuli and with parameters that are (mostly) identical for photoreceptors of the same type.

      Furthermore, as the model is invertible, the authors show that it can be used to derive visual stimuli that result in a desired, predetermined photoreceptor response. As demonstrated with several examples, this can be used to probe how the dynamics of phototransduction affect downstream signals in retinal ganglion cells, for example, by manipulating the visual stimuli in such a way that photoreceptor signals are linear or have reduced or altered adaptation. This innovative approach had already been used by the same lab to probe the contribution of photoreceptor adaptation to differences between On and Off parasol cells (Yu et al, eLife 2022), but the present paper extends this by describing and testing the photoreceptor model more generally and in both macaque and mouse as well as for both rods and cones.

      Strengths:

      The presentation of the model is thorough and convincing, and the ability to capture responses to stimuli as different as white noise with varying mean intensity and flashes with a common set of model parameters across cells is impressive. Also, the suggested approach of applying the model to modify visual stimuli that effectively alter photoreceptor signal processing is thought-provoking and should be a powerful tool for future investigations of retinal circuit function. The examples of how this approach can be applied are convincing and corroborate, for example, previous findings that adaptation to ambient light in the primate retina, as measured by responses to light flashes, mostly originates in photoreceptors.

      Weaknesses:

      In the current form of the presentation, it is not fully clear how easily the approach can be applied at different mean light levels and where exactly the high-frequency limits of the model inversion lie. Also, accessibility and applicability by others could be strengthened by including more details about how parameters are fixed and what consensus values are selected.

      Thank you - indeed a central goal of writing this paper was to provide a tool that could be easily used by other laboratories. We have clarified and expanded four points in this regard: (1) we have stated more clearly that mean light levels are naturally part of the inversion process, and hence the approach can be applied across a broad range of light levels (lines 292-297); (2) we have expanded our analysis of the high frequency limits to the inversion and added that expanded figure to the main text (new Fig 5); (3) we have included additional detail about our calibration procedures, including our calibration code, to facilitate transfer to other labs; and, (4) we have detailed the procedure for identification of consensus parameters (lines 172-182, 191-199 and Methods section starting on line 831).

      Reviewer #2 (Public Review):

      Summary:

      This manuscript proposes a modeling approach to capture nonlinear processes of photocurrents in mammalian (mouse, primate) rod and cone photoreceptors. The ultimate goal is to separate these nonlinearities at the level of photocurrent from subsequent nonlinear processing that occurs in retinal circuitry. The authors devised a strategy to generate stimuli that cancel the major nonlinearities in photocurrents. For example, modified stimuli would generate genuine sinusoidal modulation of the photocurrent, whereas a sinusoidal stimulus would not (i.e., because of asymmetries in the photocurrent responses to light vs. dark changes); and modified stimuli could cancel the effects of light adaptation at the photocurrent level. Using these modified stimuli, one could record downstream neurons, knowing that any nonlinearities that emerge must happen post-photocurrent. This could be a useful method for separating nonlinear mechanisms across different stages of retinal processing, although there are some apparent limitations to the overall strategy.

      Strengths:

      (1) This is a very quantitative and thoughtful approach and addresses a long-standing problem in the field: determining the location of nonlinearities within a complex circuit, including asymmetric responses to different polarities of contrast, adaptation, etc.

      (2) The study presents data for two primary models of mammalian retina, mouse and primate, and shows that the basic strategy works in each case.

      (3) Ideally, the present results would generalize to the work in other labs and possibly other sensory systems. How easy would this be? Would one lab have to be able to record both receptor and post-receptor neurons? Would in vitro recordings be useful for interpreting in vivo studies? It would be useful to comment on how well the current strategy could be generalized.

      We agree that generalization to work in other laboratories is important, and indeed that was a motivation for writing this as a methods paper. The key issue in such generalization is calibration. We have expanded our discussion of our calibration procedures and included that code as part of the github repository associated with the paper. Figure 10 (previously Figure 9) was added to illustrate generalization. We believe that the approach we introduce here should generalize to in vivo conditions. We have expanded the text on these issues in the Discussion (sections starting on line 689 and 757).

      Weaknesses:

      (1) The model is limited to describing photoreceptor responses at the level of photocurrents, as opposed to the output of the cell, which takes into account voltage-dependent mechanisms, horizontal cell feedback, etc., as the authors acknowledge. How would one distinguish nonlinearities that emerge at the level of post-photocurrent processing within the photoreceptor as opposed to downstream mechanisms? It would seem as if one is back to the earlier approach, recording at multiple levels of the circuit (e.g., Dunn et al., 2006, 2007).

      Indeed the current model is limited to a description of rod and cone photocurrents. Nonetheless, the transformation of light inputs to photocurrents can be strongly nonlinear, and such nonlinearities can be difficult to untangle from those occurring late in visual processing. Hence, we feel that the ability to capture and manipulate nonlinearities in the photocurrents is an important step. We have expanded Figure 10 to show an additional example of how manipulation of nonlinearities in phototransduction can give insight into downstream responses. We have also noted in text that an important next step would be to include inner segment mechanisms (section starting on line 661); doing so will require not only characterization of the current-to-voltage transformation, but also horizontal cell feedback and properties of the cone output synapse.

      (2) It would have been nice to see additional confirmations of the approach beyond what is presented in Figure 9. This is limited by the sample (n = 1 horizontal cell) and the number of conditions (1). It would have been interesting to at least see the same test at a dimmer light level, where the major adaptation mechanisms are supposed to occur beyond the photoreceptors (Dunn et al., 2007).

      We have added an additional experiment to this figure (now Figure 10) which we feel nicely exemplifies the approach. The approach that we introduce here really only makes sense at light levels where the photoreceptors are adapting; at lower light levels the photoreceptors respond near-linearly, so our “modified” and “original” stimuli as in Figure 10 (previously Figure 9) would be very similar (and post-phototransduction nonlinearities are naturally isolated at these light levels).

      Reviewer #3 (Public Review):

      Summary:

      The authors propose to invert a mechanistic model of phototransduction in mouse and primate rod and cone photoreceptors to derive stimuli that compensate for nonlinearities in these cells. They fit the model to a large set of photoreceptor recordings and show in additional data that the compensation works. This can allow the exclusion of photoreceptors as a source of nonlinear computation in the retina, as desired to pinpoint nonlinearities in retinal computation. Overall, the recordings made by the authors are impressive and I appreciate the simplicity and elegance of the idea. The data support the authors' conclusions but the presentation can be improved.

      Strengths:

      -  The authors collected an impressive set of recordings from mouse and primate photoreceptors, which are very challenging to obtain.

      -  The authors propose to exploit mechanistic mathematical models of well-understood phototransduction to design light stimuli that compensate for nonlinearities.

      -  The authors demonstrate through additional experiments that their proposed approach works.

      Weaknesses:

      -  The authors use numerical optimization for fitting the parameters of the photoreceptor model to the data. Recently, the field of simulation-based inference has developed methods to do so, including quantification of the uncertainty of the resulting estimates. Since the authors state that two different procedures were used due to the different amounts of data collected from different cells, it may be worthwhile to test these methods instead, as implemented e.g. in the SBI toolbox (https://joss.theoj.org/papers/10.21105/joss.02505). This would also allow them to directly identify dependencies between parameters, and obtain associated uncertainty estimates. This would also make the discussion of how well constrained the parameters are by the data or how much they vary more principled because the SBI uncertainty estimates could be used.

      Thank you - we have improved how we describe and report parameter values in several ways. First, the previous text erroneously stated that we used different fitting procedures for different cell types - but the real difference was in the amount of data and range of stimuli we had available between rods and cones. The fitting procedure itself was the same for all cell types. We have clarified this along with other details of the model fitting both in the main text (lines 121-130) and in the Methods (section starting on line 832). We also collected parameter values and estimates of allowed ranges in two tables. Finally, we used sloppy modeling to identify parameters that could covary with relatively small impact on model performance; we added a description of this analysis to the Methods (section starting on line 903).

      -  In several places, the authors refer the reader to look up specific values e.g. of parameters in the associated MATLAB code. I don't think this is appropriate; important values/findings/facts should be in the paper (lines 142, 114, 168). I would even find the precise values that the authors measure interesting, so I think the authors should show them in a figure/table. In general, I would like to see also the average variance explained by different models summarized in a table and precise mean/median values for all important quantities (like the response amplitude ratios in Figures 6/9).

      We have added two tables with these parameter values and estimates of allowable ranges. We also added points to show the mean (and SD) across cells to the population figures and added those numerical values to the figure legends throughout.

      -  If the proposed model is supposed to model photoreceptor adaptation on a longer time scale, I fail to see why this can be an invertible model. Could the authors explain this better? I suspect that the model is mainly about nonlinearities as the authors also discuss in lines 360ff.

      For the stimuli that we use we see little or no contribution of slow adaptation in phototransduction. We have expanded the description of this point in the text and referred to Angueyra et al (2022) which looks at this issue in more detail for primate cones (paragraph starting on line 280).

      -  The important Figures 6-8 are very hard to read, as it is not easy to see what the stimulus is, the modified stimulus, the response with and without modification, what the desired output looks like, and what is measured for part B. Reworking these figures would be highly recommended.

      We have reworked all of the figures to make the traces clearer.

      -  If I understand Figure 6 correctly, part B quantifies the size of the response to the second small flash relative to the response to the first small flash. While the response to the second flash is clearly only 50% of that to the first flash in primate rods and cones in the original condition, the modified stimulus seems to overcompensate, resulting in a 130% response to the second flash. How do the authors explain this? A similar effect occurs in Figure 9, which the authors should also discuss.

      Indeed, in those instances the modified stimulus does appear to overcompensate. We suspect this is due to differences in sensitivity of the specific cells probed for these experiments and those used in the model construction. We now describe this limitation in more detail (lines 524-526). A similar point comes up for those experiments in which we speed the photoreceptor responses (new Figure 9B), and we similarly note that the cells used to test those manipulations differed systematically from those used to fit the model (lines 558-560).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor questions and suggestions for clarification.

      It hasn't become fully clear to me how general the model is when different mean light levels (on long-time scales) are considered. Are there slow adaptation processes not captured in the model that affect model performance? And how should one go about setting the mean light level when, for example, probing ganglion cells with a stimulus obtained through model inversion? Should it work to add an appropriate DC component to the current that is provided as input to the inverted model? (Presumably, deriving a stimulus and then just adding background illumination should not work, or could this be a good approximation, given a steady state that is adapted to the background?)

      We have clarified in the main text that slow adaptation does not contribute substantially to responses to the range of stimuli we explored (lines 281-289). We have also clarified that the stimulus in the model inversion is specified in isomerizations per second - so the mean value of the stimulus is automatically included in the model inversion (lines 293-298).

      Furthermore, a caveat for the model inversion seems to be the potential amplification of high-frequency noise. The suggested application of a cutoff temporal frequency seems appropriate, but data are shown only for a few example cells. Is this consistent across cells? (Given that performance between, e.g., mouse cones can vary considerably according to Fig. 4B?) I would also like to suggest moving the corresponding Supplemental Figure (4.1) into the main part of the manuscript, as it seems quite important.

      We have added population analysis to the new Figure 5 (which was Figure 4 - Figure Supplement 1). We have also clarified that the amplification of high frequency noise is an issue only when we try to apply model inversion to measured stimuli. When we use model inversion to identify stimuli that elicit desired responses, the target responses are computed from a linear model that has no noise, so this is not a concern in applications like those in Figures 6-10.

      Also, could the authors explain more clearly what the effect of the normalization of the estimated stimulus by the power of the true stimulus is? Does this simply reduce power at high frequency or also affect frequencies below the suggested cutoff (where the stimulus reconstruction should presumably be accurate even without normalization)?

      Indeed this normalization reduces high frequency power and has little impact on low frequencies where the inversion is accurate; this is now noted in the text (line 363). As for amplification of high frequency noise (previous comment), the normalization by the stimulus power is only needed when inverting measured responses (i.e. responses with noise) and is omitted when we are identifying stimuli that elicit desired responses (e.g. in Figures 6-10).

      While the overall performance of the model to predict photoreceptor currents is impressive, it seems that particular misses occur for flashes right after a step in background illumination and for the white-noise responses at low background illumination (e.g. Figure 1B). Is that systematic, and if so what might be missing in the model?

      Indeed the model (at least with fixed parameters across stimuli) appears to systematically miss a few aspects of the photoreceptor responses. These include the latency of the response to a bright flash and the early flashes in the step + flash protocol in Figure 1B. Model errors for the variable mean noise stimulus (Figure 2) showed little dependence on time even when responses were sorted by mean light level and by previous mean level. Model errors did not show a clear systematic dependence on light level; this likely reflects, at least in part, the use of mean-square-error to identify model parameters. We have expanded our discussion of these systematic errors in the text (lines 164-166).

      I was also wondering whether this is related to the fact that in Figure 9B, the gain in the modified condition is actually systematically higher when there is more background light. Do the authors think that this could be a real effect or rather an overcompensation from the model? (By the way, is it specified what "Delta-gain" really is, i.e., ratio or normalized difference?)

      We suspect this is an issue with the sensitivity of the specific cells for which we did these experiments (i.e. variability in the gamma parameter between cells). This sensitivity varies between cells, and such variations are likely to place the strongest limitation on our ability to use this approach to manipulate responses in different retinas. We now note those issues in the Results (lines 523-526, 557-559 and 591-593) with reference to Figures 9 (previously Figure 8) and 10 (previously Figure 9), and describe this limitation more generally in the Discussion (section starting on line 649). We have also changed delta-gain to response ratio, which seemed more intuitive.

      Maybe I missed this, but it seems that the parameter gamma is fitted in a cell-type-specific fashion (e.g. line 163), but then needs to be fixed for held-out cells. How was this done? Is there much variability of gamma between cells?

      There is variability in gamma between cells, and this likely explains some of the systematic differences between data and model (see above and Methods, lines 902-903). For the consensus models in Figure 2B, gamma was allowed to vary for each cell while the remaining consensus model parameters were fixed. Gamma was set equal to the mean value across cells for model inversion (i.e. for all of the analyses in Figures 4-10). We have described the fitting procedure in considerably more detail in the revised Methods (starting on line 832).

      For completeness, it would be nice to have the applied consensus model parameters in the manuscript rather than just in the Matlab code (especially since the code has not been part of the submission). Also, some notes on how the numerical integration of the differential equations was done would be nice (time step size?).

      We have added tables with consensus parameters and estimates of the sensitivity of model predictions to each parameter. We have also added additional details about the numerical approaches (including the time step) to Methods.

      Similarly, it would be nice to explicitly see the relationships that are used to fix certain model parameters (lines 705ff). And can the constants k and n (lines 709-710) be assumed identical for different species and receptor types?

      We have added more details about the model fitting to the Methods, including the use of steady-state conditions to hold certain parameters fixed (lines 862 and 866). We are not aware of any direct comparisons of k and n across species and receptor types. We have noted that model performance was not improved by modest changes in these parameters (due to compensation by other model parameters). More generally, we have explained how some parameters trade for others and hence the logic of fixing some even when exact values were not available.

      For the previous measurements of m and beta (lines 712-713), is there a reference or source?

      We have added references for these values.

      Did the authors check for differences in the model parameters between cone types (e.g., S vs. M)?

      We did not include S cones here. They are harder to record from and collecting a fairly large data set across a range of stimuli would be challenging. Our previous work shows that S cones have slower responses than L and M cones, and this would certainly be reflected in differences in model parameters. We have noted this in the text (Methods, lines 808-810).

      For the stated flash responses time-to-peak (lines 183-184), is this for a particular light intensity with no background illumination?

      Those are flashes from darkness - now noted in the text.

      Figure 2 - Supplement 1 doesn't have panel labels A and B, unlike the legend.

      Fixed - thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) Fig. 2B - for some cells, the consensus model seems to fit better than the individual model. How is this possible?

      This was mostly an error on our part (we inadvertently included responses to more stimuli in fitting the individual models, which slightly hampered their performance). Even with this correction, however, a few cells remain for which the consensus model outperforms the individual model. We believe this is because there is more data to constrain model parameters for the consensus models (since they are fit to all cells at the same time), and that can compensate for improvements associated with customizing parameters to specific cells.

      (2) Fig. 2 Supplement 1, it would be useful to see a blow-up of the data in an inset, as in Fig. 2B.

      Thanks - added.

      (3) Line 400 - this paragraph could include additional quantification and statistics to back up claims re 'substantially reduced', 'considerably lower'.

      We quantify that in the next sentence by computing the mean-square-error between responses and sinusoidal fits (also in Figure 7B, which now includes statistics as well). We have made that connection more direct in the text.

      (4) Maybe a supplement to Fig. 8 could show the changes to the stimulus required to alter the kinetics in both directions - to give more insight into part B., especially.

      Good suggestion - we have added the stimuli to all of the panels of the figure (now Figure 9).

      (5) Fig. 8B - in 'Speed response up' condition - there seems to be error in the model for the decay time of the response - especially for the 'original' condition, which is not quantified in 8C. Was it generally difficult to predict responses to flashes?

      That seems largely to reflect that the cells used for those experiments had faster initial kinetics than the average cells (responses to the control traces are also faster than model predictions in these cells - black traces in Figure 9B). We have added this to the text.

      (6) Line 678: possibly note that 405 nm equally activates S and M photopigments in mice, since most of the cones co-express the two photopigments (Rohlich et al., 1994; Applebury et al., 2000; Wang et al., 2011).

      Thanks - we have added this (lines 827-829).

      (7) The discussion could include a broader description of the various approaches to identifying nonlinearities within retinal circuitry, which include (incomplete list): recording at multiple levels of the circuit (e.g., Kim and Rieke 2001; Rieke, 2001; Baccus and Meister, 2002; Dunn et al., 2006; 2007; Beaudoin et al., 2007; Baccus et al., 2008); recording currents vs. spiking responses in a ganglion cell (e.g., Kim and Rieke, 2001; Zaghloul et al., 2005; Cui et al., 2016); neural network modeling approaches (e.g., Maheswaranathan et al., 2023); optogenetic approaches to studying filtering/nonlinear behavior at synapses (e.g., Pottackal et al., 2020; 2021).

      Good suggestion - we have added this to the final paragraph of the Discussion.

      Reviewer #3 (Recommendations For The Authors):

      -  I am personally not a fan of the style: "... as Figure 4A shows..." or comparable and much prefer a direct "We observe that X is the case (Figure 4A)". If the authors agree, they may want to revise their paper in this way.

      We have revised the text to avoid the “... as Figure xx shows” construction. We have retained multiple instances which follow a “Figure xx shows that …” construction (which is both active rather than passive and does not use a personal pronoun).

      -  I am not a fan of the title. Light-adaption clamp caters only to a very specialized audience.

      We have changed the title to “Predictably manipulating photoreceptor light responses to reveal their role in downstream visual responses.”

      -  The parameter fitting procedure should not only be described in Matlab code, but in the paper.

      Thanks - we have expanded this in the Methods considerably (section starting on line 832).

      -  The authors should elaborate on why different fitting procedures were used.

      We did not describe that issue clearly. The fitting procedures used across cells were identical, but we had different data available for different cell types due to experimental limitations. We have substantially revised that part of the main text to clarify this issue (paragraph starting on line 121).

      -  The authors state in line 126 that the input stimulus is supposed to mimic eye movements. Of mouse, monkey, or human? Please clarify.

      Thanks - we have changed this sentence to “abrupt and frequent changes in intensity that characterize natural vision.”

      -  Please improve the figure style. For example, labels should be in consistent capitalization and ideally use complete words (e.g. Figure 2B, 4B, and others).

      We have made numerous small changes in the figures to make them more consistent.

      -  Is the fraction of variance calculated on held-out-data? Linear models should be added to Figure 2B.

      The fraction of variance explained was not calculated on held out data because of limitations in the duration of our recordings. Given the small number of free parameters, and the ability of the model to capture held out cells, we believe that the model generalizes well. We have added a supplemental figure with linear model performance (Figure 2 - Figure Supplement 2).

      -  Fig. 9A is lacking bipolar cell and amacrine cell labels. Currently, it looks like HC is next to the BC in the schematic.

      Thanks - we have updated that figure (now Figure 10A).

      -  Maybe I am misunderstanding something, but it seems like the linear model prediction shown in Figure 2A for the rod could be easily improved by scaling it appropriately. Is this impression correct or why not?

      We have clarified how the linear model is constructed (by fitting the linear model to low contrast responses of the full model at the mean stimulus intensity). We also added a supplemental figure, following the suggestion above, showing the linear model performance when a free scaling factor is included for each cell.

      -  The verification experiment in Fig. 5 is only anecdotal and is elaborated only in Figure 6. If I am not mistaken, this does not necessitate its own figure/section but could rather be merged.

      We have kept this figure separate (now Figure 6) as we felt that it was important to highlight the approach in general in a figure before getting into quantification of how well it works.

      -  Figure 5 right is lacking labels. What is red and grey?

      Thanks for catching that - labels are added now.

      -  The end of the Discussion is slightly unusual. Did some text go missing?

      Thanks - we have rearranged the Discussion so as not to end on Limitations.

      -  There is a bonus figure at the end which seems also not to belong in the manuscript.

      Thanks - the bonus figure is removed now.

      -  The methods should also describe briefly what kind of routines were used in the Matlab code, e.g. gradient descent with what optimizer?

      We’ve added that information as well.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their positive assessment of our manuscript. We agree that there are some further experiments suggested by the reviewers that would enhance our study. We have highlighted further proposed experimental work in bold for clarity.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      1. EVIDENCE, REPRODUCIBILITY AND CLARITY

      Summary: The Matrix 2 (M2) protein of influenza A virus (IAV) is a single-pass transmembrane protein known to act as a tetrameric ion channel that is important for both viral entry and egress. The paper by Figueras-Nova et al. entitled "Caspase cleavage of Influenza A virus M2 disrupts M2-LC3 interaction and regulates virion production" reports on the regulation of IAV virion production through a regulatory interplay between a caspase cleavage site and an LC3-interacting region (LIR) motif in M2. In its C-terminal cytoplasmic tail, the IAV M2 protein contains an LIR motif interacting with LC3. The authors show that this LIR motif is preceded by a functional caspase cleavage motif cleaved predominantly by caspase-6, with some contribution from caspase-3: the motif 82-SAVD-85 directs cleavage after the aspartate (D) at position 85. The cleavage leads to loss of the remaining C-terminal sequence from amino acid 86 to 97. The core LIR motif 91-FVSI-94 is therefore lost from M2, which can no longer bind LC3. As previously described by the same group using point mutations in the LIR motif (Ref. 12), loss of a functional LIR, here by caspase-mediated deletion of the LIR, affects virion production and inhibits filamentous budding. LC3B lipidation is increased upon treatment with a caspase inhibitor. The authors show for the first time that LC3 is included in IAV virions via binding to M2. Furthermore, they also report a co-crystal structure of the M2 C terminus (aa 70-97), containing the caspase cleavage site and LIR, with LC3B (aa 3-125), adding new insights into this interaction and showing that the caspase cleavage site lies in a flexible region N-terminal to the LIR. This work shows how caspase cleavage may modulate LC3B lipidation, trafficking to the plasma membrane, incorporation of LC3B into virions, filamentous budding and virion production (viral titer).

      Major comments: The findings reported here are very well supported by the data shown. This is a very clearly written paper with well described and nicely visualized results that are accompanied by adequate statistical analyses.

      We thank the reviewer for their assessment of our manuscript.

      The authors report a new mechanism by which LC3B binding to the C-terminal tail of the M2 protein is regulated, and suggest that this is an adaptation that lets the virus adjust virion production to host cell status by hijacking the function of host caspases. They show that the caspase cleavage motif is evolutionarily conserved and use this as an argument. Could the authors also discuss whether this might instead be a way for the host to protect itself against excessive virion production, which could be too detrimental to the host? Would avoiding killing the host not also be an evolutionary advantage to the virus in the long run?

      This is an interesting point. We agree there could be an advantage for the virus in not overproducing virions under certain circumstances. Consistent with this, caspase-6-deficient mice had increased mortality in response to IAV PR8 infection and showed increased viral spread in the lungs (Zheng et al., 2020; doi: 10.1016/j.cell.2020.03.040). This is also relevant to the comments made by Reviewer 2. The manuscript will be updated to include a discussion of this point.

      A question I may raise, which is optional as it may be too much work to address as part of this study, is whether the reported regulation of LC3B binding has any role in regulating the ion channel function of the M2 tetramer.

      It is well established that distal C-terminal truncations have no impact on M2 ion channel activity (Cady et al., 2009, doi: 10.1021/bi9008837; Schnell and Chou, 2008, doi: 10.1038/nature06531; Nguyen et al., 2008, doi: 10.1021/bi801315m; Tobler et al., 1999, doi: 10.1128/jvi.73.12.9695-9701.1999). This is also consistent with data from our lab (Ulferts et al., 2021, doi: 10.1016/j.celrep.2021.109899; Beale et al., 2014, doi: 10.1016/j.chom.2014.01.006) as well as from others (Ren et al., 2015, doi: 10.1128/JVI.00576-15) showing that the effects of the LIR motif and the proton channel are distinct. We appreciate the reviewer suggesting further work here as optional, but there is already compelling evidence that the LIR motif has no substantial effect on ion channel activity. (See also Reviewer 2, points 4 and 5.)

      Minor comments: Delete "with" in line 145.

      This will be changed in the updated manuscript.

      Line 217: It should be stated more specifically how "cells were surface stained with M2".

      The protocol for surface staining of M2 will be explained in more detail in the updated manuscript.

      1. SIGNIFICANCE

      This is a very well performed study with a sound experimental strategy and well-performed assays with clear results, increasing our insight into the interplay between influenza A virus and host cells. Although caspase-mediated cleavage of the autophagy receptor and signaling scaffold protein p62 (Ref. 25), removing the LIR and LC3 binding, has been reported before, I consider this study novel in reporting this type of regulation of LC3 binding. The cleavage of p62 deletes a large part of the protein, whereas here it is a "clean" deletion of the LIR sequence, representing a conceptual advance in the regulation of LC3 binding. The study also reports for the first time that LC3B is incorporated into virions. The effects on trafficking to the plasma membrane, viral budding and virion production are similar to those reported before (Ref. 12) using viruses with point mutations crippling the LIR motif. This research will be of interest to all those studying virus-host interactions and to the autophagy field, both as a non-autophagic role of LC3B and as a regulatory mechanism of LIR-LC3B interactions involving the irreversible, caspase cleavage-mediated deletion of the LIR motif.

      We thank the reviewer for this assessment of our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The influenza A virus (IAV) M2 protein is a small transmembrane protein which plays a role in virus entry and egress. In a previous study, Beale et al. (2014) identified an LC3-interacting region (LIR) in the M2 cytoplasmic domain that was found to recruit the LC3B protein to the plasma membrane. Recombinant IAV harboring mutations in the LIR motif showed reduced particle stability and lost filamentous morphology.

      In the present study, Figueras-Novoa et al. show that the LIR motif is removed in response to activation of cellular caspases. The authors demonstrate that in IAV-infected THP-1 cells M2 is partially cleaved by caspase-6 at the motif 82-SAVD-85↓A. Caspase inhibitors abolished cleavage, and a mutant virus harboring the D85A substitution was found to be resistant to caspase action. A crystal structure of the purified M2 C-terminus with LC3B revealed that the caspase cleavage site lies in a flexible region that is accessible to caspases.

      A mutant virus encoding a truncated M2 protein (M2∆86-97) was unable to interact with LC3, in accordance with the absence of the LIR motif. The M2∆86-97 mutant showed reduced lipidation of LC3, while enhanced lipidation of LC3 was observed when wild-type virus-infected cells were treated with caspase inhibitors. The authors also observed that cell surface transport of M2∆86-97, but not of M2-D85A, was impaired. However, in purified virus particles a mix of cleaved and uncleaved M2 was detected. The authors also demonstrated that lipidated LC3B was present in purified wild-type virions but was even more abundant in M2-D85A virions. Finally, M2∆86-97 mutants produced significantly fewer infectious particles than wild-type virus, while the D85A cleavage mutant replicated to titers similar to those of wild-type virus.

      Based on these findings the authors concluded that caspases regulate the interaction of the M2 protein with LC3, which impacts virion production. Specifically, they propose that caspase-mediated removal of the LIR motif may enable a switch between filamentous and non-filamentous budding in response to depletion of cellular resources. However, the authors were unable to rescue a filamentous IAV with a truncated M2 protein and therefore could not provide direct proof for their hypothesis.

      While the data are sound and presented well, they do not support the conclusions of the authors.

      1. In the authors' opinion, the conserved caspase cleavage site in the M2 protein might provide an evolutionary advantage for the virus. However, the M2-D85A mutation has no effect on viral replication, so the biological significance of M2 cleavage is unclear. The conclusion that caspase-induced M2 cleavage is a fine-tuning mechanism of IAV is not supported by experiments.

      We thank the reviewer for the assessment of our data. We think the reviewer is specifically objecting to the phrase “We conclude that this highly conserved interaction and cleavage act as a regulatory mechanism exploited by IAV to fine-tune virion production in different cellular contexts.” This is a reasonable inference from our results, but we accept that it is not proven. We will change the wording to make it clear this has not been definitively demonstrated.

      2. The finding that the permanently truncated IAV M2 mutant virus was substantially attenuated does not necessarily mean that abrogation of the M2-LC3 interaction was responsible for this attenuation. As the M2 protein plays a role in virus budding at the plasma membrane (recruitment of M1 protein, induction of membrane curvature, membrane scission), the impaired transport of the truncated M2 protein might already explain why the virus was attenuated and why incorporation of the protein into the viral envelope was reduced.

      We will confirm this further with additional experiments using LIR mutants. Recapitulating the plasma membrane transport defect of truncated M2 with LIR mutants, including the newly characterised M2D87A and M2D88A mutants and a more severe mutant carrying an FVSI-to-AAAA substitution, would strongly imply that the phenotype of the truncation mutant is due to the lack of the LIR motif.

      3. It is also not clear whether the loss of the C-terminal 12 amino acids may have affected the interaction of the M2 protein with other proteins such as TRAPPC6A-delta (Zhu et al., 2017).

      This is a reasonable point; however, Zhu et al., 2017 (https://doi.org/10.1128/jvi.01757-16) reported that the interaction with TRAPPC6A retains M2 intracellularly. If the phenotype observed with our truncation were due to loss of the interaction with TRAPPC6A, the opposite phenotype would be expected (more M2 at the plasma membrane with the truncated M2∆86-97 mutant). To address this point directly, we will attempt to rescue an M2 mutant virus in which the reported TRAPPC6A binding site is disrupted and assess M2 plasma membrane localization.

      4. The authors did not rule out that truncation of the M2 protein by 12 amino acids affects proton channel activity. Proton channel activity, however, might be important to preserve the metastable conformation of HA in the secretory pathway and might also be important for virus uncoating.

      5. M2∆86-97 induced less LC3 lipidation than wild-type M2 or the D85A mutant. The remaining lipidation was attributed to the ion channel activity of the M2 protein. Can the authors rule out that truncation of the M2 protein reduced ion channel activity, which in turn reduced LC3B lipidation?

      We have addressed points 4 and 5 in response to Reviewer 1.

      6. The suggested role of caspase cleavage as a regulatory switch between filamentous and spherical virions (lines 304-313) is highly speculative as long as the authors do not provide experimental proof for it. The authors indicated that they were unable to rescue filamentous IAV with M2∆86-97. However, would it be possible to use caspase inhibitors to test their hypothesis?

      We acknowledge that M2∆86-97 could not be rescued in a filamentous background. The use of caspase inhibitors would only increase the amount of full-length M2 present and does not provide an alternative strategy for increasing the proportion of truncated M2. Instead, since the M2∆86-97 mutant could not be rescued, we will attempt to rescue additional LIR motif mutants to address this point. In particular, D87A and D88A mutants could be generated in a MUd background, as well as the F91S mutant.

      7. The authors used only the PR8 strain for their studies, a highly cell culture-adapted strain with spherical morphology. Are the findings obtained with this strain also valid for other IAV strains?

      As we highlight in Figure 2I, both the caspase cleavage motif and the LIR motif are highly conserved in human IAV strains. PR8 was used as it is the reverse genetics system in use and approved for use in the lab. We will attempt to address this by testing whether other IAV strains we are able to obtain also undergo caspase-mediated cleavage of M2. If possible, we will obtain recent clinical isolates to show cleavage of M2 in a strain that has not adapted to cell culture.

      8. The authors mainly used THP-1 cells for their studies, a human macrophage-like cell line. However, human IAVs mostly replicate in epithelial cells of the respiratory tract and cause only abortive infections of macrophages. Why did the authors choose this cell line? Can the findings obtained with this cell line be translated to epithelial cells of the airways?

      THP-1 cells are widely used to study caspase activity. However, we also show M2 cleavage in MDCK cells and HAP1 cells. PR8 infection of A549 cells does not induce significant cell death at the infection time points used and, as caspase activation is linked to cell death, we did not observe M2 cleavage in this cell type. We will attempt to infect additional epithelial cell types to confirm this phenotype.

      Minor issues:

      • Fig. 1C: There appear to be considerable differences in the cleavage efficiency of M2 between panels A, B, C, and D. Is there an explanation for this?

      Different cell types (THP-1 cells and HAP1 cells) are used for the experiments mentioned above, which accounts for the different amount of M2 cleavage.

      • Fig. 1, panel E: The labeling of the first amino acid as aa 76 seems to be wrong.

      We thank the reviewer for pointing this out, this will be corrected in the updated manuscript.

      Line 147: ...caspase mediated disruption of the M2-LC3 interaction (Fig 2A-B). Should be Fig. 2A-C.

      This sentence was referring to Figure 2A-B, as it refers to LC3B lipidation and not the coIP. This sentence will be changed in the text to reflect the intended meaning.

      • Growth kinetics of the various mutant viruses are missing.

      We will provide growth kinetics for the relevant mutants (M2D85A and M2∆86-97).

      • Line 195: The authors speculate that aa85 is important for viral fitness: That should be demonstrated!

      This speculation is based on the very strong conservation of D85 in human IAV strains. The importance of D85 in viral fitness (permitting cleavage of M2) is only likely to be directly demonstrable in transmission models (for example ferrets) which is not feasible or justifiable.

      Reviewer #2 (Significance (Required)):

      The authors concluded that caspases regulate the interaction of the M2 protein with LC3, which impacts virion production. Specifically, they propose that caspase-mediated removal of the LIR motif may enable a switch between filamentous and non-filamentous budding in response to depletion of cellular resources. However, the authors were unable to rescue a filamentous IAV with a truncated M2 protein and therefore could not provide direct proof for their hypothesis.

      As stated in the responses above, we will attempt to rescue LIR mutant viruses (D87A and D88A) in a MUd background, which would provide further support for our hypothesis. Our data have significance for the understanding of the cell biology of influenza infection, as commented on by Reviewers 1 and 3.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this article, the authors identify a caspase cleavage site in the influenza A virus (IAV) Matrix 2 protein (M2) that generates a truncated form of M2 lacking its C-terminal LC3-interacting region (LIR). This cleaved form of M2 appears and accumulates from 12 hours post-infection onwards. IAV expressing the M2 delta 86-97 mutant, corresponding to cleaved M2, appears to disrupt LC3B localization to the plasma membrane upon infection. The authors also show that IAV M2 delta 86-97 has a reduced viral titer compared to WT IAV. Overall the data are quite exciting: the authors identify the specific caspase responsible for the cleavage and show the residues of M2 necessary for LC3 interaction. However, some of the data showing the consequences of the cleavage for viral replication could be better clarified.

      We thank Reviewer 3 for their kind comments and we propose further experiments to clarify the consequences of cleavage.

      Major comments:

      • In Fig. 3A-B, the authors seek to demonstrate that the localization of M2 to the plasma membrane requires the LIR motif. However, the representative images for cells infected with the delta 86-97 mutant show that relatively few cells express M2, raising questions about the infectivity of this mutant virus or whether overall M2 expression in this assay is lower for the delta 86-97 mutant. The authors should first quantify the ratio of M2 cell surface staining to total M2 staining and second re-evaluate the choice of representative images.

      We will include more examples of permeabilised cells in which comparable numbers of cells are M2 positive between mutants. We will also include high-content microscopy-based quantification to support this. To clarify, the quantification of M2 intensity at the plasma membrane is carried out relative to the number of M2-positive cells, which, as the reviewer agrees, is the most accurate approach. To avoid confusion, we will update the figure legends to describe the quantification process more accurately. A comparison between surface M2 and total M2 cannot be made on an individual cell basis, because once cells are permeabilized (to detect internal M2), surface and internal M2 are difficult to distinguish robustly. The above clarification and additional data should provide the necessary support for our conclusions.

      • In Fig. 3E, it is unclear what is being quantified in the graph, as the legend and text (lines 222-223) mention that spot intensity was measured but the y axis indicates LC3 relocalization intensity. Given that LC3 is punctate, particularly in the cytosol, it is unclear which LC3 spots are being referred to. Based on the images shown, a graph of LC3 surface staining, as performed for M2, would clarify the data. The authors should clarify the reporting of these data in the results section. Additionally, images of the non-infected control cells should be added to Fig. 3C.

      We agree with the reviewer on this point. The figure will be updated to describe more accurately what is being quantified. Additionally, images for uninfected cells in 3C will be added.

      • The data in Fig. 4 and Fig. S3 need to be strengthened to be conclusive. The volcano plot in Fig. S3A indicates that there is more LC3B and IAV protein in M2 D85A than in M2 delta 86-97 virions. However, in Fig. 4E, both LC3 I and LC3 II are increased in M2 delta 86-97 virions compared to M2 D85A virions, which is the opposite of the authors' conclusions in lines 244-245. In other words, the total amount of lipidated LC3 is higher in virions from IAV whose M2 lacks the LIR motif than in virions whose M2 retains it. The LC3II/I ratio in Fig. 4F would suggest that virions containing M2 with the LIR motif may preferentially incorporate LC3B II, whereas virions containing M2 without the LIR incorporate both LC3B I and LC3B II. Since this is a critical point made by the authors, performing a co-immunoprecipitation of M2 D85A and M2 delta 86-97 from the particles and then assessing binding of LC3 I or II would bolster their conclusions.

      Figure 4F quantifies the ratio of LC3II to LC3I in infectious particles. Two additional replicates used to quantify this ratio will also be shown, giving a better representation of the increased amount of lipidated LC3II in M2D85A infectious particles, as well as the increased LC3II/LC3I ratio in these particles compared to M2∆86-97. Because of the low yield of purified IAV virions, performing an IP would be difficult. Even if this were technically feasible, it would not prove that M2 binds LC3 inside the virion; we do not make this claim in our paper, only that LC3B can be detected in purified viral particles. We will clarify this point in the revised manuscript.

      • In Fig. 4J, even if statistically significant, the PFU difference between M2 D85A and M2 delta 86-97 is minimal; a growth curve assay would help to appreciate this difference over time. In THP-1 cells, as the authors show caspase cleavage of M2 at 12, 14 and 16 hpi, etc. (Fig. 1), they should also show PFU data at these same time points for the M2 D85A mutant compared to WT and M2 delta 86-97.

      We agree with the reviewer; indeed, this was a point we attempted to make in our manuscript. Figure 4J shows a statistically significant difference between the titers, but in the text we state that, although statistically significant, the difference is much smaller than in the other titer quantifications performed. Given the nature of a plaque assay, differences of less than one log (10-fold) cannot be considered as definitively indicating biological significance. We will clarify this in the revised manuscript. We will also provide the relevant growth kinetics (as per the response to Reviewer 2).

      • The titles of Fig. 4 and Fig. S3, and the text at line 226, should be changed, as M2 incorporation into virions is neither shown nor described in the text. In addition, in Fig. S3B, the authors show that there is no difference between the M2 mutants in the abundance of M2 and other viral proteins relative to M1.

      The title of Figures 4 and S3 will be changed to more accurately reflect all of the points made by the figure.

      • In the image shown in Fig. 4H, the number of plaques is higher for M2 delta 86-97 even though the plaques are smaller than for M2 WT. Could the authors clarify in the results section how they quantify PFU in their plaque assay and whether they used a size criterion when counting plaques?

      The plaque images were taken at different dilutions, with the M2∆86-97 image taken two dilutions lower than the M2WT image. We will include the calculation used for PFU/mL, which does not take plaque size into account. Furthermore, images of the whole plate, showing plaques across the serial dilutions, will be shown.

      • In Fig. 3B, the legend indicates 8 hpi but the graphs show 9 hpi.
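      The PFU/mL calculation mentioned in this response follows the standard plaque-assay relation: titer = plaque count / (dilution × volume plated). The minimal sketch below uses hypothetical numbers (the 0.1 mL inoculum and all plaque counts are assumptions, not values from the study) to illustrate the point raised about Fig. 4H: a plate imaged two dilutions lower can show more plaques even though the stock titer is lower, and plaque size plays no part in the count.

```python
import math


def pfu_per_ml(plaque_count: int, dilution: float, inoculum_ml: float) -> float:
    """Titer of the undiluted stock in PFU/mL, from a single countable plate.

    dilution    -- fraction of stock in the plated sample, e.g. 1e-6 for 10^-6
    inoculum_ml -- volume of that dilution added to the well, in mL
    Plaque size is deliberately ignored: each countable plaque counts once.
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if not (0 < dilution <= 1) or inoculum_ml <= 0:
        raise ValueError("dilution must be in (0, 1] and inoculum_ml must be positive")
    return plaque_count / (dilution * inoculum_ml)


# Hypothetical counts: 20 plaques for WT at 10^-6, but 60 plaques for a
# mutant imaged two dilutions lower (10^-4). Both plates received 0.1 mL.
wt_titer = pfu_per_ml(20, 1e-6, 0.1)
mutant_titer = pfu_per_ml(60, 1e-4, 0.1)
log_difference = math.log10(wt_titer / mutant_titer)

print(f"WT: {wt_titer:.1e} PFU/mL, mutant: {mutant_titer:.1e} PFU/mL")
print(f"difference: {log_difference:.2f} log10")
```

      With these made-up counts the mutant plate shows three times as many plaques, yet its titer is about 1.5 log10 lower than WT.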

      We thank the reviewer for pointing out this mistake. Both should read 8 hpi, this will be corrected in the new manuscript.

      Reviewer #3 (Significance (Required)):

      The authors demonstrated that IAV M2 binding to LC3 is regulated by caspase cleavage. The authors clearly identify the cleavage site and the caspase involved: caspase-6. The cleaved form of M2 seems relevant to IAV infection as it accumulates after 12 hpi. Using the M2 mutant D85A, which cannot be cleaved by caspase-6, and the truncated M2 mutant delta 86-97, which mimics caspase-cleaved M2, the authors are able to elegantly address the role of M2 cleavage. However, the importance of M2 caspase cleavage for IAV infection is not demonstrated. It would also be interesting to address the impact of caspase cleavage of the M2 LIR motif on autophagy or CASM. - Advance: conceptual. - Audience: basic research, specialized in virology, specialized in autophagy. - Field of expertise: virology, autophagy.

      We agree with the reviewer that we have made a conceptual advance in our understanding of the cell biology of influenza A virus infection. We have also determined the structure of the terminal part of the M2 tail in complex with LC3B. The biological importance of the phenotypes we show most likely lies in transmission of the virus between hosts, which for IAV would require animal experiments outside the scope of this study. We have demonstrated regulation of the LIR motif by caspase cleavage in a variety of ways, using cell biological and biochemical methods. IAV is a very significant human and animal pathogen, and we believe we have made an important advance in describing a host-pathogen interaction of relevance for viral egress.

    1. Author response:

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. This reflects the conserved sequences and functions of MCC and PCC across different species.

      There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We will revise our manuscript to use "conformational differences" instead of "conformational changes" to describe the differences between the apo and ligand-bound states.

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We will acknowledge this limitation in the discussion section of our revised manuscript.

      In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We will revise Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking.

      In the introduction, I suggest the author provide more information about the previous studies about the structure and reaction mechanisms of BDCs, what is the knowledge gap, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues, are these residues reported as catalytic residues or this is based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in S. coelicolor PCC, corresponding to G437 and A438 in human PCC, are the catalytic residues (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We will include this information in the introduction of our revised manuscript.

      In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA-bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that acetyl-CoA resembles 3-methylcrotonyl-CoA more closely than propionyl-CoA does in its ability to bind MCC. We will discuss this possibility in our revised manuscript.

      In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      How do the solved structures compare with the latest AlphaFold3 predictions?

      Since AlphaFold3 had not been released when our manuscript was submitted, we did not compare the solved structures with AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limit of the AlphaFold3 server, we could only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.

      1. Abstract

      Defining a multicellular model can be challenging. There may be hundreds of parameters that specify the attributes and behaviors of objects. Hopefully the model will be defined using some format specification, e.g., a markup language, that will provide easy model sharing (and a minimal step toward reproducibility). PhysiCell is an open source, physics-based multicellular simulation framework with an active and growing user community. It uses XML to define a model and, traditionally, users needed to manually edit the XML to modify the model. PhysiCell Studio is a tool to make this task easier. It provides a graphical user interface that allows editing the XML model definition, including the creation and deletion of fundamental objects, e.g., cell types and substrates in the microenvironment. It also lets users build their model by defining initial conditions and biological rules, run simulations, and view results interactively. PhysiCell Studio has evolved over multiple workshops and academic courses in recent years, which has led to many improvements. Its design and development have benefited from an active undergraduate and graduate research program. Like PhysiCell, the Studio is open source software and contributions from the community are encouraged.
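      At the XML level, creating or deleting a cell type amounts to adding or removing an element and keeping IDs consistent. The minimal sketch below shows such edits with Python's standard `xml.etree.ElementTree`. The configuration string is a hypothetical, heavily simplified fragment loosely modeled on a PhysiCell settings file; its element and attribute names are assumptions for illustration, not PhysiCell Studio's code or the full PhysiCell schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified model definition (not a validated PhysiCell schema).
CONFIG = """<PhysiCell_settings>
  <microenvironment_setup>
    <variable name="oxygen" ID="0"/>
  </microenvironment_setup>
  <cell_definitions>
    <cell_definition name="default" ID="0"/>
  </cell_definitions>
</PhysiCell_settings>"""


def add_cell_type(root: ET.Element, name: str) -> int:
    """Append a <cell_definition> with the next free integer ID and return that ID."""
    defs = root.find("cell_definitions")
    if defs is None:
        raise ValueError("no <cell_definitions> element")
    existing = defs.findall("cell_definition")
    if any(cd.get("name") == name for cd in existing):
        raise ValueError(f"cell type {name!r} already exists")
    next_id = max((int(cd.get("ID")) for cd in existing), default=-1) + 1
    ET.SubElement(defs, "cell_definition", {"name": name, "ID": str(next_id)})
    return next_id


def delete_cell_type(root: ET.Element, name: str) -> bool:
    """Remove the named <cell_definition>; return False if it was not found."""
    defs = root.find("cell_definitions")
    if defs is None:
        return False
    for cd in defs.findall("cell_definition"):
        if cd.get("name") == name:
            defs.remove(cd)
            return True
    return False


root = ET.fromstring(CONFIG)
add_cell_type(root, "tumor")
delete_cell_type(root, "default")
print(ET.tostring(root, encoding="unicode"))
```

      A real editor would also have to update any other parts of the model that refer to a deleted cell type, which this sketch does not attempt; that kind of bookkeeping is plausibly where a GUI saves users the most error-prone hand-editing.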

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.128), and has published the reviews under the same license. This is part of the PhysiCell Ecosystem Series: https://doi.org/10.46471/GIGABYTE_SERIES_0003

      Reviewer 1. Meghna Verma:

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      The authors have provided links for video descriptions for installation and that is appreciated.

      One overall recommendation: if all the screenshots (e.g., Figs 1-12 of the main paper and all the subsections in the Supplementary) could be combined into one figure, this would give a more complete overview and improve the overall flow of the paper.

      Additional comments are available here: https://gigabyte-review.rivervalleytechnologies.comdownload-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvVFIvNTA3L1Jldmlld19QaHlzaUNlbGxTdHVkaW9fTVYucGRm

      Reviewer 2. Koert Schreurs and Lin Wouters supervised by Inge Wortel

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      The problem statement is addressed in the introduction, which mentions the need for a GUI tool as a much more accessible way to edit the XML-based model syntax. However, it is somewhat confusing who exactly the intended audience of the paper is. Is the paper targeted at researchers that already use PhysiCell, but might want to switch to the GUI version? Or should it (also) target the potential new user-base of researchers interested in using ABMs, for whom the XML version was not sufficiently accessible and who will now gain access to these models because there is a GUI? Specifying the intended audience might impact some sections of the paper. For example, for users who already use PhysiCell, the step-by-step tutorials might not be useful since they would already know most of the available options; they would just need a quick overview of what info is in which tab. But if the paper is (also) targeted at potential new users, then some additional information could make both the paper and the tool much more accessible, such as:
      
      • A clear comparison to other modeling frameworks and their functionalities. Why should they use PhysiCell instead of one of the other available (GUI) tools? For example, the referenced Morpheus, CC3D and Artistoo all focus on a different model framework (CPMs); this might be worth mentioning. And what about Chaste? Does it represent different types of models, or are there other reasons to consider PhysiCell over Chaste or vice versa? For new users, this would be important information to include. The paper currently also does not mention other frameworks except those that offer a GUI. While the main point of the paper is the addition of the GUI, for completeness' sake it might still be good to give a broader overview of ABM frameworks and how they compare to PhysiCell, or simply to refer to an existing paper that provides such an overview.
      • The current tutorial immediately dives into very specific instructions (what to click and exact values to enter), often without explaining what these options mean or do. New users would probably appreciate getting a rough outline of which types of processes can be modelled, and which steps they would take to do so. This could be as easy as summarising the different main tabs before going into the details. I understand that some of these explanations will overlap with the main PhysiCell software, but considering that the GUI will open up modelling to a different type of community, it might make sense to outline them here to give a self-contained overview of functionality.
      • Indeed, if the above information is provided, the detailed tutorial might fit better as an appendix or in online documentation. That would also leave more space to explain not only which values to enter, but also what these variables do, why choose these values, what other options to consider, etc. Having this information together in one place would be very useful for beginning users.

      Is the source code available, and has an appropriate Open Source Initiative license been assigned to the code?

      The software is available under the GPL v3 licence.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      There is a GitHub repository, ensuring that it is possible to contribute and report issues, and the paper explicitly invites community contributions. However, although the paper mentions that it is possible to seek support through GitHub Issues and “Slack channels”, we could find no link to the latter resource. This should probably be added to make this resource usable for the reader (or otherwise the statement should be removed).

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Mostly yes, as installation and deployment are outlined in the paper and documentation. However, we did notice a couple of issues:

      • The Studio guide explains how to compile a project in PhysiCell (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md), but does not mention that Mac users need to specify the g++ version at the top of the Makefile. This is explained in a separate blog post (http://www.mathcancer.org/blog/setting-up-gcc-openmp-on-osx-homebrew-edition/) but should be outlined (or at least referenced) here as well.
      • There are several different resources covering the installation process, referring to e.g. github.com/physicell-training, github.com/PhysiCell-Tools/Studio-Guide, and the abovementioned blog. This might not be very accessible to all users targeted by the new GUI functionality (especially when command-line interventions and manual Makefile edits are involved). While not all of this has to be changed before publication, having all information in one place would already improve accessibility to a larger user base.
      • When following the instructions (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md), running “python studio/bin/studio.py -p -e virus-sample” gives an error for the -p flag: “Invalid argument(s): [‘-p’]”. We assumed it has to be left out, but perhaps the docs need to be updated.

      Is the documentation provided clear and user friendly?

      Mostly yes, as there is already a lot of documentation available. However, the user-friendliness could be improved with some minor changes. For example, the documentation would be more user-friendly if resources were available from a central spot. Currently, information can be found in different places:

      • https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md provides installation instructions and a nice overview of what is where in the GUI but, as mentioned above, does not mention potential issues when installing on macOS.
      • The paper provides very detailed examples; these might be nice to include along with the abovementioned overview.
      • Potentially other places as well.

      It would be great if the main documentation page could at least link to these other resources with a brief description of what the user will find there. Further, some additions would make the documentation more complete:

      • It would be good to have an overview somewhere of all the configuration files that can be supplied/loaded (e.g. those for “rules” and for initial configurations).
      • A clearer instruction or small tutorial on how to use Simularium and ParaView with PhysiCell Studio would help; especially for ParaView, there is no instruction on how to use your own data or make your own `.pvsm` file.

      In the longer term, it might be worthwhile to set up a self-contained documentation website (this is relatively easy nowadays using e.g. GitHub Pages), which can outline dependencies, installation instructions, a quick overview, detailed tutorials, example models, and links to GitHub issues/Slack communities. This is not a requirement for publication but might be worth looking into in the future as it would be more user-friendly.
      

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      No. The core functionality of the software is nicely outlined in the GitHub README (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md), but as mentioned before, this high-level overview is missing from the paper itself. The README and paper recommend installing the Anaconda Python distribution to get the required Python dependencies. This is fine, but adding a setup file or requirements.txt might still be useful for users who are more familiar with Python and want a more minimal installation. Providing a conda environment.yml that allows running the Studio along with ParaView and/or Simularium might also be helpful. Note that running the Studio with Simularium in Anaconda did not work because Anaconda did not have the required vtk v9.3.0; instead we had to install Simularium without Anaconda (“pip3 install simularium”).

      Are there (ideally real world) examples demonstrating use of the software?

      The detailed tutorial nicely walks the reader through the tool (although, as mentioned before, a high-level overview is missing and the level of detail feels slightly out of place in the paper itself). When walking through the example in the paper and the supplementary material, we ran into a few (minor) issues:

      • It might be good to stress explicitly that after copying the template.xml into tumor_demo.xml, the first step is always to compile using “make”. The paper mentions “Assuming … you have compiled the template project executable (called “project”) …”. But it might not be immediately clear to all users how exactly they should do so (presumably by running “make tumor_demo” after copying the xml file?).
      • When running “python studio/bin/studio.py -c tumor_demo.xml -e project” as instructed, a warning pops up that “rules0.csv” is not valid (although the tool itself still works).
      • The instructions for plotting say to press “enter” when changing cmin and cmax, but Macs offer only a return key. Pressing fn+return to get the enter functionality also does not work; it might be good to offer an alternative for Mac.
      • When reproducing the supplementary tutorial, results were slightly different. It might be good if the example offered a random seed so that users can verify that they can reproduce these results exactly. In our hands, reproducing Figs 39, 40, 48 and 49 yields many more (red) macrophages (even when running multiple times), but we could not be sure whether this is due to variation between runs or a mistake in the settings somewhere.
      
      
      The paper mentions that they have started setting up automated testing, but it does not give an idea of what the current test coverage is. Did they add a few tests here and there, or start to systematically test all parts of the software? I understand the latter might not be achievable immediately, but it would be good if users and/or contributors can at least get a sense of how good the current coverage is. (Note: the framework uses pytest, which seems to offer some functionality to generate coverage reports, see e.g. https://www.lambdatest.com/blog/pytest-code-coverage-report/). The code in studio_for_pytest.py has a comment “do later, otherwise problems sometimes”, but it is not entirely clear if the relevant issue has been resolved.
      

      Additional Comments: The presented tool offers a GUI to the PhysiCell framework for agent-based modeling. As outlined in the paper, this offers significant value to users, since editing a model is now much more accessible. The tool comes with extensive functionality and instructions. Overall, the tool functions as advertised, and will be of great value to the community of PhysiCell users who currently have to edit XML files by hand. It is therefore (mostly) publishable as is if some of the issues with installation (mentioned above) can be straightened out. That said, we do think some improvements could make both the tool and the paper more accessible to a larger user audience. Most of these have been mentioned in the other questions, but we list some additional ones below. Note that many of these are just suggestions, so we leave it up to the authors if and when they implement them.

      Suggestions for the paper: While the paper nicely outlines design ideas and usage of the tool, there were some points where we felt that the main point did not quite come across, for example:

      • As mentioned in the question about problem statement and intended audience, adding some information to the paper would make it a more useful resource for users not yet familiar with PhysiCell (see remarks there).
      • The section “Design and development” describes the development history of the tool. In principle this is a valuable addition, because it illustrates how the project is under ongoing development and has already been improved several times based on user feedback. However, the amount of information on each previous stage is slightly confusing; it is not entirely clear how this relates to the paper and the current tool. If the main point is to showcase that the current tool has been built on practical user experience, this would probably come across better if this section were somewhat shorter and focused on the design choices rather than previous versions. If the main point is something else, the main idea should be clarified.
      • The point of Table 1 was unclear to us; consider removing it or explaining the main idea.
      • Several figures do not have captions (e.g. Figure 1, but also others); it would be helpful to clarify what message each figure should convey.
      • P4 “adjust the syntax for Windows if necessary”: is it self-explanatory how users should adjust it? Consider adding the correct code for Windows as well if possible, since users who want to use the GUI tool might not be familiar with command-line syntax.
      • P6 “if you create your own custom C++ code referring directly to cell type ID”: this functionality is never discussed. This might be part of the general PhysiCell functionality, but it would be good to at least provide a link to a resource on how to do this.
      • P8 “Only those parameters that display … editing the C++ code”: it was not entirely clear to me what this means; could you clarify?
      • P13 mentions that changes made to the model parameters are immediately visible. This is very useful for prototyping when users want immediate feedback. However, what happens when you try to save output for a simulation where parameters were changed while the simulation was running? Would users be reminded that their current output is not representative?
      • Discussion: it is good to mention that the tool is already being used. Can you give an indication, based on your experience, of how long it takes new users to learn to navigate the tool? This might be useful information to add to the paper.
      • The last statement on LLMs seems to come out of nowhere. Consider leaving it out or expanding further on what would be needed to make this work and how feasible it is.

      Further comments on the tool itself:

      • The paper mentions that results may not be fully reproducible if multiple threads are used (I assume this is the case even when a random seed is set). In this case, would it make sense to show a warning the first time a user tries to set a seed with multiple threads, to avoid confusion as to why the results are not reproducible?
      • Unusable fields are not always greyed out to indicate that they are disabled, which sometimes makes it seem as though the tool is unresponsive. In other places unusable options are greyed out, so it might be good to double-check that this is consistent.
      • The initial conditions (IC) page has no legend; it might be good to add one.
      • There are some small inconsistencies between the field names mentioned in the paper and those in the tool/screenshots. For example, “boundary condition” (p5) should be “dirichlet BC”, and “uptake” (p6) should be “uptake rate”. For the latter, the paper mentions that the length scale is 100 micron, but this should be visible in the tool as well.
      • Not all fields have labels, so it is not always clear what the options do (see e.g. the drop-downs in Figure 6).
      • There are a few points in the tool where you have to “enable” a functionality before it works, but this might not always be intuitive. For example, if you upload a file with initial conditions, it can be assumed that you want to use it. There might be good reasons for this in some cases, but in general, consider whether all these checkpoints are necessary or whether this could be simplified. The same goes for the csv files that have to be saved separately instead of through the main “save” button; in the long term it might be worth saving all relevant files when they are updated, or at least showing a warning that some of them have to be saved separately.
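      The suggested warning for users who set a random seed while running with several threads could take the form of a simple check before a run starts. The Python sketch below is hypothetical: the function name and its parameters (`random_seed`, `omp_num_threads`) are illustrative assumptions and do not reflect PhysiCell Studio's actual code.

```python
import warnings
from typing import Optional


def check_reproducibility_settings(random_seed: Optional[int], omp_num_threads: int) -> bool:
    """Hypothetical pre-run check (not PhysiCell Studio's API).

    Returns True if the run is expected to be reproducible; emits a
    RuntimeWarning when a seed is set but more than one thread is used.
    """
    if random_seed is None:
        # No seed set: runs are not intended to be reproducible, so no warning
        return False
    if omp_num_threads > 1:
        warnings.warn(
            f"A random seed ({random_seed}) is set, but omp_num_threads={omp_num_threads}; "
            "results may still differ between runs when multiple threads are used.",
            RuntimeWarning,
        )
        return False
    return True
```

      Showing the message only the first time, as suggested, could be layered on top of this check, for example by having the GUI remember whether the user has already seen it.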

    1. Abstract

      Despite advances in identifying genetic markers associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores the use of imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showing GLIMPSE1’s ability to confidently impute variants with minor allele frequencies as low as 2% in Spanish ancestry individuals. We conducted a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here may be leveraged in future genomic projects, providing vital insights for health challenges like COVID-19.
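      Imputation accuracy of this kind is commonly summarised as the squared Pearson correlation (r²) between imputed allele dosages and genotypes obtained independently for the same individuals (e.g. from arrays or higher-coverage sequencing), often computed within minor-allele-frequency bins. The sketch below shows only the r² calculation, assuming genotypes coded 0/1/2 and dosages in [0, 2]; it is not GLIMPSE1 code, and the study's exact validation procedure may differ.

```python
from typing import Sequence


def squared_pearson(true_genotypes: Sequence[float], dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between reference genotypes (0/1/2)
    and imputed dosages (0..2) at the same sites/individuals."""
    n = len(true_genotypes)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(true_genotypes) / n
    my = sum(dosages) / n
    # Centred cross-product and sums of squares
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in dosages)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return (sxy * sxy) / (sxx * syy)


# Illustrative (invented) values: near-perfect imputation gives r² close to 1
truth = [0, 1, 2, 1, 0, 2]
dosage = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]
print(squared_pearson(truth, dosage))
```

      Because r² is squared, it measures how well dosages track genotypes linearly regardless of sign, and it is insensitive to a constant scaling of the dosages.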

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.127), and has published the reviews under the same license. For a video summary from the author see: https://youtu.be/x6oVzt_H_Pk?si=Byufhl0mIL3h0K6u

      The reviews are as follows:

      Reviewer 1. Jong Bhak:

      Data from severe COVID-19 cases are critical. This manuscript deals with detailed clinical information and a genome set as a subset of exome sequences, and provides invaluable data for ongoing global COVID-19 omics studies.

      Reviewer 2. Alfredo Iacoangeli:

      The authors present the release of a new dataset that includes low-coverage WGS data of 79 individuals who experienced severe COVID-19 in Madrid (Spain). The authors processed the data and imputed common variants, and they are making this dataset available to the scientific community. They also present the clinical data of these patients in a descriptive and informative fashion. Finally, the authors also validated the quality of their imputation, showcasing the potential of low-coverage WGS as an alternative to microarrays. Overall, the manuscript is very well written, clear, and exhaustive. The data are certainly valuable, and their generation, processing and analysis appear robust.
      

      Overall, I support the publication of this article and dataset. I have only a small number of minor suggestions for the authors. The sentence "Traditionally, the genotyping process has relied on array technologies as the standard, both at the broader GWAS level and the more specific genetic scoring and genetic diagnostics levels" sounds a little off. I fully understand where the authors are coming from, but given the central role of NGS and Sanger sequencing in genetic diagnostics, I would suggest that the authors modify it accordingly or keep the focus on GWAS.

      Please double-check the use of statistical terms in the description of the imputed data. For example: "On average, each VCF file in this rich dataset contains 9.49 million high-confidence single nucleotide variants [95%CI: 9.37 million - 9.61 million] (Figure 1)." The use of a CI in this context is a little misleading, as it does not refer to a probability distribution but to a finite collection; a range would be more appropriate. The authors say that they examined the ethnicity of the 79 individuals; however, I do not think the ancestry is actually reported anywhere, while a few figures show ancestral population data. The authors might clarify or correct the terminology.

      Looking at Figure 2, the sentence "although the male age distribution exhibits a broader range and higher variability, suggestive of a greater" does not appear justified. The authors might want to clarify or correct it accordingly.

      The sentence "This exploratory analysis highlights the diverse ways in which severe COVID-19 can present, and the importance of comprehensive and nuanced clinical phenotyping in improving our understanding and management of the disease." suggests some basic clustering might be useful. The readers might benefit from a couple of graphs or figures quantifying the overlap of the SNPs across samples and maybe one that shows the density of SNPs across the genome.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Tutak et al. use a combination of pulldowns analyzed by mass spectrometry, reporter assays, and fluorescence experiments to decipher the mechanism of protein translation in fragile X-related diseases. The topic is interesting and important.

      Although a role for Rps26-deficient ribosomes in toxic protein translation is plausible based on already available data, the authors' data are not carefully controlled and thus do not support the conclusions of the paper.

      Strengths:

      The topic is interesting and important.

      Weaknesses:

      In particular, there are very few data to support the notion that Rps26-deficient ribosomes are even produced under these circumstances, and no data indicating that they are involved in RAN translation. Essential controls (for ribosome numbers) are lacking, no information is presented on the viability of the cells (Rps26 is an essential protein), and the differences in protein levels could well arise from a block in protein synthesis and cell division, coupled to differential stability of the proteins.

      We agree that the presented data could benefit from the suggested additional experiments. We will address ribosome content, global translation rate and cell viability upon RPS26 depletion. We also plan to apply polysome profiling to determine whether RPS26-depleted ribosomes are translationally active.

      Specific points:

      (1) Analysis of the mass spec data in Supplemental Table S3 indicates that for many of the proteins that are differentially enriched in one sample, a single peptide is identified. So the difference is between 1 peptide and 0. I don't understand how one can do a statistical analysis on that, or how it would give out anything of significance. I certainly do not think it is significant. This is exacerbated by the fact that the contaminants in the assay (keratins) are many, many-fold more abundant, and so are proteins that are known to be mitochondrial or nuclear, and therefore likely not actual targets (e.g. MCCC1, PC, NPM1; this includes many proteins "of significance" in Table S1, including Rrp1B, NAF1, Top1, TCEPB, DHX16, etc...).

      The data in Table S6/Figure 3A suffer from the same problem.
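      To make the concern about single-peptide differences concrete, consider a deliberately simplified model in which identifications are treated as counts and both pulldowns are assumed to be sampled equally deeply, so that under the null hypothesis each count falls in either condition with probability 0.5. The Python sketch below computes the smallest two-sided exact binomial p-value attainable for a given total count. This model is an illustrative assumption, not the analysis performed in Perseus, which relies on intensity-based quantification and may involve imputation of missing values.

```python
def min_two_sided_p(total_count: int) -> float:
    """Smallest two-sided exact binomial p-value attainable when `total_count`
    identifications are split between two equally sampled conditions
    (null probability 0.5 per count). Simplified, illustrative model."""
    if total_count < 1:
        raise ValueError("total_count must be at least 1")
    # The most extreme outcome is total_count vs 0 (or 0 vs total_count);
    # each has probability 0.5**total_count under the null hypothesis.
    return min(1.0, 2 * 0.5 ** total_count)


for n in range(1, 7):
    print(n, min_two_sided_p(n))
```

      Under these assumptions, a 1-versus-0 split cannot reach significance at all (minimum p = 1.0), and at least six counts, all in one condition, are needed before p < 0.05 is even attainable.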

      Tables S3 and S6 show the mass spectrometry output data from MaxQuant analysis without any filtering. Certain identifications, i.e. those denoted as contaminants (such as keratins), were removed during statistical analysis in Perseus software. Regarding the data presented in Table S6 (SILAC data), we argue that these data are of very good quality: more than 2000 proteins were identified in a 125-min gradient, and over 80% of proteins were identified with at least 2 unique peptides. However, we acknowledge that the description of Tables S3 and S6 may lead to misunderstanding, and we will clarify their explanation.

      I am not convinced that the mass spec data is reliable.

      (2) The mass-spec data however claims to identify Rps26 as a factor binding the toxic RNA specifically. The rest of the paper seeks to develop a story of how Rps26-deficient ribosomes play a role in the translation of this RNA. I do not consider that this makes sense.

      Indeed, we identified RPS26 as a protein co-precipitated with FMR1 RNA containing expanded CGG repeats. However, we do not claim that they interact directly. Downregulation of FMRpolyG biosynthesis could result from altered ribosome assembly, changes in the efficiency and fidelity of PIC scanning, or impeded elongation, or more likely from a combination of some of these processes. We will provide a better explanation of these issues in the revised version of the manuscript.

      (3) Rps26 is an essential gene, and I am sure the same is true for DHX15. What happens to cell viability? To protein synthesis? The yeast experiments were carefully carried out under conditions where Rps26 was reduced, not fully depleted, to give only small growth defects.

      We agree with the Reviewer 1 that RPS26 is an essential protein. Previously, it was shown that cell viability in cells with mutated C-terminal deletion of RPS26 is decreased (Havkin-Solomon T, Nucleic Acids Res 2023). We will address the question regarding the suppression of FMRpolyG in models with partial RPS26 knock-down.

      (4) Knockdown efficiency must be shown for all tested genes.

      The missing experiments showing knockdown efficiency will be included in the revised version of the manuscript.

      (5) The data in Figure 1E have just one mock control, but two cell types (control si and Rps26 depletion).

      We will clarify this ambiguity in the revised version of the manuscript.

      (6) The authors' data indicate that the effects are not specific to Rps26 but indeed also observed upon Rps25 knockdown. This suggests strongly that the effects are from reduced ribosome content or blocked protein synthesis. Additional controls should deplete a core RP to ascertain this conclusion.

      We agree that the observed effect may stem partially from reduced ribosome content; however, we argue that this is not the only explanation. In the publication on RPS25 regulation of G4C2-related RAN translation (Yamada SB, 2019, Nat Neurosci), it was shown that RPS25 KO does not affect global translation. Our experiments (SUnSET assay, unpublished) indicated that RPS26 KD also did not significantly reduce the global translation rate. We will present these data in the revised version of the manuscript.

      (7) Supplemental Figure S3 demonstrates that the depletion of S26 does not affect the selection of the start codon context. Any other claim must be deleted. All the 5'-UTR logos are essentially identical, indicating that "picking" happens by abundance (background).

      The results shown in Fig. S3 do not imply that RPS26 has no effect at all on start codon context selection; we tested only a few hypotheses. We chose to test the -4 position because this position was identified as the most sensitive to RPS26 regulation in yeast (Ferretti M, 2017, Nat Struct Mol Biol). Regarding the WebLogo analysis, we wrote in the manuscript that we did not identify any specific motif or enrichment within the analysed transcripts compared with the background. We will clarify this ambiguity in the revised version of the manuscript.

      (8) Mechanism is lacking entirely. There are many ways in which ribosomes could have mRNA-specific effects. The authors tried to find an effect from the Kozak sequence, unsuccessfully (however, they also did not do the experiment correctly, as they failed to recognize that the Kozak sequence differs between yeast, where it is A-rich, and mammalian cells, where it is GGCGCC). Collisions could be another mechanism.

      As in (7).

      Reviewer #2 (Public Review):

      Summary:

      Translation of CGG repeats leads to the accumulation of polyG, which is associated with neurological disorders. This is a valuable paper in which the authors sought out proteins that modulate RAN translation. They determined which proteins in HeLa cells bound to CGG repeats and affected levels of polyG encoded in the 5'UTR of the FMR1 mRNA. They then showed that siRNA depletion of ribosomal protein RPS26 results in less production of FMR1polyG than in the control. There are data supporting the claim that RPS26 depletion modulates RAN translation of this RNA, although for some results, the Western blot results are not strong. The data supporting increased aggregation by polyG expression upon S26 KD are incomplete.

      Strengths:

      The authors have proteomics data that show the enrichment of a set of proteins on FMR1 RNA but not a related RNA.

      Weaknesses:

      - It is insinuated that RPS26 binds the RNA to enhance CGG-containing protein expression. However, RPS26 reduction was also shown previously to affect ribosome levels, and reduced ribosome levels can result in ribosomes translating very different RNA pools.

      We agree that the presented data could benefit from additional experiments. Therefore, we will address questions regarding ribosome content, global translation rate and cell viability upon RPS26 depletion. We also plan to apply polysome profiling to determine whether RPS26-depleted ribosomes are translationally active. However, we did not state that RPS26 binds directly to RNA with expanded CGG repeats or that this interaction is crucial for translational regulation of the studied RNA; we only tested such hypotheses. We will improve the narrative in the revised version of the manuscript to make the major conclusions clearer.

      - A significant claim is that RPS26 KD alleviates the effects of FMRpolyG expression, but those data aren't presented well.

      We thank Reviewer 2 for this comment. We will show data that we have already obtained from several different cell models. Moreover, we will include results of experiments with a luminescence readout for FMRpolyG fused with luciferase upon RPS26 KD.

      Reviewer #3 (Public Review):

      Tutak et al. provide interesting data showing that RPS26 and relevant proteins such as TSR2 and RPS25 affect RAN translation from CGG repeat RNA in fragile X-associated conditions. They identified RPS26 as a potential regulator of RAN translation by an RNA-tagging system and mass spectrometry-based screening for proteins binding to CGG repeat RNA, and confirmed its regulatory effects on RAN translation by siRNA-based knockdown experiments in multiple cellular disease models and patient-derived fibroblasts. Quantitative mass spectrometry analysis found that the expression levels of some ribosomal proteins are sensitive to RPS26 depletion, while approximately 80% of proteins, including FMRP, were not influenced. Since the roles of ribosomal proteins in RAN translation regulation have not been fully examined, this study provides novel insights into this research field. However, some data presented in this manuscript are limited and preliminary, and their conclusions are not fully supported.

      (1) While the authors emphasized the importance of ribosomal composition for RAN translation regulation in the title and the article body, the association between RAN translation and ribosomal composition is apparently not evaluated in this work. They found that specific ribosomal proteins (RPS26 and RPS25) can have regulatory effects on RAN translation (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B), and that the expression levels of some ribosomal proteins can be changed by RPS26 knockdown (Figure 3B); however, the change in the composition of ribosomes involved in actual translation has not been elucidated. Therefore, their conclusive statement, that is, "ribosome composition affects RAN translation", is not fully supported by the presented data and is misleading.

      We thank Reviewer 3 for critical comments and suggestions. We agree that the proposed title may be misleading and that the presented data do not fully support the aforementioned statement regarding ribosomal composition affecting FMRpolyG synthesis. Hence, we will change the title and revise the statements in the text that go beyond the presented results.

      (2) The study provides insufficient data on the mechanisms of how RPS26 regulates RAN translation. Although the authors speculate that RPS26 may affect initiation codon fidelity and regulate RAN translation in a CGG repeat sequence-independent manner (Page 9 and Page 11), what they have actually shown is only the identification of this protein in the screen for proteins binding to CGG repeat RNA (Figure 1A, 1B), and the effects of this protein on CGG repeat RAN translation. It is essential to clarify whether the regulatory effect of RPS26 on RAN translation is dependent on the CGG repeat sequence or on near-cognate initiation codons such as ACG and GUG in the 5' upstream sequence of the repeat. It would be better to validate the effects of RPS26 on translation from control constructs, such as one composed of the 5' upstream sequence of FMR1 with no CGG repeat, and one with an ATG substitution in place of the near-cognate initiation codons in the 5' upstream sequence of FMR1.

      We will address the question regarding the influence of the CGG repeat content and START codon selection (including different near-cognate start codons) on RPS26-sensitive translation, and present these data in the revised version of the manuscript.

      (3) The regulatory effects of RPS26 and other molecules on RAN translation have all been investigated as effects on the expression levels of FMRpolyG-GFP proteins in cellular models expressing CGG repeat sequences (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B). In these cellular experiments, there are multiple confounding factors other than RAN translation that affect the expression levels of FMRpolyG-GFP proteins, including template RNA expression, template RNA distribution, and FMRpolyG-GFP protein degradation. Although the authors evaluated the effect on the expression levels of template CGG repeat RNA, it would be better to confirm the effect of these regulators on RAN translation by other experiments, such as an in vitro translation assay that can directly evaluate RAN translation.

      We agree that multiple factors, including the aforementioned processes, affect the final translation output of the investigated mRNA. We evaluated the level of FMR1 mRNA, which turned out not to be affected upon RPS26 depletion (Figures 2B and 2C); however, we will address the other possibilities as well.

      (4) While the authors state that RPS26 modulated the FMRpolyG-mediated toxicity, they presented limited data on apoptotic markers, not cellular viability (Figure 1E), not fully supporting this conclusion. Since previous work showed that FMRpolyG protein reduces cellular viability (Hoem G et al., Front Genet 2019), additional evaluations for cellular viability would strengthen this conclusion.

      We thank Reviewer 3 for this suggestion. We addressed the effect of RPS26 KD on the apoptotic process induced by FMRpolyG. We will also perform experiments addressing other aspects of FMRpolyG-mediated cell toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The mechanisms by which axonal projections find their correct targets require the interplay of signalling pathways and cell adhesion acting over short and long distances. The current study aims to use the small ventral lateral clock neurons (s-LNvs) of the Drosophila clock circuit as a model to study axon projections. These neurons are born during embryonic stages and are part of the core of the clock circuit in the larval brain. Moreover, these neurons are maintained through metamorphosis and become part of the adult clock circuit. The authors use anti-Pdf antibody staining or Pdf>GFP as a read-out for axonal length. Using ablation of the MB, the overall target region of the s-LNvs, the authors find defects in the projections. Next, using Dscam mutants or knock-down, they observe defects in the projections. Manipulations of the DNs, another group of clock neurons, can induce defects in the s-LNv axonal form, suggesting an active role of these neurons in the morphology of the s-LNvs.

      Strengths:

      The use of Drosophila genetics and a specific neural type allows targeted manipulations with high precision.

      Proposing a small group of neurons as a new model for axonal projections allows us to explore the mechanism with high precision.

      Weaknesses:

      It is unclear to what extent the proposed model can be seen as developmental.

      The study of changes in fully differentiated and functioning neurons may affect the interpretation of the findings.

      We appreciate the reviewer's feedback on the strengths and weaknesses of our study.

      We acknowledge the strengths of our research, particularly the precision afforded by using Drosophila genetics and a specific neural type for targeted manipulations, as well as the proposal of a new model for studying axonal projections in a small group of neurons.

      We understand the concerns about the developmental aspects of our proposed model and the use of Pdf-GAL4 >GFP as a read-out for axonal length (revised manuscript Figure 1--figure supplement 1). However, even when Dscam expression is suppressed with Clk856-GAL4, which begins to be expressed at the embryonic stage (revised manuscript Figure 3--figure supplement 1), the initial segment of the dorsal projection of s-LNvs (the vertical part) remains unaffected. Instead, the projection distance towards the midline is severely shortened, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases, the vertical and horizontal parts, rather than treating them as a mere expansion corresponding to the development of the larval brain.

      Thank you for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Public Review):

      Summary:

      The paper from Li et al. shows a mechanism by which axons can change direction during development. They use the sLNv neurons as a model. They find that a new group of neurons (DNs) that appears during post-embryonic proliferation secretes netrins and repels the axonal tip of the LNvs horizontally towards the midline.

      Strengths:

      The experiments are well done and the results are conclusive.

      Weaknesses:

      The novelty of the study is overstated, and the background is understated. Both things need to be revised.

      We appreciate your acknowledgment that the experiments were well-executed and the results conclusive. This validation reinforces the robustness of our findings.

      We take note of your feedback regarding the novelty of the study being overstated and the background being understated. Axonal projections that navigate without distinct landmarks, such as the midline or layers, columns, and segments, pose more challenges and uncertainties. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons.

      We value your insights, and we have carefully addressed these points in our manuscript revision to improve the overall quality and presentation of our research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The overall idea of using the s-LNvs as a model is indeed intriguing. There are genetic tools available to tackle these cells with great precision.

      However, the stage at which these cells are investigated raises some issues that I feel are critical to address.

      These neurons develop their axonal projections during embryogenesis and are fully functioning when the larvae hatch; thus, to investigate axonal pathfinding, one would have to address embryonic development.

      The larval brain indeed continues to grow during larval life; however, extensive work from the Hartenstein lab, the Truman lab, and others has shown that the secondary (larval-born) neurons do not yet wire into the brain, but stall their axonal projections.

      It is thus quite unclear, what the authors are actually studying.

      One interpretation could be that the authors observe changes in axon length due to morphological changes in the brain. Indeed, as the MB expands, the anatomy of the surrounding neuropil changes too.

      Moreover, it is unclear when exactly Pdf-Gal4 (and other drivers) are active, and thus to what extent the (embryonic) development of s-LNvs is affected, or whether it all happens in the differentiated, functioning neuron. (The temporal delay and dynamics of Gal4 during embryonic development may further complicate the issue.) As far as I am aware, the MB drivers might already be active during embryonic stages.

      Since the raised issue is quite fundamental, I am not sure what might be the best and most productive fashion to address this.

      E.g., either to completely re-focus the topic on "neural morphology maintenance" or to study the actual development of these cells.

      We thank the reviewer for the detailed and insightful feedback on our study. We have tested whether Pdf-Gal4 could effectively label s-LNvs, and tracked the s-LNv projection in the early stage after larval hatching. We did not observe PDF antibody staining or the GFP signal driven by Pdf-GAL4 in newly hatched larvae. At 2-4 hours ALH, PDF signals were primarily concentrated at the end of the axons, while GFP signals were mainly concentrated in the cell body. Helfrich-Förster first detected PDF immunoreactivity in the brain approximately 4-5 hours ALH. The GFP signal driven by Pdf-GAL4 therefore does show a delay. However, at 8 hours ALH, the GFP signal strongly co-localized with the PDF signal within the axons (see revised manuscript lines 98-101) (Figure 1—figure supplement 1).

      Based on previous research findings and our staining of Clk856-GAL4 >GFP, it is indeed confirmed that the dorsal projection of s-LNvs in Drosophila is formed during the embryonic stage (Figure 3—figure supplement 1). The s-LNvs in first-instar larval Drosophila are capable of detecting signal output and may play a role in regulating certain behaviors. Our selection of tools for characterizing the projection pattern of s-LNv was not optimal, leading us to overlook the crucial detail that the projection had already formed during its embryonic stage.

      However, even when employing Clk856-GAL4 to suppress Dscam expression from the embryonic stage, the initial segment of the dorsal projection of s-LNvs (the vertical part) remains unaffected. Instead, the projection distance towards the midline is severely shortened, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases, the vertical and horizontal parts, rather than treating them as a mere expansion corresponding to the development of the larval brain.

      From the results searched in the Virtual Fly Brain (VFB) database (https://www.virtualflybrain.org/), it is clear that the neurons that form synaptic connections with s-LNvs at the adult stage are essentially completely different from the neurons that are associated with them at the L1 larval stage. Thus, most neurons that form synapses with s-LNvs in the early larvae either cease to exist after metamorphosis or assume other roles in the adult stage. Similar to the scenario where Cajal-Retzius cells and GABAergic interneurons establish transient synaptic connections with entorhinal axons and commissural axons, respectively, these cells form a transient circuit with presynaptic targets and subsequently undergo cell death during development. In our model, the neurons that synapse with s-LNvs in early development serve as "placeholders," offering positive or negative cues to guide the axonal targeting of s-LNvs towards their ultimate destination.

      Thank you again for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      In the introduction, too many reviews are cited and very few actual research papers. This should be corrected and the most significant papers in the field should be cited. For example, there is no reference to the pioneering work from the Christine Holt lab or the first papers looking at axon guidance and guideposts by Klose and Bentley and by Isbister et al. 1999.

      The introduction should encapsulate the actual knowledge based on actual research papers.

      We acknowledge your concern regarding the citation of review papers rather than primary research papers in the introduction. Following your suggestion, we have revised the introduction section to incorporate references to relevant research papers.

      In the introduction and discussion: The authors cite reviews showing the signals that guide axons across different regions, including turning, and they end up saying: "However, how the axons change their projection direction without well-defined landmarks is still unclear." I think the sentence should be changed. Many things are still not clear, but this is not a good phrasing. Maybe they could focus on their temporal finding?

      We appreciate the reviewer's feedback and insightful suggestions. We agree that emphasizing the temporal aspect is crucial in our study. However, we also recognize the significance of understanding the origin of signals that guide axonal reorientation at specific locations. While axonal projections navigating without distinct landmarks pose more challenges and uncertainties compared to those guided by prominent landmarks like the midline, our research demonstrates the crucial role of a specific cell population near turning points in providing accurate guidance cues to ensure precise axonal reorientation. We have revised our phrasing in the introduction and discussion to better reflect these key points (see revised manuscript lines 69-71 and 350-354). Thank you for highlighting the significance of focusing on our temporal findings and the complexities involved in studying axonal projection.

      Many rather old papers have looked into the effect of repulsive guideposts in guiding axon projections. In particular, I can think of the paper from Isbister et al. 1999 (DOI: 10.1242/dev.126.9.2007) that not only shows how semaphorin guides Ti axon projection but also shows how the expression pattern of Sema 2a changes during development to guide the correct projection. I really think that the novelty of the paper should be revised in light of the actual knowledge in the field.

      We appreciate the reviewer's reference to the seminal work by Isbister et al. (1999) and the importance of guidepost cells in axon projection guidance, which we have already cited in our revised manuscript. It is crucial to recognize that segmented patterns such as the limb segment traversed by Ti1 neuron projections or neural circuits formed in a layer- or column-specific manner also serve as intrinsic "guideposts," offering valuable insights into axonal pathfinding processes. In our model, explicit guidance cues are lacking. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons (see revised manuscript lines 350-354). We have ensured that our revised manuscript reflects these insights and emphasizes the significance of studying axonal guidance in the absence of distinct guideposts. Thank you for underscoring these essential points, which enhance our understanding of axonal projection dynamics.

      Minors:

      Line 54: the authors start talking about the floor plate at the end of a section on Drosophila. Please use "In vertebrates", "in invertebrates", "in Drosophila", etc. when needed to put things in context.

      We thank the reviewer for this suggestion and have modified this sentence. Please refer to lines 62-63 of the revised manuscript.

      Line 69: many factors change axonal outgrowth. The authors are missing the paper from Fernandez et al. 2020, who have shown that Unc5, the netrin receptor, induces stalling of the sLNv projections before the turn. https://doi.org/10.1016/j.cub.2020.04.025

      We thank the reviewer for this suggestion and have added this research article. Please refer to line 79 of the revised manuscript.

      Line 99: "precisely at the pivotal juncture". It is hard to see how this was done in the figures shown. Can the authors add a small panel with neuronal staining showing this (please no HRP)?

      For all figures, the magenta is too strong and it is really hard to see the sLNv projections. Can this be sorted, please?

      We have depicted the pivotal juncture in the schematic diagram on the left side of Figure 1C. Additionally, we have included a separate column of images without HRP in Figure 1A. Moreover, we have modified the pseudo-color of HRP from magenta to blue to enhance the visualization of the s-LNv projection. The figure legends have also been correspondingly modified.

      Line 407: Spatial position relationship between calyx and s-LNvs. OK107-GAL4 labels ... calyx and s-LNvs labeled by, which which.

      We have modified it according to your suggestion. Please refer to lines 430-432 of the revised manuscript.

      Line 137 typo RPRC

      We thank the reviewer for noticing this mistake, which has now been corrected. Please refer to lines 148-149 of the revised manuscript.

      Section 158-164: the paper from Zhang et al. 2019 needs to be cited, since they found the same effect of decreasing Dscam, even if they did not consider the horizontal projection.

      Thanks to the suggestion, we have included in the manuscript the phenotype observed by Zhang et al. (2019) upon knocking down Dscam1-L in adults. Please refer to lines 170-172 of the revised manuscript.

      Line 176: typo senses (instead of sensor).

      Thank you for pointing out our mistake. We have modified it according to your suggestion. Please refer to line 189 of the revised manuscript.

      Line 193: rather than "Interesting", it is "Notable". Add "ubiquitous" knockdown.

      Thank you for the suggestion. We have included the word "ubiquitous" to enhance the precision of the narrative. Please refer to line 206 of the revised manuscript.

      Line 224: the pattern of expression of the crz cells is not visible where the projections of sLNvs are located. Are they in that region? Or further away?

      We have changed the pseudo-color of HRP, and in the updated Figure 5-figure supplement 1, you can see the projection pattern of crz+ cells, positioned close to the end of the s-LNv axon terminal.

      Line 243: applied? Do you mean "used"

      Thank you for the suggestion. We have revised it at line 256.

      Figure 5 Sup1: the schematic shows DNs proliferation that is not visible on the GFP image. Please comment.

      We have modified Figure 5-figure supplement 1 to show the per-GAL4, Pdf-GAL80 >GFP expression pattern at 120 h. Because of the strong GFP intensity in some DN neurons, GFP signal was lost. Additionally, in Figure 6-figure supplement 1, we have added co-localization images of DNs and s-LNvs at 72 h and 96 h. To better illustrate the co-localization, we have shown only a subset of the layers in the right panel. We hope these additions clarify your concerns.

      Line 251: cite Fernandez et al. 2020 with Purohit et al 2012.

      We have modified it according to your suggestion. Please refer to line 264 of the revised manuscript.

      Line 272: you have not shown synergistic effects because you have not modulated both pathways at the same time. You should talk about complementary.

      We have modified it according to your suggestion at lines 25, 285, 439.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Point for more elaborate discussion: Apparently, the timescale of negative feedback signals is conserved between endothelial cell migration in vitro (with human cells) and endothelial migration during the formation of ISVs in zebrafish. What do you think might explain such conserved timescales? Are there certain processes in cytoskeletal tension build-up that require this amount of time to establish? Or does it relate to the time needed to begin expressing the YAP/TAZ target genes that mediate feedback?

      The mechanisms underlying the conserved timescale are a major direction that we continue to explore. Localization of YAP/TAZ to the nucleus is likely not rate-limiting. We showed previously that acute RhoA activation produced significant YAP/TAZ nuclear localization within minutes, while subsequent co-transcriptional activity aligned with the gene expression dynamics observed here (Berlew et al., 2021). We hypothesize that the dynamics of YAP/TAZ-dependent transcription and the translation of those target genes are rate-limiting for initial feedback loop completion (t_ic = 4 hours). This is supported by work from us and others in a variety of cell lines showing that YAP/TAZ transcriptional responses take place during the first few hours after activation (Franklin et al., 2020; Mason et al., 2019; Plouffe et al., 2018). While our data identify mediators of initial feedback loop completion, the molecular effectors that determine the timescale for establishing a new cytoskeletal equilibrium (t_eq = 8 hours) remain unclear.

      (2) Do you expect different timescales for slower endothelial migratory processes (e.g., fin vascular regeneration, which takes days)?

      We selected the ISV development model because it exhibits similar migratory kinetics to our previously-explored human ECFC migration in vitro. The comparable kinetics allowed us to study dynamics of the feedback loop in vivo on similar time scales, but we have not explored models featuring either slower or faster dynamics. 

      It would be interesting to test how feedback dynamics are impacted in distinct endothelial migratory processes. Our data suggest that the feedback loop is necessary for persistent migration; however, YAP and TAZ respond to a diversity of upstream regulators in addition to mechanical signals, which might depend on the process of vascular morphogenesis. For example, after fin amputation, inflammation and tissue regeneration may impact the biochemical and mechanical environment experienced by the endothelium. Additionally, cells display different migratory behaviors in ISV morphogenesis compared to fin regeneration. During ISV formation, sprouting tip cells migrate dorsally through avascular tissue, followed by stalk cells (Ellertsdóttir et al., 2010). In contrast, the fin vasculature regenerates by forming an intermediate vascular plexus, where some venous-derived endothelial cells migrate towards the sprouting front, while others migrate against it (Xu et al., 2014). We are excited to study the role of this feedback loop in these different modes of neovessel formation in future studies.

      (3) Are the ~4 h and 8 h feedback time windows a general property, or do they differ between specific endothelial cell types? In veins, endothelial cells generate fewer stress fibers and adhesions than in arteries. Does this mean that there might be a difference in the feedback time window, or that certain endothelial cell types may not have such a YAP/TAZ-controlled feedback system?

      Recent studies suggest that venous endothelial cells are the primary endothelial subtype responsible for blood vessel morphogenesis (Lee et al., 2022, 2021; Xu et al., 2014). They are highly motile and mechanosensitive, migrating against blood flow (Lee et al., 2022). The Huveneers group has shown that the actin cytoskeleton is differently organized in adult arteries and veins in response to the biomechanical properties of their extracellular matrix, rather than intrinsic differences between arterial and venous cells (van Geemen et al., 2014). This suggests that arterial and venous cells have distinct cytoskeletal setpoints due to mechanical cues in their environment (Price et al., 2021). We expect this to impact the degree of cytoskeletal remodeling and cell migration at equilibrium, rather than the kinetics of the feedback loop per se, though we have not yet tested this hypothesis. Testing these predictions on cytoskeletal setpoint stability and adaptation is a major direction that we continue to explore. 

      (4) The experiments are based on perturbations to prove that transcriptional feedback is needed for endothelial migration. What would happen if the feedback system were always switched on? An experiment to add might be to analyse the responsiveness of endothelial cells expressing constitutively active YAP/TAZ.

      This is a problem that we are actively pursuing. Though the feedback system forms a coherent loop, we anticipate that the identity of the node of the loop selected for constitutive activation will influence the outcome, depending on whether that node is rate-limiting for feedback kinetics and the extent of intersection of that node with other signaling events in the cell. For example, we have observed that constitutive YAP activation drives profound changes to the transcriptional landscape including, but not limited to, RhoA signaling (Jones et al., 2023). We further anticipate that constitutive activation of feedback loop nodes may alter feedback dynamics, while dynamic or acute perturbation will be required to dissect these contributions in real time. For these reasons, ongoing work in the lab is pursuing these questions using optogenetic tools that enable precise spatial and temporal control (Berlew et al., 2021).   

      (5) To investigate the role of YAP-mediated transcription in an accurate time-dependent manner the authors may consider using the recently developed optogenetic YAP translocation tool: https://doi.org/10.15252/embr.202154401

      We are enthusiastic about the power of optogenetics to interrogate the nodes and timescales of this feedback system, and we are now funded to pursue this line of research. 

      Reviewer #2:

      The idea is intriguing, but it is not clear how the feedback actually works, so it is difficult to determine if the events needed could occur within 4 hrs. Specifically, it is not clear what gene changes initiated by YAP/TAZ translocation eventually lead to changes in Rho signaling and contractility. Much of the evidence to support the model is preliminary. Some of the data is consistent with the model, but alternative explanations of the data are not excluded. The fish washout data is quite interesting and does support the model. It is unclear how some of the in vitro data supports the model and excludes alternatives.

      Major strengths:

      The combination of in vitro and in vivo assessment provides evidence for timing in physiologically relevant contexts, and a rigorous quantification of outputs is provided. The idea of defining temporal aspects of the system is quite interesting.

      Major weaknesses:

      The evidence for a "loop" is not strong; rather, most of the data can also be interpreted as a linear increase in effect with time once a threshold is reached. Washout experiments are key to setting up a time window, yet these experiments are presented only for the fish model. A major technical challenge is that siRNA experiments take time to achieve depletion status, making precise timing of events on short time scales problematic. Also, Actinomycin D blocks most transcription so exposure for hours likely leads to secondary and tertiary effects and perhaps effects on viability. No RNA profiling is presented to validate proposed transcriptional changes.

      We thank the reviewer for these helpful suggestions. We have expanded our explanation of the history and known mediators of the feedback loop in the introduction. We and, independently, the Huveneers group recently reported that human endothelial cells maintain cytoskeletal equilibrium for persistent motility through a YAP/TAZ-mediated feedback loop that modulates cytoskeletal tension (Mason et al., 2019; van der Stoel et al., 2020). Because YAP and TAZ are activated by tension of the cytoskeleton (Dupont et al., 2011), suppression of cytoskeletal tension by YAP/TAZ transcriptional target genes constitutes a negative feedback loop (Fig. 1A). We described key components of this cell-intrinsic feedback loop, which acts as a control system to maintain cytoskeletal homeostasis for persistent motility via modulation of Rho-ROCK-myosin II activity (Mason et al., 2019). Both we and the Huveneers group found that perturbation of genes and pathways regulated by YAP/TAZ mechanoactivation (e.g., RhoA/ROCK/myosin II, NUAK2, DLC1) can functionally rescue motility in YAP/TAZ-depleted cells (Mason et al., 2019; van der Stoel et al., 2020). We further showed previously that both YAP/TAZ depletion and acute YAP/TAZ-TEAD inhibition consistently increased stress fiber and FA maturation and arrested cell motility, accounting for these limitations of siRNA (Mason et al., 2019).

      Enduring limitations to the temporal, spatial, and cell-specific control of the genetic and pharmacologic methods have inspired us to initiate alternative approaches, which are the subject of ongoing efforts. Further research will be necessary in the zebrafish to determine the extent to which the observed migratory dynamics are driven by cytoskeletal arrest. 

      To identify early YAP/TAZ-regulated transcriptional changes, we have added RNA profiling of control and YAP/TAZ depleted cells cultured on stiff matrices for four hours. Genes upregulated by YAP/TAZ depletion were enriched for Gene Ontology (GO) terms associated with Rho protein signal transduction, vascular development, cellular response to vascular endothelial growth factor (VEGF) stimulus, and endothelial cell migration (Fig. 9B). These data support a role for YAP and TAZ as negative feedback mediators that maintain cytoskeletal homeostasis for endothelial cell migration and vascular morphogenesis.  

      Reviewer #3:

      The authors used ECFC - endothelial colony forming cells (circulating endothelial cells that activate in response to vascular injury).

      Q: Did the authors characterize these cells and make sure that they are truly endothelial cells, for example by examining specific endothelial markers, arterial-venous identity markers and Notch signalling status, overall morphology, etc., prior to the start of the experiment? How were ECFCs isolated from human individuals: are these "healthy" volunteers, do they have any underlying CVD risk factors, are the cells from one patient or from pooled samples, and what injury were these individuals exposed to that triggered the release of the ECFCs into the circulation? The materials and methods on ECFCs should be expanded.

      Human umbilical cord blood-derived ECFCs were isolated at Indiana University School of Medicine and kindly provided by Dr Mervin Yoder. Cells were cultured as described by the Yoder group (Rapp et al., 2011) and our prior paper (Mason et al., 2019). We have expanded the materials and methods section to describe the source and characterization of these cells.

      The authors suggest that loss of YAP/TAZ phenocopies actinomycin-D inhibition - "both transcription inhibition and YAP/TAZ depletion impaired polarization, and induced robust ventral stress fiber formation and peripheral focal adhesion maturation". However, the cell size of actinomycin-D treated cells (Fig. 1B, top right panel), differs from the endothelial cell size upon siYAP/TAZ (Fig. 1E, top right panel) - and vinculin staining seems more pronounced in actinomycin-D treated cells (B, bottom right) when compared to siYAP/TAZ group. Cell shape is defined by acto-myosin tension.

      Q: Besides the fraction of focal adhesions >1 μm and focal adhesion number, did the authors measure additional parameters related to cytoskeleton remodelling/focal adhesions that can substantiate their statement on the similarity between loss of YAP/TAZ and actinomycin-D treatment? Would it be possible to make a more specific genetic intervention (besides YAP/TAZ) interfering with the focal adhesion pathway, as opposed to the broad-spectrum inhibitor actinomycin-D?

      Our previous paper (Mason et al., 2019) delineated the mechanistic relationships between YAP/TAZ signaling, focal adhesion turnover, actomyosin polymerization, and the intervening mechanisms of myosin regulation. Specifically, we demonstrated that YAP/TAZ regulate the myosin phosphatase kinase, NUAK2, and ARHGAP genes to mediate this feedback. Expanding on this work, the current study aimed to define the temporal kinetics of the cytoskeletal mechanotransductive feedback in vitro and in vivo. We used actinomycin-D and YAP/TAZ depletion to interrogate the role of transcriptional regulation and YAP/TAZ signaling, respectively. In this revision, we have added RNA profiling that identifies early YAP/TAZ-regulated transcriptional changes and further points to other molecular mediators of focal adhesions (e.g. TRIO, RHOB, THBS1) that will be the subjects of future studies.    

      Q: Does actinomycin-D treatment affect responsiveness to VEGF, induce apoptosis, or reduce survival of the ECFCs?

      We have not looked specifically at the effect of actinomycin-D treatment on responsiveness to VEGF. However, actinomycin-D has been reported to reduce transcription of VEGF receptors (E et al., 2012). In contrast, we found that YAP/TAZ depletion upregulated GO terms associated with endothelial cell migration and response to VEGF stimulus (Fig. 9B), as well as receptors to angiogenic growth factors, including KDR and FLT4 (Fig. 9E). These results suggest YAP/TAZ depleted cells may be more sensitive to VEGF stimulation but remain nonmotile due to cytoskeletal arrest.

      We showed previously that long-term treatment with actinomycin-D reduces ECFC survival (Mason et al., 2019).

      Q: Which mechanism links ECM stiffness with endothelial surface area in the authors' scenario? In zebrafish, activity of the endothelial guanine nucleotide exchange factor Trio specifically at endothelial cell junctions (Klems, Nat Comms, 2020) and endoglin in response to hemodynamic factors (Siekmann, Nat Cell Biol 2017) have been shown to control EC shape/surface area. Do these factors play a role in the scenario proposed by the authors?

      Our new transcriptional profiling indicates both Trio and endoglin are regulated through YAP and TAZ in human ECFCs. We plan to follow up on these findings.

      Q: The authors report that ECs migrate faster on stiff substrates and concomitantly have a larger surface area. What is the physiological rationale behind these observations? Did the authors observe such behaviors in their zebrafish ISV model? How do these observations integrate with the tip-stalk cell shuffling model (Jakobsson & Gerhardt, Nat Cell Biol, 2011) and Notch activity in developing ISVs?

      This question raises important distinctions between the mode of migration in ISV morphogenesis and that of endothelial cells adherent to substrates. Cells behave and respond to mechanical cues differently in 2D vs. 3D matrices (LaValley and Reinhart-King, 2014). Additionally, the microenvironment in vivo is much more complex, combining numerous biochemical signals and changing mechanical properties (Whisler et al., 2023). We are actively investigating the downstream targets of YAP/TAZ mechanotransduction and how that integrates with other pathways known to regulate vascular morphogenesis, such as Notch signaling.

      The authors examined the formation of arterial intersegmental vessels in the trunk of developing zebrafish embryos in vivo. They used a variety of pharmacological inhibitors of transcription and acto-myosin remodelling and linked the observed morphological changes in ISV morphogenesis with changes in endothelial cell motility.

      Q: Reduced formation and dorsal extension of ISVs may have several causes, including reduced EC migration and proliferation. The Tg(fli1a:EGFP) reporter, however, is not the most suitable line to monitor migration of individual endothelial cells. Can the authors repeat the experiments in Tg(fli1a:nEGFP); Tg(kdrl:HRAS-mCherry) double transgenics to visualize the migration of individual endothelial cells and EC proliferation events in the different treatment regimes?

      So far, we have not tracked individual endothelial cells during ISV morphogenesis. We agree this is the best approach and are pursuing a similar technique for these experiments.

      ISV formation is furthermore affected by Notch signalling status and a series of (repulsive) guidance cues.

      Q: Does de novo blockade of gene expression with Actinomycin D affect Notch signalling status, or the expression of PlexinD, sFlt1, netrin1, or arterial-venous identity genes?

      While we have not performed gene expression analysis under the Actinomycin D condition, Actinomycin D functions as a broad transcription inhibitor. We are currently pursuing the downstream targets of YAP/TAZ mechanotransduction in both ECFCs and zebrafish.

      Remark: The authors may want to consider using the Tg(fli1:LIFEACT-GFP) reporter for in vivo imaging of actin remodelling events.

      We thank the reviewer for their helpful suggestion.

      Remark: the authors report "As with broad transcription inhibition, in situ depletion of YAP and TAZ by RNAi arrested cell motility, illustrated here by live-migration sparklines over 10 hours: siControl: , siYAP/TAZ: (25 μm scale-bar: -)". Can the authors make a separate figure panel for this, how many cells were measured?

      Please refer to our previous publication for the complete details on this data (Mason et al., 2019). We have added the citation in the text.

      Remark: in the wash-out experiments, exposure to the inhibitors is not the same in the different scenarios - could it be that the longer exposure time induces "toxic" side effect that cannot be "washed out" when compared to the short treatment regimes?

      This is a possible limitation of the pharmacological approach, and we have included it in the discussion section. We are currently exploring alternative approaches to interrogate the timescale of the feedback loop more precisely.

      References

      Berlew EE, Kuznetsov IA, Yamada K, Bugaj LJ, Boerckel JD, Chow BY. 2021. Single-Component Optogenetic Tools for Inducible RhoA GTPase Signaling. Advanced Biology 5:2100810. doi:10.1002/adbi.202100810

      Dupont S, Morsut L, Aragona M, Enzo E, Giulitti S, Cordenonsi M, Zanconato F, Le Digabel J, Forcato M, Bicciato S, Elvassore N, Piccolo S. 2011. Role of YAP/TAZ in mechanotransduction. Nature 474:179–183. doi:10.1038/nature10137

      E G, Cao Y, Bhattacharya S, Dutta S, Wang E, Mukhopadhyay D. 2012. Endogenous Vascular Endothelial Growth Factor-A (VEGF-A) Maintains Endothelial Cell Homeostasis by Regulating VEGF Receptor-2 Transcription. J Biol Chem 287:3029–3041. doi:10.1074/jbc.M111.293985

      Ellertsdóttir E, Lenard A, Blum Y, Krudewig A, Herwig L, Affolter M, Belting H-G. 2010. Vascular morphogenesis in the zebrafish embryo. Developmental Biology, Special Section: Morphogenesis 341:56–65. doi:10.1016/j.ydbio.2009.10.035

      Franklin JM, Ghosh RP, Shi Q, Reddick MP, Liphardt JT. 2020. Concerted localization-resets precede YAP-dependent transcription. Nat Commun 11:4581. doi:10.1038/s41467-020-18368-x

      Jones DL, Hallström GF, Jiang X, Locke RC, Evans MK, Bonnevie ED, Srikumar A, Leahy TP, Nijsure MP, Boerckel JD, Mauck RL, Dyment NA. 2023. Mechanoepigenetic regulation of extracellular matrix homeostasis via Yap and Taz. Proceedings of the National Academy of Sciences 120:e2211947120. doi:10.1073/pnas.2211947120

      LaValley DJ, Reinhart-King CA. 2014. Matrix stiffening in the formation of blood vessels. Advances in Regenerative Biology 1:25247. doi:10.3402/arb.v1.25247

      Lee H-W, Shin JH, Simons M. 2022. Flow goes forward and cells step backward: endothelial migration. Exp Mol Med 54:711–719. doi:10.1038/s12276-022-00785-1

      Lee H-W, Xu Y, He L, Choi W, Gonzalez D, Jin S-W, Simons M. 2021. Role of Venous Endothelial Cells in Developmental and Pathologic Angiogenesis. Circulation 144:1308–1322. doi:10.1161/CIRCULATIONAHA.121.054071

      Mason DE, Collins JM, Dawahare JH, Nguyen TD, Lin Y, Voytik-Harbin SL, Zorlutuna P, Yoder MC, Boerckel JD. 2019. YAP and TAZ limit cytoskeletal and focal adhesion maturation to enable persistent cell motility. Journal of Cell Biology 218:1369–1389. doi:10.1083/jcb.201806065

      Plouffe SW, Lin KC, Moore JL, Tan FE, Ma S, Ye Z, Qiu Y, Ren B, Guan K-L. 2018. The Hippo pathway effector proteins YAP and TAZ have both distinct and overlapping functions in the cell. J Biol Chem 293:11230–11240. doi:10.1074/jbc.RA118.002715

      Price CC, Mathur J, Boerckel JD, Pathak A, Shenoy VB. 2021. Dynamic self-reinforcement of gene expression determines acquisition of cellular mechanical memory. Biophysical Journal 120:5074–5089. doi:10.1016/j.bpj.2021.10.006

      Rapp BM, Saadatzedeh MR, Ofstein RH, Bhavsar JR, Tempel ZS, Moreno O, Morone P, Booth DA, Traktuev DO, Dalsing MC, Ingram DA, Yoder MC, March KL, Murphy MP. 2011. Resident Endothelial Progenitor Cells From Human Placenta Have Greater Vasculogenic Potential Than Circulating Endothelial Progenitor Cells From Umbilical Cord Blood. Cell Med 2:85–96. doi:10.3727/215517911X617888

      Tammela T, Zarkada G, Nurmi H, Jakobsson L, Heinolainen K, Tvorogov D, Zheng W, Franco CA, Murtomäki A, Aranda E, Miura N, Ylä-Herttuala S, Fruttiger M, Mäkinen T, Eichmann A, Pollard JW, Gerhardt H, Alitalo K. 2011. VEGFR-3 controls tip to stalk conversion at vessel fusion sites by reinforcing Notch signalling. Nat Cell Biol 13:1202–1213. doi:10.1038/ncb2331

      van der Stoel M, Schimmel L, Nawaz K, van Stalborch A-M, de Haan A, Klaus-Bergmann A, Valent ET, Koenis DS, van Nieuw Amerongen GP, de Vries CJ, de Waard V, Gloerich M, van Buul JD, Huveneers S. 2020. DLC1 is a direct target of activated YAP/TAZ that drives collective migration and sprouting angiogenesis. Journal of Cell Science 133:jcs239947. doi:10.1242/jcs.239947

      van Geemen D, Smeets MWJ, van Stalborch A-MD, Woerdeman LAE, Daemen MJAP, Hordijk PL, Huveneers S. 2014. F-Actin–Anchored Focal Adhesions Distinguish Endothelial Phenotypes of Human Arteries and Veins. Arteriosclerosis, Thrombosis, and Vascular Biology 34:2059–2067. doi:10.1161/ATVBAHA.114.304180

      Whisler J, Shahreza S, Schlegelmilch K, Ege N, Javanmardi Y, Malandrino A, Agrawal A, Fantin A, Serwinski B, Azizgolshani H, Park C, Shone V, Demuren OO, Del Rosario A, Butty VL, Holroyd N, Domart M-C, Hooper S, Szita N, Boyer LA, Walker-Samuel S, Djordjevic B, Sheridan GK, Collinson L, Calvo F, Ruhrberg C, Sahai E, Kamm R, Moeendarbary E. 2023. Emergent mechanical control of vascular morphogenesis. Science Advances 9:eadg9781. doi:10.1126/sciadv.adg9781

      Xu C, Hasan SS, Schmidt I, Rocha SF, Pitulescu ME, Bussmann J, Meyen D, Raz E, Adams RH, Siekmann AF. 2014. Arteries are formed by vein-derived endothelial tip cells. Nat Commun 5:5758. doi:10.1038/ncomms6758

    1. Colletotrichum fungi infect a wide diversity of monocot and eudicot hosts, causing plant diseases on almost all economically important crops worldwide. In addition to its economic impact, Colletotrichum is a suitable model for the study of gene family evolution on a fine scale to uncover events in the genome that are associated with the evolution of biological characters important for host interactions. Here we present the genome sequences of 30 Colletotrichum species, 18 of them newly sequenced, covering the taxonomic diversity within the genus. A time-calibrated tree revealed that the Colletotrichum ancestor diverged in the late Cretaceous around 70 million years ago (mya) in parallel with the diversification of flowering plants. We

      Reviewer 1: Jamie McGowan

      In this study, Baroncelli and colleagues carry out a comprehensive analysis of genomic evolution in Colletotrichum fungi, an important group of plant pathogens with diverse and economically significant hosts. Their comparative genomic and phylogenomic analyses are based on the genome sequences of 30 Colletotrichum species spanning the diversity of the genus, including pathogens of dicots, monocots, and both dicots and monocots. This includes 18 genome sequences that are newly reported in this study. They also perform comparative transcriptomic analyses of 4 Colletotrichum species (2 dicot pathogens and 2 monocot pathogens) on different carbon sources. Overall, I thought the manuscript was very well written and technically sound. The results should be of interest to a broad audience, particularly to those interested in fungal evolutionary genomics and plant pathology. I only have a few minor comments.

      Minor comments:
      (1) Lines 50-51: "The plant cell wall (PCW) consists of many different polysaccharides that are attached not only to each other through a variety of linkages providing the main strength and structure for the PCW". I found this confusing; is the sentence incomplete?
      (2) Line 66: "Some Colletotrichum species show…" I think there should be a couple of introductory sentences about Colletotrichum before this.
      (3) Figure 1: It would be informative to label which genomes were sequenced with PacBio versus only Illumina.
      (4) Lines 254-255: "As no other enrichment was identified we performed a manual annotation of genes identified in Figure 3D". I don't think it is clear here what manual annotation this refers to.
      (5) One area where I felt the analysis was lacking was genome repeat content. The authors highlight the large variation in genome sizes within Colletotrichum species (~44 Mb vs ~90 Mb) and show in Figure 1 that this correlates with increased non-coding DNA. It would have been interesting to determine whether this is driven by the proliferation of particular repeat families.
      (6) Another concern is the inconsistent use of genome annotation methods. 12 of the genomes reported in this study were annotated using the JGI annotation pipeline, whereas the other 6 were annotated using the MAKER pipeline. Several studies (e.g., Weisman et al., 2022, Current Biology) show that inconsistent genome annotation methods can inflate the number of observed lineage-specific genes. The authors may wish to comment on this or demonstrate that this isn't an issue in their study (e.g., by aligning lineage-specific proteins against the other genome assemblies).

    1. Structural variants (SVs) play a significant role in speciation and adaptation in many species, yet few studies have explored the prevalence and impact of different categories of SVs. We conducted a comparative analysis of long-read assembled reference genomes of closely related Eucalyptus species to identify candidate SVs potentially influencing speciation and adaptation. Interspecies SVs can be either fixed differences, or polymorphic in one or both species. To describe SV patterns, we employed short-read whole-genome sequencing on over 600 individuals of E. melliodora and E. sideroxylon, along with recent high quality genome assemblies. We aligned reads and genotyped interspecies SVs predicted between species reference genomes. Our results revealed that 49,756 of 58,025 and 39,536 of 47,064 interspecies SVs could be typed with short reads, in E. melliodora and E. sideroxylon

      Reviewer 1: Jakob Butler

      Ferguson et al. have performed a thorough analysis of two species of Eucalyptus, quantifying the extent of structural variation between the assembled genomes of the species and determining how prevalent those variations are across a selection of wild material. I believe this study is of sufficient quality for publication in GigaScience, if some minor inconsistencies and grammatical issues are addressed and a few supporting analyses are performed.

      The major changes I would like to see include the addition of a SyRI plot of the complete set of SVs between E. melliodora and E. sideroxylon. I believe this, along with correcting the scale on the plots of recombination in Figures S6/S7, would allow for a better comparison of how recombination rate interacts with the SVs. I would also suggest a more formal test of enrichment for COG terms, to better support the statements of "enrichment" in the discussion.

      Suggested changes by line:
      Line 142 - This section is quite short; I would either merge it into the Genome scaffolding (and annotation) section, or expand on the results of the gene annotation.
      Line 182 - (Supplementary Figure S4)
      Line 183 (and throughout) - Please be consistent with your references to tables and figures.
      Line 186 - Delete the comma after 28.63%.
      Line 194 - These are density plots rather than histograms.
      Figure 4 - Both axes are labelled as PC1.
      Line 217 (page 10; line numbers are doubled up) - This seems repetitive, perhaps "…especially as they may also represent divergent sequences".
      Line 221 (page 11) - Please insert "and" before polymorphic translocations.
      Line 223 - You have stated earlier in the paragraph that those not successfully genotyped in both species are private or artefacts; please reduce the repetition.
      Figure 6 - I don't find this figure particularly informative (and it is somewhat confusing to interpret). I think showing the percentages of each different SV in a visual form implies a level of equivalence in genomic impact, which is difficult to reconcile with the raw difference in numbers. I think a supplemental table focused on the percentages would illustrate the point better.
      Line 246 - There is no mention in the methods of what r threshold was used to declare a pair "correlated"; please state it here or in the methods.
      Line 265 - This line was confusing to interpret. A suggested alteration: "significant value. After attempting to functionally annotate all genes across the genome and placing them within COG categories, 247 of the total 281 gene candidates in SSPs were annotated. These genes were enriched for...."
      Line 266 - I would like to see a formal enrichment analysis rather than "increased/decreased association", so we could have a clearer picture of which gene functions are truly over/underrepresented in SSPs. You could subsequently limit Figure 8 to those that show a difference.
      Line 275 - The grammar of this title is a bit off, perhaps "Effect of syntenic, rearranged, unaligned regions and genes on recombination rates".
      Line 276 - This is the first mention of ρ; please define it as recombination rate.
      Line 283 - Supplemental Figures S6 and S7 seem to have regions of heightened recombination, but this is difficult to interpret and compare with the current variable axis scales. Please make these consistent. I would also like to see the SyRI graph of the two aligned genomes, as this would allow for a visual comparison of SV regions with recombination rate.
      Line 290 - How were p-values adjusted?
      Line 294 - More information about this "significantly" higher recombination rate would be good, either in the figure or further expanded in the text.
      Line 307 - Italics for species names (repeated in the Figure 10 and Figure 11 captions).
      Line 310 - Similar problem to line 275.
      Figure 10 - Having Figure 9b repeated in Figure 10 and Figure 11 is unnecessary.
      Line 336 - Vertical lines show average FST, not ρ.
      Line 341 - Similar problem to line 275.
      Line 356 - translocations should be plural.
      Line 367 - Vertical lines show average SNP density, not ρ.
      Line 391 - This is the first mention of barrier loci; please define.
      Line 413 - As mentioned above, I would recommend a formal enrichment test to support this statement.
      Line 428 - The grammar is poor here; please correct.
      Line 490 - Please make this a complete sentence.
      Line 499 - Please state how the Hi-C map was manually edited, and what informed the position of those edits.
      Line 508 - Please provide an example of how well your LAI score of ~18 compares. The LAI paper seems to intimate that 10 is low quality?
      Line 513 - Missing bracket for version number.
      Line 536 - "Syntenic" rather than "synteny".
      Line 717 - Formatting error in references.
      Supplementary Tables S3, S4 and S5 - Space between E. and sideroxylon.
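      The reviewer's request for a formal enrichment test for COG categories could be met in several ways. One common option is a one-sided hypergeometric test per category, followed by Benjamini-Hochberg correction across categories. The Python sketch below illustrates that option only. It is not the authors' analysis, and the function names and argument conventions are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes genome-wide (the background)
    K: background genes assigned to the COG category being tested
    n: annotated candidate genes (e.g. those in SSPs)
    k: candidate genes assigned to that category
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of observing i category genes for every i >= k.
    # math.comb returns 0 when n - i > N - K, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      In practice, `hypergeom_enrichment_p` would be called once per COG category, and the resulting list would then be passed to `benjamini_hochberg`. A library routine such as SciPy's hypergeometric survival function could replace the hand-written tail sum.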

  2. Jun 2024
      However, by examining the bacteriome in detail, we can obtain much more information about its composition and function than diversity alone can tell us. Based on the taxonomic composition of our samples, the Proteobacteria and Actinobacteria phyla were clearly dominant in both fish skin mucus and water samples. The dominance of the Proteobacteria phylum is not an uncommon observation in fish external mucus samples1,3,5,6,8,11,21,62,63; however, differences between fish species have been observed for the other phyla1,11,62,63. Moreover, significant within-species variability in dominant phyla has been described64, and variability within individuals related to body site should be noted12. The microbiome can be an important indicator of various pathological conditions, as has already been described in fish, for example in the gastrointestinal tract65. In this regard, the Bacteroidota phylum, which has been highlighted as a marker for eutrophication9,66, may be of interest. Understanding the changes in the composition of the bacteriome, or even the microbiome, during different pathological conditions can be an important step in understanding and potentially diagnosing disease processes. Our results are therefore in line with the dominance of the Proteobacteria phylum observed in other fish species, but direct comparison with C. carpio is not possible due to the lack of available data. Of course, our observations on the bacteriome composition of our samples are also limited by substantial host genome contamination, which reduced the coverage of bacterial genomes of interest in the sequencing reaction.

      Since you have the resolution to go below phylum, I think it would be interesting to focus on that more in the discussion.

    1. 17) The just man is the freest of anyone from anxiety; but the unjust man is perpetually haunted by it.

      I found this passage disturbing and I do not necessarily agree with it. I think that because we have two different people with two different moral compasses, their views on the world are polar and there is a struggle in comparing them. This "unjust" person has an opposite view of anxiety, punishment, power, fear, etc... because they are "morally wrong," and may not experience the same emotional spectrum as a person who always does the right thing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study presents a valuable tool for searching molecular dynamics simulation data, making such data sets accessible for open science. The authors provide convincing evidence that it is possible to identify useful molecular dynamics simulation data sets and their analysis can produce valuable information.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Tiemann et al. have undertaken an original study on the availability of molecular dynamics (MD) simulation datasets across the Internet. There is a widespread belief that extensive, well-curated MD datasets would enable the development of novel classes of AI models for structural biology. However, currently, there is no standard for sharing MD datasets. As generating MD datasets is energy-intensive, it is also important to facilitate the reuse of MD datasets to minimize energy consumption. Developing a universally accepted standard for depositing and curating MD datasets is a huge undertaking. The study by Tiemann et al. will be very valuable in informing policy developments toward this goal.

      Strengths:

      The study presents an original approach to addressing a growing concern in the field. It is clear that adopting a more collaborative approach could significantly enhance the impact of MD simulations in modern molecular sciences.

      The timing of the work is appropriate, given the current interest in developing AI models for describing biomolecular dynamics.

      Weaknesses:

      The study primarily focuses on one major MD engine (GROMACS), although this limitation is not significant considering the proof-of-concept nature of the study.

      We thank the reviewer for his/her comments. Moving forward, our plan includes expanding this research to encompass other MD engines used in biomolecular simulations and materials sciences, such as NAMD, CHARMM, Amber, LAMMPS, etc. However, this requires parsing associated files to supplement the sparse metadata generally available for the related datasets.

      Reviewer #2 (Public Review):

      Summary:

      Molecular dynamics (MD) data is deposited in public, non-specialist repositories. This work starts from the premise that these data are a valuable resource as they could be used by other researchers to extract additional insights from these simulations; it could also potentially be used as training data for ML/AI approaches. The problem is that mining these data is difficult because they are not easy to find and work with. The primary goal of the authors was to discover and index these difficult-to-find MD datasets, which they call the "dark matter of the MD universe" (in contrast to data sets held in specialist databases).

      The authors developed a search strategy that avoided the use of ill-defined metadata and instead relied on knowledge of the restricted set of file formats used in MD simulations as a reliable marker for the data they were looking for. Detection of MD data marked a dataset as relevant, and a follow-up indexing step then captured all associated content. This "explore-and-expand" strategy allowed the authors to provide, for the first time, a realistic census of the MD data in non-specialist repositories.
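      The explore-and-expand idea summarized above can be illustrated with a minimal Python sketch. The sketch rests on two hypothetical assumptions. First, repository records are already available as a mapping from dataset IDs to file names. Second, a small set of GROMACS file extensions serves as the marker. The sketch is a simplification for illustration, not the MDverse code itself.

```python
from typing import Dict, List

# Hypothetical marker set: file extensions typical of GROMACS inputs/outputs.
MD_EXTENSIONS = {".gro", ".xtc", ".trr", ".tpr", ".mdp", ".top", ".itp"}


def has_md_marker(filename: str) -> bool:
    """Return True if the file name ends with a known MD file extension."""
    name = filename.lower()
    return any(name.endswith(ext) for ext in MD_EXTENSIONS)


def explore_and_expand(datasets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Explore: flag datasets that contain at least one MD marker file.
    Expand: index every file of each flagged dataset, not only the markers."""
    index: Dict[str, List[str]] = {}
    for dataset_id, files in datasets.items():
        if any(has_md_marker(f) for f in files):
            index[dataset_id] = sorted(files)
    return index
```

      For example, a record containing `traj.xtc` would be kept together with all of its files, including `README.md`. A record holding only `data.csv` and `figure.png` would be skipped. This is why file formats, rather than free-text metadata, drive the discovery step.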

      As a proof of principle, they analyzed a subset of the data (primarily related to simulations with the popular Gromacs MD package) to summarize the types of simulated systems (primarily biomolecular systems) and commonly used simulation settings.

      Based on their experience they propose best practices for metadata provision to make MD data FAIR (findable, accessible, interoperable, reusable).

      A prototype search engine that works on the indexed datasets is made publicly available. All data and code are made freely available as open source/open data.

      Strengths:

      The novel search strategy is based on relevant data to identify full datasets instead of relying on metadata and thus is likely to have many true positives and few false positives.

      The paper provides a first glimpse at the potential hidden treasures of MD simulations and force field parametrizations of molecules.

      Analysis of parameter settings of MD simulations from how researchers *actually* run simulations can provide valuable feedback to MD code developers for how to document/educate users. This approach is much better than analyzing what authors write in the Methods sections.

      The authors make a prototype search engine available.

      The guidelines for FAIR MD data are based on experience gained from trying to make sense of the data.

      Weaknesses:

      So far the work is a proof of concept that focuses on MD data produced by Gromacs (the most prevalent of all indexed and identified packages).

      As discussed in the manuscript, some types of biomolecules are likely underrepresented because different communities have different preferences for force fields/MD codes (for example: carbohydrates with AMBER/GLYCAM using AMBER MD instead of Gromacs).

      Materials sciences seem to be severely under-represented --- commonly used codes in this area such as LAMMPS are not even detected, and only very few examples could be identified. As it is, the paper primarily provides an insight into the *biomolecular* MD simulation world.

      The authors succeed in providing a first realistic view of what MD data is available in public repositories. In particular, their explore-expand approach has the potential to be customized for all kinds of specialist simulation data, whereby specific artifacts are used as fiducial markers instead of metadata. The more detailed analysis is limited to Gromacs simulations and primarily biomolecular simulations (even though MD is also widely used in other fields such as the materials sciences). This restricted view may simply reflect the user community of Gromacs; hopefully, follow-up studies from this work will shed more light on this shortcoming.

      The study quantified the number of trajectories currently held in structured databases as ~10k vs ~30k in generalist repositories. To go beyond the proof-of-principle analysis, it would be interesting to analyze the data in specialist repositories in the same way as those in the generalist ones, especially as there are now efforts underway to create a database for MD simulations (grant "Molecular dynamics simulation for biology and chemistry research" to establish MDDB, DOI 10.3030/101094651). One should note that structured databases do not invalidate the approach pioneered in this work; if anything, the two are orthogonal to each other, and both will likely play an important role in growing the usefulness of MD simulations in the future.

      We thank the reviewer for his/her comments. As mentioned to Reviewer 1, we intend to extend this work to other MD engines in the near future, to go beyond Gromacs and even beyond biomolecular simulations. The reviewer also notes the value of accessing and indexing specialized MD databases such as MDDB, MemprotMD, GPCRmd, NMRLipids, ATLAS, and others. Indexing these databases is indeed one of our next steps in expanding the MDverse catalog of MD data. This indexing may also increase the visibility and widespread adoption of these specific databases.

      Reviewer #3 (Public Review):

      Molecular dynamics (MD) simulations are nowadays an essential element of structural biology investigations. They complement experiments and aid their interpretation by revealing transient processes or details that cannot be observed directly (for example, the effects of glycosylation on the SARS-CoV-2 spike protein; Casalino et al. ACS Cent. Sci. 2020; 6, 10, 1722-1734 https://doi.org/10.1021/acscentsci.0c01056). MD simulations allow for the calculation of thermodynamic, kinetic, and other properties and the prediction of biological or chemical activity. MD simulations can now serve as "computational assays" (Huggins et al. WIREs Comput Mol Sci. 2019; 9:e1393. https://doi.org/10.1002/wcms.1393). Conceptually, MD simulations have played a crucial role in developing the understanding that the dynamics and conformational behaviour of biological macromolecules are essential to their function and are shaped by evolution. Atomistic simulations range up to the billion-atom scale with exascale resources (e.g. simulations of SARS-CoV-2 in a respiratory aerosol; Dommer et al. The International Journal of High Performance Computing Applications. 2023; 37:28-44. doi:10.1177/10943420221128233), while coarse-grained models allow simulations on even larger length- and timescales. Simulations with combined quantum mechanics/molecular mechanics (QM/MM) methods can investigate biochemical reactivity and overcome limitations of empirical force fields (Cui et al. J. Phys. Chem. B 2021; 125, 689 https://doi.org/10.1021/acs.jpcb.0c09898).

      MD simulations generate large amounts of data (e.g. structures along the MD trajectory) and increasingly, e.g. because of funder mandates for open science, these data are deposited in publicly accessible repositories. There is real potential to learn from these data en masse, not only to understand biomolecular dynamics but also to explore methodological issues. Deposition of data is haphazard and lags far behind experimental structural biology, however, and it is also hard to answer the apparently simple question of "what is out there?". This is the question that Tiemann et al. explore in this nice and important work, focusing on simulations run with the widely used GROMACS package. They develop a search strategy and identify almost 2,000 datasets from Zenodo, Figshare and Open Science Framework. This provides a very useful resource. For these datasets, they analyse features of the simulations (e.g. atomistic or coarse-grained), which provides a useful snapshot of current simulation approaches. The analysis is presented clearly and discussed insightfully. They also present a search engine to explore MD data, the MDverse data explorer, which promises to be a very useful tool.

      As the authors state: "Eventually, front-end solutions such as the MDverse data explorer tool can evolve being more user-friendly by interfacing the structures and dynamics with interactive 3D molecular viewers". This will make MD simulations accessible to non-specialists and researchers in other areas. I would envisage that this will also include approaches using interactive virtual reality for an immersive exploration of structure and dynamics, and virtual collaboration (e.g. O'Connor et al., Sci. Adv. 4, eaat2731 (2018). DOI:10.1126/sciadv.aat2731).

      The need to share data effectively, and to compare simulations and test models, was illustrated clearly in the COVID-19 pandemic, which also demonstrated a willingness and commitment to data sharing across the international community (e.g. Amaro and Mulholland, J. Chem. Inf. Model. 2020, 60, 6, 2653-2656 https://doi.org/10.1021/acs.jcim.0c00319; Computing in Science & Engineering 2020, 22, 30-36 doi: 10.1109/MCSE.2020.3024155). There are important lessons to learn here, for simulations to be reproducible and reliable, for rapid testing, for exploiting data with machine learning, and for linking to data from other approaches. Tiemann et al. discuss how to develop these links, providing good perspectives and suggestions.

      I agree completely with the statement of the authors that "Even if MD data represents only 1 % of the total volume of data stored in Zenodo, we believe it is our responsibility, as a community, to develop a better sharing and reuse of MD simulation files - and it will neither have to be particularly cumbersome nor expensive. To this end, we are proposing two solutions. First, improve practices for sharing and depositing MD data in data repositories. Second, improve the FAIRness of already available MD data notably by improving the quality of the current metadata."

      This nicely states the challenge to the biomolecular simulation community. There is a clear need for standards for MD data and associated metadata. This will also help with the development of standards of best practice in simulations. The authors provide useful and detailed recommendations for MD metadata. These recommendations should contribute to discussions on the development of standards by researchers, funders, and publishers. Community organizations (such as CCP-BioSim and HECBioSim in the UK, BioExcel, CECAM, MolSSI, and learned societies) have an important part to play in these developments, which are vital for the future of biomolecular simulation.

      We thank the reviewer for his/her comments. Beyond the points mentioned to Reviewers 1 and 2, we agree that it would be of great interest to combine innovative and immersive approaches to visualize, and possibly interact with, the data collected. This is increasingly feasible thanks to technologies such as WebGL and programs such as Mol*. As the reviewer also pointed out, virtual reality offers another route, for example with the Narupa framework mentioned above or with the UnityMol software. For a comprehensive review of MD trajectory visualization and associated challenges, we refer to our recent review article https://doi.org/10.3389/fbinf.2024.1356659.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Some minor text editing would improve the readability of the manuscript.

      It would be very useful if the authors could share their perspectives on the best and most efficient approach to sharing datasets and code associated with a publication. My concern lies in the fact that GitHub, which is currently the dominant platform for sharing code, is not well-suited for hosting large MD datasets. As a result, researchers often need to adopt a workflow where code is shared on GitHub and datasets are stored elsewhere (e.g., Zenodo). While this is feasible, it adds extra work. Ideally, a transparent process could be developed to seamlessly share code and datasets linked to a study through a unified interface.

      We thank the reviewer for this excellent suggestion. To our knowledge, there is as yet no simple framework for jointly storing and sharing code and data linked to their scientific publication. Code can, of course, be submitted to "generic" repositories along with the data. However, these repositories do not currently offer features such as collaborative development and version tracking to the extent that GitHub does.

      Although GitHub is indeed a suitable platform to deposit code, we strongly advise researchers to archive their code in Software Heritage. In addition to preserving source code, Software Heritage provides a unique identifier called SWHID that unambiguously makes reference to a specific version of the source code.
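      An SWHID is intrinsic: it is derived from the archived object itself, so anyone can recompute and check it. The Python sketch below is our own and not part of the manuscript. It computes the identifier of a single file's content, which Software Heritage derives with the same SHA-1 "blob" hashing that git uses, and it checks the core swh:1:<type>:<hash> syntax. Directory, revision, release, and snapshot identifiers are built differently and are not covered. The function names are hypothetical.

```python
import hashlib
import re

# Core SWHID syntax (no qualifiers): "swh:1:<object type>:<40 hex digits>".
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def content_swhid(data: bytes) -> str:
    """SWHID of one file's content, hashed like a git blob object."""
    # git blob header: the word "blob", a space, the byte length, a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def is_core_swhid(identifier: str) -> bool:
    """True if the whole string matches the core SWHID syntax."""
    return CORE_SWHID.fullmatch(identifier) is not None


print(content_swhid(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

      Because the identifier depends only on the bytes, a SWHID cited in a paper can be checked against a local copy of the file.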

      So far, it is the responsibility of the scientific publication authors to link datasets and source codes (whether in GitHub or Software Heritage) in their paper, but also to make the reverse link from the data and code sharing platforms to the paper after publication.

      As mentioned by the reviewer, a unified interface that could ease this process would significantly contribute to FAIR-ness in MD.

      Reviewer #2 (Recommendations For The Authors):

      L180: I am not aware that TRR files contain energy terms as stated here, my understanding was that EDR files primarily served that purpose.

      “…available in one dataset. Interestingly, we found 1,406 .trr files, Which contain trajectory but also additional information such as velocities, energy of the system, etc. While the file is especially useful in terms of reusability, the large size (can go up to several 100GB) limits its deposition in most…”

      Indeed, our formulation was ambiguous. EDR files contain the detailed energy information, whereas TRR files contain numerous values from the trajectory, such as coordinates, velocities, forces and, to some extent, also energies (https://manual.gromacs.org/current/reference-manual/file-formats.html#trr).

      L207: The text states that the total time was not available from XTC files, only the number of frames. However, XTC files record time stamps in addition to frame numbers. As long as these times are in the Gromacs standard of picoseconds, the simulation time ought to be available from XTCs.

      “…systems and the number of frames available in the files (Fig. 3-B). Of note, the frames do not directly translate to the simulation runtime - more information deposited in other files (e.g. .mdp files) is needed to determine the complete runtime of the simulation. The system was up…”.

      Thank you for this useful comment; we have removed this sentence. We now mention that studying the simulation time would be of interest in the future, especially once we perform an exhaustive analysis of .xtc files.

      “Of note, as .xtc files also contain time stamps, it would be interesting to study the relationship between the time and the number of frames to get useful information about the sampling. Nevertheless, this analysis would be possible only for unbiased MD simulations. So, we would need to decipher if the .xtc file is coming from biased or unbiased simulations, which may not be trivial.”
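      For runs written in the GROMACS default time unit of picoseconds, the sampling information discussed above can be read directly from the frame time stamps. The Python sketch below assumes the times have already been extracted from the .xtc file, for instance through the time attribute of each MDAnalysis Timestep. The helper name is hypothetical. A uniform spacing is consistent with, but does not prove, a single continuous trajectory. Likewise, the span covered by the frames is not necessarily the full simulation length.

```python
from statistics import median


def sampling_summary(frame_times_ps, rel_tol=1e-6):
    """Summarize .xtc frame time stamps given in picoseconds.

    Returns the time span covered by the frames, the median spacing between
    consecutive frames, and whether every spacing matches that median
    within a relative tolerance.
    """
    times = list(frame_times_ps)
    if len(times) < 2:
        raise ValueError("need at least two frames")
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    dt = median(steps)
    uniform = all(abs(step - dt) <= rel_tol * abs(dt) for step in steps)
    return {"span_ps": times[-1] - times[0], "dt_ps": dt, "uniform": uniform}


print(sampling_summary([0.0, 10.0, 20.0, 30.0]))
# {'span_ps': 30.0, 'dt_ps': 10.0, 'uniform': True}
```

      A non-uniform spacing, for example from concatenated trajectories, is flagged rather than silently averaged away.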

      Analysis of MDP files: Were these standard equilibrium MD or can you distinguish biased MD or free energy calculations?

      At present, we do not distinguish between biased and unbiased MD. In the future, we may attempt to do so, e.g. by comparing simulation parameters such as force-field settings and timesteps with those typical of standard equilibrium simulations. Nevertheless, a reliable distinction will remain challenging.
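      One partial signal is already present in the .mdp files themselves: explicit free-energy, pulling, or AWH options. The Python sketch below flags these options and estimates the planned length as nsteps × dt, using the GROMACS default dt of 0.001 ps when dt is absent. The helper names are hypothetical. This is only a heuristic. Biases applied outside the .mdp file, for example through PLUMED with gmx mdrun -plumed, would go unnoticed. The planned length is also not necessarily what the run actually completed.

```python
def parse_mdp(text):
    """Parse .mdp text into a dict of option -> value strings.

    ';' starts a comment. GROMACS treats '-' and '_' in option names as
    equivalent, so names are normalized to use '-'.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        params[key.replace("_", "-")] = value
    return params


def describe_run(params):
    """Flag explicit bias/free-energy options and estimate the planned length."""
    flags = [
        option
        for option in ("free-energy", "pull", "awh")
        if params.get(option, "no").lower() != "no"
    ]
    nsteps = int(params.get("nsteps", 0))
    dt_ps = float(params.get("dt", 0.001))
    # nsteps <= 0 (e.g. -1 for "run forever") gives no meaningful planned length.
    planned_ps = nsteps * dt_ps if nsteps > 0 else None
    return {"flags": flags, "planned_ps": planned_ps}


example = "integrator = md\ndt = 0.002 ; 2 fs\nnsteps = 500000\nfree_energy = yes\n"
print(describe_run(parse_mdp(example)))
```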

      L336: typo: pikes -> spikes (or peaks?)

      “…simulations of Lennard-Jones models (Jeon et al., 2016). Interestingly, we noticed the appearance of several pikes at 400K, 600K and 800K, which were not present before the end of the year 2022. These peaks correspond to the same study related to the stability of hydrated crystals (Dybeck et al., 2023). Overall, thhis analysis revealed that a wide range of temperatures have been explored,…”

      Thank you. We have corrected this typo.

      Make clear how multiple versions of data sets are handled, e.g., if v1, v2, and v3 of a dataset are provided in Zenodo then which one is counted or are all counted?

      We collected only the latest version of each dataset, as returned by default by the Zenodo API. To reflect this, we added the following sentence to the Methods and Materials section, Initial data collection sub-section:

      “By default, the last version of the datasets was collected.”
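      Zenodo groups all versions of a dataset under a shared concept record. If all versions were harvested instead, an equivalent client-side selection could look like the Python sketch below. The concept_id and published field names are illustrative rather than the exact Zenodo API schema. Dates are assumed to be ISO 8601 strings (YYYY-MM-DD), so that they compare correctly as text.

```python
def latest_versions(records):
    """Keep only the most recently published version of each dataset.

    Each record is a dict with a 'concept_id' shared by all versions of one
    dataset and a 'published' ISO 8601 date string (illustrative field names).
    """
    latest = {}
    for record in records:
        key = record["concept_id"]
        if key not in latest or record["published"] > latest[key]["published"]:
            latest[key] = record
    return list(latest.values())


records = [
    {"concept_id": "A", "published": "2021-01-01", "version": 1},
    {"concept_id": "A", "published": "2023-05-02", "version": 3},
    {"concept_id": "A", "published": "2022-03-01", "version": 2},
    {"concept_id": "B", "published": "2020-01-01", "version": 1},
]
print(sorted((r["concept_id"], r["version"]) for r in latest_versions(records)))
```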

      L248: Analysis of GRO files seems fairly narrow because PDB files are very often used for exactly the same purpose, even in the context of Gromacs simulations, not least because the format is familiar to structural biologists who may be interested in representative MD snapshots. Despite all the shortcomings of abusing the PDB format for MD, it is an attempt at increased interoperability. Perhaps the authors can make sure that readers understand that choosing GRO for analysis may give a somewhat skewed picture, even within Gromacs simulations.

      Thanks for this comment. We collected about 12,000 PDB files. These could indeed be output from Gromacs simulations, shared easily thanks to the universality of the format, but they could just as well come from other sources (such as other MD packages or the PDB database itself). We deliberately limited our study to files strictly associated with the Gromacs package, such as the MDP and XTC file types. However, we will extend our survey to all other structure-like formats, especially the PDB file type. We state this intent in the following sentence (after line 281):

      “Beyond .gro files, we would like to analyze the ensemble of the ~12,000 .pdb files extracted in this study (see Figure 2-B) to better characterize the types of molecular structures deposited.”
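      The Python sketch below shows one way to summarize the composition of deposited structures. It counts residues by name in a .gro file, relying on the format's fixed-column layout: the residue number occupies the first five characters of each atom line and the residue name the next five. The function is hypothetical and not part of the MDverse code. The .pdb files would need a separate parser.

```python
from collections import Counter


def gro_residue_composition(text):
    """Return (number of atoms, Counter of residue names) for .gro text.

    A .gro file has a title line, a line with the atom count, one
    fixed-width line per atom, and a final box line.
    """
    lines = text.splitlines()
    n_atoms = int(lines[1])
    counts = Counter()
    previous = None
    for line in lines[2:2 + n_atoms]:
        # Characters 0-4: residue number; characters 5-9: residue name.
        current = (line[0:5], line[5:10].strip())
        # A new residue starts whenever the number or name changes, which also
        # copes with residue numbers wrapping around after 99999.
        if current != previous:
            counts[current[1]] += 1
            previous = current
    return n_atoms, counts
```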

      A simple template metadata file would be welcome (e.g., served from a GitHub/GitLab repository so that it can be improved with community input).

      Thank you for this suggestion, with which we fundamentally agree. However, generating such a file is a major task. Creating a metadata file template requires far-reaching considerations; it is therefore beyond the scope of this paper and should not be decided by a small group of researchers. Indeed, this topic requires a broad consensus among different stakeholders, from users to MD program developers and journal editors. It would be especially useful to organize dedicated workshops with representatives of all these communities to tackle this specific issue, as mentioned by Reviewer 3 in his/her public review. As a basis for this discussion, we humbly proposed at the end of this manuscript a few non-constraining guidelines based on our experience retrieving the data.

      To emphasize this statement, we added the following sentence at the end of the “Guidelines for better sharing of MD simulation data” section (line 420):

      “Converging on a set of metadata and format requires a large consensus of different stakeholders from users, to MD program developers, and journal editors. It would be especially useful to organize specific workshops with representatives of all these communities to collectively tackle this specific issue.”

      In "Data and code availability" it would be good to specify licenses in addition to stating "open source". Thank you for pointing out that GitLab/GitHub are not archives and that everyone should be strongly encouraged to submit data to stable archival repositories.

      We added the corresponding licenses for code and data in the “Data and code availability” section.

      Reviewer #3 (Recommendations For The Authors):

      The paper is well written, with very few typographical or other minor errors.

      Minor points:

      Line 468-9 "can evolve being more user-friendly" should be "can evolve to being more user-friendly", I think.

      Thank you, we have changed the wording accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the packing of molecules in cellular compartments, such as actin-based protrusions. The study provides solid evidence for parameters that enable the building of a biophysical model of filopodia, which is required to gain a complete understanding of these important actin-based structures. Some areas of the manuscript require further clarification.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript proposes an alternative method, based on SDS-PAGE calibration of Halo-Myo10 signals, to quantify myosin molecules at specific subcellular locations (in this case, filopodia) in epifluorescence datasets. This contrasts with the more laborious and troublesome single-molecule approaches. Based on these preliminary estimates, the authors developed their analysis further and discussed different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia.

      Strengths:

      Overall, the paper is elegantly written and the data analysis is appropriately presented.

      Weaknesses:

      While the methodology is intriguing in its descriptive potential and could be the beginning of an interesting story, a good portion of the paper is dedicated to the discussion of hypothetical working mechanisms to explain myosin diffusion, localization, and decoration of filopodial actin that is not accompanied by the mandatory gain/loss of function studies required to sustain these claims.

      To be fair, the detailed mechanisms that we raise related to diffusion, localization, and decoration are based on extensive work by others. Many prior papers use domain deletions of Myo10 and fall in the category of gain/loss-of-function studies. It is true that we have not repeated those extensive studies, but it seems appropriate to connect with and cite their work where appropriate.

      Reviewer #2 (Public Review):

      Summary:

      The paper sought to determine the number of myosin 10 molecules per cell and the number localized to filopodia, where they are known to be involved in the formation, internal transport, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO-tagged Myo10 in U2OS cells, generated lysates from a known number of cells, and detected Myo10 after SDS-PAGE with both fluorescence and a stain-free method. They used a purified HALO-tagged standard protein to generate a standard curve. This allowed them to determine the Myo10 concentration in cell lysates and thus estimate the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed-cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found that a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 molecules in each filopodium, which suggests that some filopodia have more Myo10 than actin-binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology of the tips, and the dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are also broad for understanding the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing.
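      The intensity-to-count conversion summarized above amounts to two divisions. Below is a minimal Python sketch, assuming background-corrected integrated intensities acquired under identical imaging conditions. The function names and numbers are illustrative and not taken from the manuscript.

```python
def intensity_per_molecule(mean_cell_signal, mean_molecules_per_cell):
    """Average integrated fluorescence (a.u.) attributable to one labeled molecule."""
    return mean_cell_signal / mean_molecules_per_cell


def molecules_in_region(region_signal, background_signal, per_molecule):
    """Convert a background-subtracted integrated intensity to a molecule count."""
    return (region_signal - background_signal) / per_molecule


# Illustrative numbers only (not from the manuscript).
per_molecule = intensity_per_molecule(2.0e8, 1.0e6)     # 200 a.u. per molecule
print(molecules_in_region(6.5e4, 5.0e3, per_molecule))  # 300.0
```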

      Strengths:

      The paper addresses an important fundamental biological question about how many molecular motors are localized to a specific cellular compartment and how that may relate to other aspects of the compartment, such as the actin cytoskeleton and the membrane. The paper demonstrates a method of estimating the number of myosin molecules per cell using the fluorescently labeled HALO tag and SDS-PAGE analysis. There are several important conclusions from this work. It estimates the number of Myo10 molecules localized to different regions of the filopodia and the minimum number required for filopodia formation. The authors also establish a correlation between the number of filopodia-localized Myo10 molecules and the number of filopodia in the cell. Only a small percentage of Myo10 is tip-localized relative to the total amount in the cell, suggesting that Myo10 has to be activated to enter the filopodial compartment. The distribution of Myo10 localization is log-normal, which suggests that clustering is a feature of this motor.

      Weaknesses:

      One main critique of this work is that Myo10 was overexpressed. Thus, the amount in the cell body compared to the filopodia is difficult to relate to physiological conditions. The amount in the filopodia was relatively small (hundreds of molecules per filopodium), so this result is still interesting regardless of the overexpression. However, the overexpression should be addressed in the limitations.

      This is a reasonable perspective and we now note this caveat in the Limitations section so that readers will take note. Our goal here was to understand a system in which Myo10 is the limiting reagent for filopodia, rather than a native system that expresses high Myo10 on its own. Because U2OS cells do not express detectable levels of Myo10 (see below), the natural perturbation here is overexpressing Myo10 to stimulate filopodial growth.

      The authors have not addressed the potential for variability in transfection efficiency. The authors could examine the average fluorescence intensity per cell and if similar this may address this concern.

      Indeed, cells are heterogeneous and will naturally express different levels of Myo10 not only due to transfection efficiency, but also due to their state (cell cycle stage, motile behavior, and more). In fact, we measure the transfection efficiency of each bioreplicate and account for it in our calibration procedure. We also measure the fluorescence intensity per cell, which lets us calculate the total Myo10s per cell and the cell-to-cell variability. These Myo10 distributions across cells are shown in Fig. 1D-E.

      We note here an error that we made in applying this transfection efficiency correction in the first submission. When we obtain the total Myo10 molecules by SDS-PAGE, we should divide by the total number of transfected cells. However, due to an operator precedence error, the transfection efficiency appeared in the numerator rather than the denominator. We have now corrected this error, which has the effect of increasing the number of molecules in all of our measurements. The effect of this correction has strengthened one of the paper’s main conclusions, that Myo10 is frequently overloaded at filopodial tips.
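      The corrected calculation can be written so that the transfection efficiency explicitly divides the number of cells loaded. The Python sketch below is ours, not the authors' analysis code. It assumes a linear standard curve forced through the origin and fits the standard in moles of labeled HaloTag protein. It treats the in-cell labeling efficiency as a separate parameter, 1.0 by default. All numbers in the usage line are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fit_slope_through_origin(x, y):
    """Least-squares slope k of y = k * x (standard curve forced through zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def molecules_per_transfected_cell(lane_signal, standard_fmol, standard_signal,
                                   cells_loaded, transfection_efficiency,
                                   standard_labeling=1.0, sample_labeling=1.0):
    # Express the standard as femtomoles of labeled protein before fitting.
    labeled_fmol = [f * standard_labeling for f in standard_fmol]
    k = fit_slope_through_origin(labeled_fmol, standard_signal)
    labeled_sample_fmol = lane_signal / k
    total_fmol = labeled_sample_fmol / sample_labeling
    molecules = total_fmol * 1e-15 * AVOGADRO
    # Only transfected cells contribute, so the efficiency is in the denominator.
    return molecules / (cells_loaded * transfection_efficiency)


print(round(molecules_per_transfected_cell(
    500.0, [10, 20, 40], [100, 200, 400], 1e5, 0.5, standard_labeling=0.9)))  # 541993
```

      With this arrangement, halving the transfection efficiency doubles the per-cell estimate, which is the behaviour that the correction restores.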

      The SDS PAGE method of estimating the number of molecules is quite interesting. I really like this idea. However, I feel there are a few more things to consider. The fraction of HALO tag standard and Myo10 labeled with the HALO tagged ligand is not determined directly. It is suggested that since excess HALO tagged ligand was added we can assume nearly 100% labeling. If the HALO tag standard protein is purified it should be feasible to determine the fraction of HALO tagged standard that is labeled by examining the absorbance of the protein at 280 and fluorophore at its appropriate wavelength.

      This is a fair point raised by the reviewer, and we have now measured a labeling efficiency of 90% in Supplementary Figure 2A-C. We have adjusted all values according to this labeling efficiency.
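      For reference, the absorbance-based estimate suggested by the reviewer is the standard degree-of-labeling calculation. The dye concentration is obtained from its absorbance maximum and divided by the protein concentration, which is obtained from A280 after subtracting the dye's contribution at 280 nm. In the Python sketch below, the extinction coefficients and correction factor are placeholders; real values would have to be taken for the actual HaloTag standard and TMR ligand. A 1 cm path length is assumed.

```python
def degree_of_labeling(a280, a_dye, eps_protein, eps_dye, cf280):
    """Dye molecules per protein molecule from absorbance readings (1 cm path).

    eps_* are molar extinction coefficients (M^-1 cm^-1). cf280 is the dye's
    absorbance at 280 nm relative to its absorbance maximum; it removes the
    dye's contribution from the A280 reading.
    """
    dye_molar = a_dye / eps_dye
    protein_molar = (a280 - cf280 * a_dye) / eps_protein
    return dye_molar / protein_molar


# Placeholder coefficients, chosen only to illustrate the arithmetic.
print(round(degree_of_labeling(0.725, 0.9, 50000, 100000, 0.25), 3))  # 0.9
```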

      The fraction of HALO tagged Myo10 labeled may be more challenging to determine, since it is in a cell lysate, but there may be some potential approaches (e.g. mass spec, HPLC).

      As noted, this value is considerably more challenging. Instead, we determined conditions under which labeling in cells is saturated. We have now stained with a concentration range for both fixed and live cell samples. Saturation occurs with ~0.5 μM HaloTag ligand-TMR in fixed/permeabilized cells and in live cells (Supplementary Figure 2D-E). This comparison of live cells vs. permeabilized cells allows us to say that the intact plasma membrane is not limiting labeling under these conditions.

      In Figure 1B, the stain free gel bands look relatively clean. The Myo10 is from cell lysates so it is surprising that there are not more bands. I am not surprised that the bands in the TMR fluorescence gel are clean, and I agree the fluorescence is the best way to quantitate.

      Figure 1B shows a cropped view of the high-MW region, and there is not much above Myo10. The full gel lanes shown in Supp. Fig. 1C show the expected number of bands from a cell lysate.

      In Figure 3C, the number of Myo10 molecules needed to initiate a filopodium was estimated. I wonder if the authors could have looked at live cell movies to determine that these events started with a puncta of Myo10 at the edge of the cell, and then went on to form a filopodia that elongated from the cell. How was the number of Myo10 molecules that were involved in the initiation determined? Please clarify the assumptions in making this conclusion.

      We thank the reviewer (and the other reviewers) for this excellent suggestion. We have now carried out these live cell experiments. These experiments were quite challenging, because we needed to collect snapshots of ~50 cells to measure the mean fluorescence intensity of transfected cells and then acquire movies of several cells for analysis. The U2OS cells were also highly temperature-sensitive and would retract their filopodia without objective heating.

      We have now analyzed filopodial initiation events and measured considerably more Myo10 at the first signs of accumulation, in the hundreds of molecules. The dimmer spots that we measured in the first draft were likely unrelated to filopodial initiation, and we have corrected the discussion on this point.

      We now also track further growth from a stable filopodial tip (the phased-elongation mechanism from Ikebe and coworkers) and find approximately 500 molecules bud off in those events. We also track filopodial elongation rates as a function of Myo10 numbers. We have added additional live cell imaging sections that include these results.

      It is stated in the discussion that the amount of Myo10 in the filopodia exceeds the number of actin-binding sites. However, Myo10 contains membrane-binding motifs and has been shown to interact with the membrane. It should therefore be pointed out that the excess Myo10 at the tips may be interacting with the membrane and not actin, which may prevent traffic jams.

      This is also an excellent point to consider, and we have expanded the relevant discussion along these lines. We agree that the Myo10 at the filopodial tip is likely membrane-bound. We now estimate the 2D membrane area occupied by Myo10, and find that it reaches nearly full packing in many cases (under a number of assumptions that we spell out more fully in the manuscript).

      Reviewer #3 (Public Review):

      Summary:

      The unconventional myosin Myo10 (aka myosin X) is essential for filopodia formation in a number of mammalian cells. There is a good deal of interest in its role in filopodia formation and function. The manuscript describes a careful, quantitative analysis of Myo10 molecules in U2OS cells, a widely used model for studying filopodia, how many are present in the cytosol versus filopodia and the distribution of filopodia and molecules along the cell edge. Rigorous quantification of Myo10 protein amounts in a cell and cellular compartment are critical for ultimately deciphering the cellular mechanism of Myo10 action as well as understand the molecular composition of a Myo10-generated filopodium.

      Consistent with what is seen in images of Myo10 localization in many papers, the vast majority of Myo10 is in the cell body, with only a small percentage (approximately 5%) present in filopodial puncta. Interestingly, Myo10 is not uniformly distributed along the cell edge. Rather, it is unevenly localized, with one region preferentially extending filopodia, presumably via localized activation of Myo10 motors. The total number of molecules in puncta was calculated from measurements of puncta size and Halo-Myo10 signal intensity. This calculation shows that the motor concentration can vary from 3 to 225 μM. Based on an estimate of available actin-binding sites, it is possible that Myo10 can be present in excess over these binding sites.
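      Converting a molecule count into a concentration requires an assumed punctum volume. Below is a minimal Python sketch that treats a punctum as a cylinder. This geometry is our assumption, and the manuscript's volume estimate may differ. The dimensions are illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def cylinder_volume_litres(radius_nm, length_nm):
    """Volume of a cylinder given in nm, converted to litres (1 nm^3 = 1e-24 L)."""
    return math.pi * radius_nm ** 2 * length_nm * 1e-24


def concentration_uM(n_molecules, volume_litres):
    """Molar concentration of n molecules in the given volume, in micromolar."""
    return n_molecules / (AVOGADRO * volume_litres) * 1e6


# 100 molecules in a cylinder of radius 100 nm and length 500 nm.
print(round(concentration_uM(100, cylinder_volume_litres(100, 500)), 1))  # 10.6
```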

      Strengths:

      The work represents an important first step towards defining the molecular stoichiometry of filopodial tip proteins. The observed range of Myo10 molecules at the tip suggests that it can accommodate a fairly wide range of Myo10 motors. There is great value in studies such as this and the approach taken by the authors gives one good confidence that the numbers obtained are in the right range.

      Weaknesses:

      One caveat (see below) is that these numbers are obtained for overexpressing cells and the relevance to native levels of Myo10 in a cell is unclear.

      A similar concern was raised by Reviewer 2; please see above.

      An interesting aspect of the work is quantification of the fraction of Myo10 molecules in the cytosol versus in filopodia tips showing that the vast majority of motors are inactive in the cytosol, as is seen in images of cells. This has implications for thinking about how cells maintain this large population in the off-state and what is the mechanism of motor activation. One question raised by this work is the distinction between cytosolic Myo10 and the population found at the ‘cell edge’ and the filopodia tip. The cortical population of Myo10 is partially activated, so to speak, as it is targeted to the cortex/membrane and presumably ready to go. Providing quantification of this population of motors, that one might think of as being in a waiting room, could provide additional insight into a potential step-by-step pathway where recruitment or binding to the cortical region/plasma membrane is not by itself sufficient for activation.

      As mentioned in our response to Reviewer 2, we have now carried out quantitation in live cells to capture Myo10 transitions from cell body into filopodial movement. We attempted to identify this membrane-bound population of motors in our new live cell experiments but were unable to make convincing measurements. Notably, we see no noticeable enrichment of Myo10 at the cortex relative to the cytosol. Although we believe there is a membrane-bound waiting room (akin to the 3D-2D-1D mechanism of Molloy and Peckham), we suspect that the 2D population is diffusing too rapidly to be detected under our imaging conditions.

      Specific comments:

      (1) It is not obvious whether the analysis of numbers of Myo10 molecules in a cell that is ectopically overexpressing Myo10 is relevant for wild type cells. It would appear to be a significant excess based on the total protein stained blot shown in Fig S1E where a prominent band the size of tagged Myo10 seen in the transfected sample is almost absent in the WT control lane.

      Even “wildtype” cells vary considerably in their Myo10 expression levels. For example, melanoma cells often heavily upregulate Myo10, while these U2OS cells produce nearly none (Supplementary Figure 1E). Thus, there is no single, widely acceptable target for Myo10 expression in wildtype cells.

      Please note that the new Supplementary Figure 1E is a Myo10 Western blot, not total protein staining as before.

      Ideally, and ultimately an important approach, would be to work with a cell line expressing endogenously tagged Myo10 via genome engineering. This can be complicated in transformed cells that often have chromosomal duplications.

      Indeed, we chose U2OS cells for this work because they do not express detectable levels of Myo10, and thus we can avoid all of these complications. Here we can examine how Myo10 levels control filopodial production through ectopic expression.

      However, even though there is an excess of Myo10, it would appear that activation is still under some type of control, as the cytosolic pool is quite large and its localization to the cell edge is not uniform. But it is difficult to gauge whether the number of molecules in the filopodium is the same as would be seen in untransfected cells. Myo10 can readily walk up a filopodium. If excess numbers of this motor are activated, they would accumulate in the tip in large numbers, possibly creating a bulge; indeed, it does appear that some tips are unusually large. How, then, would that relate to the normal condition?

      As noted above, the normal condition depends on the cellular system. However, endogenous Myo10 also accumulates in bulges at filopodial tips, so this is not a phenotype unique to Myo10 overexpression. For example, the images from Figure 1 of the Berg and Cheney (2002) citation show bulges from endogenous Myo10 in endothelial cells.

      (2) Measurements of the localization of Myo10 focus in large part on ‘Myo10 puncta’. While it seems reasonable to presume that these are filopodia tips, the authors should provide readers with a clear definition of a punctum. Is it only filopodia tips, which seems to be the case? Does it include initiation sites at the cell membrane that often appear as puncta?

      We define puncta as any clusters/spots of Myo10 signal detected by segmentation, not limited to any location within the surface-attached filopodia. We exclude puncta that appear in the cell interior (~5 of which appear in Fig. 1A). These are likely dorsal filopodia, but there are few of these compared to the surface attached filopodia of U2OS cells. In Figure 2, “puncta” includes all Myo10 clusters along the filopodia shaft, though a majority happen to be tip-localized (please see Supplementary Figure 4B). We have edited the main text for clarification.

      Along those lines, the position of dim puncta along the length of a filopodium is measured (Fig 3D). The findings suggest that a given filopodium can have more than one punctum, which seems at odds with a punctum being a filopodial tip. How frequently is a filopodium with two puncta seen? It would be helpful if the authors provided an example image showing the dim puncta that are not present at the tip.

      We have now provided an example image of dim puncta along filopodia in Supplementary Figure 4C.

      (3) The concentration of actin available to Myo10 is calculated based on the deduction from Nagy et al (2010) that only 4/13 of the actin monomers in a helical turn are accessible to the Myo10 motor (discussion on pg 9; Fig S4). Subsequent work (Ropars et al, 2016) has shown that the heads of the antiparallel Myo10 dimer are flattened, but the neck is rather flexible, meaning that the motor can have a variable reach (36–52 nm). Wouldn’t this mean that more actin could be accessible to the Myo10 motor than is calculated here?

      Although we see why the reviewer might believe otherwise, the 4/13 fraction of accessible actin holds. This fraction is obtained from consideration of the fascin-actin bundle structure alone, independent of the reach of any particular myosin motor. Every repeating layer of 13 actin subunits (or 36 nm) has 4 accessible myosin binding-sites. The remaining 9 sites are rejected because a single myosin motor domain will have a steric clash with a neighboring actin filament in the bundle. A myosin with an exceptionally long reach might reach the next 13 subunit layer, but that layer also has only 4 binding sites. Thus, we can calculate the number of binding sites per unit length along the filopodium. This number would hold for a dimeric myosin with any reach, including myosin-5 or myosin-2.
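The counting argument above reduces to simple arithmetic, sketched below. The constants come from the text (4 accessible sites per 13-subunit, 36 nm repeat); the helper names are hypothetical, and scaling by `n_filaments` assumes every filament in the bundle contributes the same 4 sites per repeat.

```python
# Geometry from the text: each 36 nm repeat of 13 actin subunits in a
# fascin-actin bundle exposes 4 sterically accessible myosin binding sites.
SUBUNITS_PER_REPEAT = 13
ACCESSIBLE_PER_REPEAT = 4
REPEAT_LENGTH_NM = 36.0


def accessible_fraction():
    """Fraction of actin subunits a myosin head can bind in the bundle."""
    return ACCESSIBLE_PER_REPEAT / SUBUNITS_PER_REPEAT


def accessible_sites(length_um, n_filaments):
    """Accessible binding sites along a bundle segment (hypothetical helper).

    Assumes every filament contributes the same 4 sites per repeat. The
    motor's reach is deliberately not an input: a longer reach only moves
    the head to another repeat, which offers the same 4 sites.
    """
    repeats = length_um * 1000.0 / REPEAT_LENGTH_NM
    return repeats * ACCESSIBLE_PER_REPEAT * n_filaments


print(round(accessible_fraction(), 3))     # 0.308
print(round(accessible_sites(1.0, 1), 1))  # 111.1 sites per micrometre per filament
```

Because the result depends only on bundle geometry, the same number applies to any dimeric myosin, whatever its reach.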

      (4) Quantification of numbers of Myo10 molecules in filopodial puncta (Fig 3C) leads the authors to conclude that ‘only ten or fewer Myo10 molecules are necessary for filopodia initiation’ (pg 7, top). While this is reasonable based on the assumption that the formation of a punctum ultimately results from an initiation event, little is known about initiation events, and without direct observation of coalescence of Myo10 at the cell edge that leads to formation of a filopodium, this seems rather speculative.

      As noted above, we have now performed the necessary live cell imaging of filopodial nucleation events and have updated our conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have made a series of comments that might help the authors improve their manuscript:

      - A full calibration of the methodology would require testing a wider range of protein amounts, to exhaustively detect the dynamic range of the technique. The authors acknowledge in the discussion that “Furthermore, our estimates of molecules are predicated on the calibration curve of the Halo Standard Protein on the SDS-PAGE gels, which is likely the highest source of error on our molecule counts”. A good way of convincing a nasty reviewer is to provide a calibration with more than 3 reference points. At least this will help exclude from the analysis cells where Myo10 estimates are not in the linear regime of detection.

      We completely agree with the reviewer’s suggestion to build a robust calibration curve. The SDS gel shown in Figure 1C originally contained 4 reference points, but the highest HaloTag standard protein point oversaturated the detector at the set exposure in the TMR channel and was omitted. We have now re-run the SDS gel to include a HaloTag standard protein curve comprising 5 points, alongside all three bioreplicates from the fixed cell experiments and all three bioreplicates from the live cell experiments (updated in Figure 1B-C). We had saved frozen lysates from the original fixed cell work, so we were able to reanalyze our data with the new set of standards. The Myo10 quantities are consistent, but with much tighter CIs from the standard curve.
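A calibration of this kind can be sketched as an ordinary least-squares line through the HaloTag standards, with saturated points dropped and sample bands accepted only inside the calibrated range. This is a minimal sketch, not the analysis code used here: the function names, the saturation cutoff, and the toy values (in arbitrary units) are assumptions for illustration.

```python
def drop_saturated(molecules, intensities, saturation_level):
    """Discard standards whose signal reached the detector's saturation level."""
    kept = [(m, s) for m, s in zip(molecules, intensities) if s < saturation_level]
    return [m for m, _ in kept], [s for _, s in kept]


def fit_standard_curve(molecules, intensities):
    """Ordinary least-squares line: intensity = slope * molecules + intercept."""
    n = len(molecules)
    mx = sum(molecules) / n
    my = sum(intensities) / n
    sxx = sum((x - mx) ** 2 for x in molecules)
    sxy = sum((x - mx) * (y - my) for x, y in zip(molecules, intensities))
    slope = sxy / sxx
    return slope, my - slope * mx


def band_to_molecules(band_intensity, slope, intercept, intensities):
    """Invert the curve, refusing to extrapolate beyond the standards."""
    if not min(intensities) <= band_intensity <= max(intensities):
        raise ValueError("band outside the calibrated (linear) range")
    return (band_intensity - intercept) / slope


# Toy six-point series; the last point is saturated and is removed.
std_molecules = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
std_signal = [12.0, 14.0, 16.0, 18.0, 20.0, 21.0]
mol, sig = drop_saturated(std_molecules, std_signal, saturation_level=21.0)
slope, intercept = fit_standard_curve(mol, sig)
print(slope, intercept)                                # 2.0 10.0
print(band_to_molecules(15.0, slope, intercept, sig))  # 2.5
```

Refusing to extrapolate is one simple way to enforce the reviewer's point about staying in the linear regime of detection.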

      - As already said, this methodology is intriguing; however, a correlative validation with a conventional SMLM approach to establish the validity of the method would be ideal.

      Unfortunately, single molecule approaches for validation are impractical for us. Due to the relatively high magnification of our TIRF microscope and the large spread area of the U2OS cells, single cells typically extend beyond the field of view. We acknowledge the benefits of SMLM quantitative techniques and other approaches cited in the introduction section. To avoid the use of special tools/instruments, we offer our methodology, based on the Pollard group’s quantitative Western blotting of GFP, as a simpler alternative accessible to anyone.

      - TMR is a small ligand that likely also interacts with Halo in its denatured state. However, to clear any doubts, a parallel Native-PAGE investigation should be included or, if one exists, a specific reference should be provided.

      Perhaps there is a misunderstanding here. One of the key advantages of the HaloTag labeling system is that the engineered dehalogenase is covalently modified by the ligand (the TMR-ligand is a suicide substrate). This means that the TMR remains bound even under denaturing conditions, which allows its detection in SDS-PAGE. Native gels are unnecessary here.

      - Moreover, SDS-PAGE is run at alkaline pH; have the authors considered these points when designing the methodology? Fluorescence images were taken in PBS, which has a different pH. Could the authors, or the literature, exclude these aspects as potential pitfalls in the methodology? Temperature also affects fluorescence emission, but it is easier to control within a certain tolerance in the room-temperature regime.

      Our method does not compare fluorescence values that cross the experimental systems (SDS-PAGE vs. microscopy). Cellular proteins and HaloTag protein standards are compared in a single setting of SDS-PAGE to obtain the average number of Myo10s per transfected cell. Likewise, all measurements on intact (live or fixed) cells are conducted in that single setting to obtain average fluorescence per cell. Thus, there is no issue with the different buffers or temperatures affecting fluorescence emission.

      - The authors should also test their approach with truncation variants of Myosin-10 (for instance, lacking the PH or motor domain). This is a classical approach that might demonstrate the potential of the technique when the capacity of the protein to interact with a main binding partner is altered. Also, treatments that induce filopodia formation might prove useful (e.g., hypotonic media induce filopodia formation in some fibroblast cell lines in our hands).

      The reviewer raises interesting suggestions that we aim to address in future experiments, but truncation variants and environmental perturbations are beyond the focus of the current manuscript. Here, we report on the otherwise unperturbed state when we add exogenous full-length Myo10 to the U2OS cells. But indeed, experiments with Myo10 domain truncations, PI3K and PTEN inhibition, and cargo protein / activating cofactor knock-downs (among others) are on our drawing board.

      - Most of the mechanisms hypothesized in the discussion are sound and plausible. However, the authors have chosen an experimental model where transient transfection of exogenous Myo10 in U2OS is performed. This approach poses two main and fundamental questions that are not resolved by the data provided:

      A) how do different expression levels affect the Myo10 counting?

      Our counting procedure does not assume uniform expression across a population of cells; quite the opposite, in fact. We directly measure Myo10 expression levels on a cell-by-cell basis with microscopy, once we know the number of molecules in our total pool (see the Methods for details). As an example of the final output, Figs. 1D and 1E show the total number of Myo10 molecules per cell for fixed and live cells, respectively.

      B) how do endogenous, unlabeled Myo10 molecules affect the validity of the counts? The authors claimed “U2OS cells express low levels of Myo10, so there is a small population of unlabeled endogenous Myo10 unaddressed by this paper”. As presented, the low level of endogenous Myo10 sounds like an arbitrary parameter, and no data are presented that can limit, if not exclude, this bias in the analysis. Producing data in a genetically modified cell line with a HaloTag on the endogenous protein would represent a much cleaner system. Alternatively, the authors should look for Myo10 KO cell lines where they can back-transfect their HaloTagged Myo10 construct in a more consistent framework, focusing on cells with low-to-mid levels of expression.

      We agree, this is an important point to nail down (and is often neglected in the literature). We have now measured the endogenous Myo10 levels in U2OS cells by Western blotting and found that it is undetectable compared to our HaloTagged construct expression. Please see Supp. Fig 1E. Thus, for all intents and purposes, every Myo10 molecule in these experiments came from our expression plasmid. Accordingly, we have removed this caveat from the paper.

      Minor points

      - Figure 1B. To help the reader, the SDS-PAGE gel annotations should be clear directly from the figure.

      We have updated the annotations for clarity.

      - Methods should be organized in sections. As it stands, it is hard for the reader to find technical details.

      We have expanded and added subsections to the Methods as requested.

      - The good practice of indicating the gene and transcript entry numbers and the primer used to amplify and clone into the backbone vectors is getting lost in many papers. I would strongly encourage the authors to add this information to the methods.

      We have included the gene entries in the Methods and will include a full FASTA file of the coding sequence as supplementary information to avoid any ambiguity here.

      The authors write “It is unclear how myosins navigate to the right place at the right time, but our results support an important interplay between Myo10 and the actin network.” It is a bit scholastic to say that Myo10 and actin have an important interplay, they are major binding partners. What is the new knowledge contained in this sentence?

      Agreed; we have deleted the sentence in question.

      Reviewer #2 (Recommendations For The Authors):

      The authors should address all the weaknesses indicated in the public review.

      There were a few other places that require clarification.

      On page 4, the last paragraph. It is stated that the targeting of Myo10 was reported/proposed based on previous work (ref 31). The next few sentences are not referenced and thus likely refer to ref 31. The authors did not measure the parameters discussed in these sentences, so it is important to clarify that they are referring to previous work and not the current study.

      Indeed, the next few sentences still refer to old reference 31, so we have now edited the paragraph for clarity.

      On page 7, the reference to Figure 3A indicates a trend of higher Myo10 correlating with more filopodia. However, the reference to Figure 3B indicates that total intracellular Myo10 weakly correlates with more filopodia, yet the x-axis in Figure 3B is filopodia molecules, not intracellular Myo10. Please clarify.

      We thank the reviewer for catching our mistake. Those plots are now in Fig. 2 and have been edited accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The Discussion of results at the end of each section is rather brief and could be expanded on a bit more.

      Previously, we were operating under the constraints of an eLife Short Report. We have now expanded the discussion for a full article.

      The authors mention that actin filaments at the tips of filopodia could be frayed, citing Medalia et al, 2007 (ref 40). That paper describes an early cryoEM analysis of filopodia from the amoeba Dictyostelium. EM images of mammalian filopodia tips, e.g. Svitkina et al, 2003, JCB, do not show quite the same organization of actin as seen in the Dictyostelium filopodia tips. However, recent work from the Bershadsky lab, Li et al, 2023, presents a few cryoEM images of tips of left-bent filopodia that are tightly adhered to a substrate and there it looks like actin filaments become disorganized in tips, along with membrane bulging. The authors should consider expanding discussion of the filopodia tips to take into account what is known for mammalian filopodia.

      We thank the reviewer for bringing these enlightening papers to our attention. We have now included these citations in the discussion.

      Fig 1D - The x-axis is a bit odd, it goes from 0 then to 2.5e+06 with no indication of the bin size. Can this be re-labelled or the scale displayed a bit differently?

      We have double-checked the axis breaks, which are large because the underlying values are large. We have also provided the bin size as requested for all histograms.

      Fig 4A - What is the bin size for the histogram?

      As above, we have now updated the figure legends (now in Fig. 3) to include the bin size.

      Methods -

      - Please provide an accession number for the Myo10 nucleotide sequence used for this work as there are at least two known isoforms.

      Thank you for noting this. We are using the full-length, not the headless isoform. We have now updated the Methods accordingly.

      - No mention is made of the SDS sample buffer used, was that also added to the sample?

      We have now updated the Methods accordingly.

      - How are samples boiled at 70 deg C? Do the authors actually mean ‘heated’?

      Indeed. We have now corrected “boiled” to “heated.”

      - Could the authors please briefly explain the connected component analysis used to identify filopodia?

      We have now updated the Methods accordingly.

      - The intensity of filopodia was determined by dividing tip intensity by the total bioreplicate sum of intensities then multiplying it by the total pool, if this reviewer understands correctly. It sounds like intensities are being averaged across a whole cell population instead of cell-by-cell. Is that correct? If so, can the authors please provide the underlying rationale for this? If not, then please better describe what was actually done.

      We apologize for the confusion. Intensities are being averaged (summed) across a whole cell population, but importantly that step is only used to obtain a scale factor that converts the fluorescence signal at the microscope to the number of molecules. We then use that scale factor for all cells imaged in the bioreplicate, to both 1) find the total Myo10 in that cell, and 2) find the total amount of that Myo10 in any given location within that cell.

      To further clarify, each bioreplicate has a known total number of Myo10 molecules associated with the number of cells loaded onto the SDS gel. From the SDS gel, we have an average number of Myo10 molecules per positively transfected cell. If 50 cell images are analyzed, then there is a Myo10 ‘total pool’ of (50 cells) * (average Myo10 molecules/cell). The fluorescence signal intensities in microscopy were summed for all cells within the bioreplicate (50 cells in this example). However, due to variation in expression, not every cell has the same signal intensity when imaged under the same conditions, so it would be inaccurate to assume each cell contains the average number of Myo10 molecules/cell. Therefore, to get the number of molecules within a given cell (or punctum), the summed cell (punctum) intensity was divided by the bioreplicate fluorescence signal intensity sum and multiplied by the ‘total pool.’
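The conversion described above can be written as a single scale factor per bioreplicate (molecules per fluorescence unit) that is then applied to any summed intensity, whole cell or single punctum. The function names and toy numbers below are hypothetical; the arithmetic follows the procedure as stated.

```python
def scale_factor(n_cells, mean_molecules_per_cell, replicate_intensity_sum):
    """Molecules per fluorescence unit for one bioreplicate."""
    total_pool = n_cells * mean_molecules_per_cell
    return total_pool / replicate_intensity_sum


def molecules(intensity, factor):
    """Convert a summed intensity (whole cell or single punctum) to molecules."""
    return intensity * factor


# Toy bioreplicate: three imaged cells with unequal expression and a
# gel-derived average of 100,000 Myo10 molecules per transfected cell.
cell_intensities = [100.0, 300.0, 600.0]
factor = scale_factor(len(cell_intensities), 1e5, sum(cell_intensities))
per_cell = [molecules(i, factor) for i in cell_intensities]
print(per_cell)                # [30000.0, 90000.0, 180000.0]
print(molecules(2.5, factor))  # 750.0 for a punctum with summed intensity 2.5
```

Only the total is forced to match the gel estimate; each cell's share follows its own fluorescence, which is why unequal expression is handled cell by cell.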

      - The authors quantify Myo10 protein amounts by western blotting using Halo tag fluorescence, a method that should provide good accuracy. The results depend on the transfection efficiency and it is rarely the case that it is 100%. The authors state that they use a ‘value correction for positively transfected cells’ (pg 11). It is likely that there was a range of expression levels in the cells, how was a cut-off for classifying a cell as non-expressing determined or set?

      As described in the Methods, “microscopy was used to count the percentage of transfected cells from ~105-190 randomly surveyed cells per bioreplicate.” Cells were labeled and located with DAPI. If no TMR signal could be visually detected by microscopy, then the cell was deemed to be non-Myo10 expressing. We did not set a cutoff fluorescence value, as untransfected cells have no detectable signal. Please see Supplementary Figure 1F for examples.
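A transfection-efficiency correction of this kind can be sketched as attributing all HaloTag signal in a gel lane to the transfected subset of the loaded cells. The exact formula in the Methods is not reproduced here; this sketch assumes, consistent with the text, that untransfected cells contribute no detectable signal, and the helper names and numbers are hypothetical.

```python
def transfected_fraction(n_positive, n_surveyed):
    """Fraction of surveyed (DAPI-identified) cells with visible TMR signal."""
    return n_positive / n_surveyed


def mean_per_transfected_cell(lane_molecules, cells_loaded, fraction):
    """Attribute all Myo10 in a gel lane to the transfected subset of cells."""
    return lane_molecules / (cells_loaded * fraction)


# Toy numbers: 60 of 150 surveyed cells express Myo10; 50,000 cells loaded.
frac = transfected_fraction(60, 150)
print(mean_per_transfected_cell(2e9, 50_000, frac))
```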

      - “In-house Python scripts” are used for image analysis. Will these be made publicly available?

      Yes, we will package these up on GitHub.

    1. Author response:

      a) that the investigation is very interesting and inventive, and has the potential to reveal some novel insights.

      We thank the reviewers and are excited to improve upon the manuscript through their suggestions.

      b) that the problem of temporal autocorrelation in the fMRI and behavioral data has not been dealt with clearly and convincingly

      We agree that convincingly accounting for fMRI temporal autocorrelation is important to our claims. To reduce its effects, we used field standard methods: prewhitening and autocorrelation modeling with SPM’s FAST algorithm (shown by Olszowy et al. 2019 to be superior to SPM’s default setting), as well as a high-pass filter with a 128 s cutoff. There is still some first-order autocorrelation structure present across voxels in the left hippocampal beta series: across participants there is slightly positive autocorrelation between the betas of decision trials on successive trials, which decays to ~0 at subsequent lags. We note that our task is a narrative, and some patterns over time are expected; instead of attempting to fully eliminate all temporal structure in the data, we aim to show that the temporal distance between trials is unlikely to explain our effects.
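One way to characterize the autocorrelation profile described above is the sample autocorrelation of each voxel's beta series across successive decision trials, averaged over voxels. This is a minimal sketch, not SPM's FAST model and not necessarily the analysis reported; the per-voxel averaging and the helper names are assumptions.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of one series at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def mean_voxel_autocorrelation(betas, max_lag):
    """Average the per-voxel autocorrelation across voxels.

    betas: one list per voxel, each holding that voxel's beta estimate on
    successive decision trials.
    """
    per_voxel = [autocorrelation(v, max_lag) for v in betas]
    return [sum(acf[k] for acf in per_voxel) / len(per_voxel)
            for k in range(max_lag)]


# A strictly alternating series is maximally anti-correlated at lag 1.
print(autocorrelation([1, -1, 1, -1], 2))  # [-0.75, 0.5]
```

A profile that is slightly positive at lag 1 and near zero afterwards is the pattern described in the paragraph above.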

      In the within versus between social dimension representational similarity analysis, the average temporal distance between trials is the same within and between dimensions. The clustering analysis is a between-subject analysis about individual differences, and the same overall temporal structure is experienced by all participants.

      The trajectory analysis does not focus on consecutive trials across characters, but rather on consecutive trials within characters, where the time gap between successive trials is relatively large and highly variable. An average of over a minute of time elapses between successive decision trials for a given character (versus ~20 seconds across characters), which is on average almost 11 narrative slides and 3 decision trials. Across characters, the temporal gap between decision trials ranges from 12 seconds to more than 10 minutes, reducing the likelihood that temporal autocorrelation drives character-related estimates. We also highlight the shuffled choices control model, which shares the same temporal autocorrelation structure as the model of interest but had significantly poorer social location decoding, a strong indication that temporal autocorrelation alone can’t explain these results. For each participant, we shuffled their choices and re-computed trajectories that preserved the origin and end locations but produced different locations along the way. Our model decoded location significantly better than this null model, and this difference in performance can't be explained by differences in temporal autocorrelation in the neural or behavioral data.
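Why shuffling preserves the origin and end locations can be seen in a small sketch that treats each decision as a step vector in a two-dimensional social space and a trajectory as the running sum of those steps. The step representation, the helper names, and the toy steps are assumptions for illustration, not the study's exact model.

```python
import random


def trajectory(origin, steps):
    """Cumulative path through a 2D social space, one step per decision."""
    x, y = origin
    path = [(x, y)]
    for dx, dy in steps:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def shuffled_trajectory(origin, steps, rng):
    """Null path: same origin and same set of choices in a random order.

    Because addition is order-independent, the end location is preserved,
    while the intermediate locations generally change.
    """
    shuffled = list(steps)
    rng.shuffle(shuffled)
    return trajectory(origin, shuffled)


steps = [(1, 0), (0, -1), (1, 0), (0, 1)]  # toy choice-driven moves
observed = trajectory((0, 0), steps)
null = shuffled_trajectory((0, 0), steps, random.Random(0))
print(observed[-1], null[-1])  # (2, 0) (2, 0)
```

Because the null paths differ from the observed path only in their intermediate locations, a decoder that beats them must be tracking those locations rather than trial timing.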

      In the revision, we will further address this concern. For example, we will report more details on the task structure to aid in interpretation and will more precisely characterize the temporal autocorrelation profile. Where appropriate, we will also improve on and/or add more control analyses that preserve the autocorrelation structure.

      c) that a number of important interesting questions have not been addressed: Are the differences between social partners encoded in the hippocampus? Are the social dimensions encoded in a consistent manner across social partners?

      We believe that we should be able to decode other interesting task- and relationship-related features from the hippocampal patterns, as suggested by the reviewers. In the revision, we will attempt several such analyses, while taking care to control for temporal autocorrelation.

      d) that the cluster analysis in the brain-behavior correlation analysis is not well motivated or validated and should be clarified.

      We agree with the reviewers that this clustering analysis should be better described and validated. We aimed to ask whether less diverse and distinctive cognitive representations of the relationship trajectories relate to smaller real-world social networks. This question of impoverished cognitive maps was first raised by Edward Tolman; we think it is relevant here, as well. In the revision, we will clarify its motivations and implications, and better evaluate it for its robustness. Here, we address a few comments made by the reviewers.

      Reviewer 2 noted that other analyses could be used to ask whether social cognitive map complexity relates to real-world social network complexity. While the proposed alternatives are interesting (e.g., correlating decoding accuracy with social network size), we believe these analyses ask different questions. The current co-clustering analysis was intended to estimate map complexity jointly from the behavioral and neural signatures of the social map across characters. In contrast, the spline location decoding is within character; the accuracy of this decoding does not say much about representations across characters. And although we think character decoding is an interesting possible addition to this manuscript, its accuracy may reflect other aspects of the relationships, beyond just spatial representation. Thus, we will provide a clearer and better validated version of the current analysis to address this question.

      We would also like to clarify that we did not collect the Social Network Index questionnaire in the Initial sample; as such, these results are more tentative than the other analyses, due to the inability to confirm them in a separate sample. Reviewer 2 also suggests that a single outlier could drive this effect, but estimating the effect with robust regression also returns a right-tailed p < 0.05, showing that the relationship is robust to outliers.

      References

      Olszowy, W., Aston, J., Rua, C. & Williams, W.B. Accurate autocorrelation modeling substantially improves fMRI reliability. Nature Communications. (2019).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      In this study, Ger and colleagues present a valuable new technique that uses recurrent neural networks to distinguish between model misspecification and behavioral stochasticity when interpreting cognitive-behavioral model fits. Evidence for the usefulness of this technique, which is currently based primarily on a relatively simple toy problem, is considered incomplete but could be improved via comparisons to existing approaches and/or applications to other problems. This technique addresses a long-standing problem that is likely to be of interest to researchers pushing the limits of cognitive computational modeling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ger and colleagues address an issue that often impedes computational modeling: the inherent ambiguity between stochasticity in behavior and structural mismatch between the assumed and true model. They propose a solution to use RNNs to estimate the ceiling on explainable variation within a behavioral dataset. With this information in hand, it is possible to determine the extent to which "worse fits" result from behavioral stochasticity versus failures of the cognitive model to capture nuances in behavior (model misspecification). The authors demonstrate the efficacy of the approach in a synthetic toy problem and then use the method to show that poorer model fits to 2-step data in participants with low IQ are actually due to an increase in inherent stochasticity, rather than systemic mismatch between model and behavior.
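The core comparison can be illustrated by scoring the theoretical model and the RNN on the same held-out trials and taking the difference. The specific accuracy metric used in the study is not given in this excerpt; the geometric-mean probability assigned to the observed choices is assumed here as one common choice, and the function names and toy probabilities are hypothetical.

```python
import math


def mean_choice_probability(probs):
    """Geometric mean of the probabilities assigned to the observed choices.

    Equivalent to exp(mean log-likelihood per trial); 0.5 is chance when
    there are two options.
    """
    return math.exp(sum(math.log(p) for p in probs) / len(probs))


def misspecification_gap(model_probs, rnn_probs):
    """Held-out improvement of the RNN over the theoretical model.

    A gap near zero is consistent with the theoretical model already
    capturing the explainable structure; a large positive gap points to
    structure the model misses.
    """
    return mean_choice_probability(rnn_probs) - mean_choice_probability(model_probs)


# Toy held-out trials: the model is at chance, the RNN is not.
print(round(misspecification_gap([0.5, 0.5], [0.8, 0.8]), 3))  # 0.3
```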

      Strengths:

      Overall I found the ideas conveyed in the paper interesting and the paper to be extremely clear and well-written. The method itself is clever and intuitive and I believe it could be useful in certain circumstances, particularly ones where the sources of structure in behavioral data are unknown. In general, the support for the method is clear and compelling. The flexibility of the method also means that it can be applied to different types of behavioral data - without any hypotheses about the exact behavioral features that might be present in a given task.

      Thank you for taking the time to review our work and for the positive remarks regarding the manuscript. Below is a point-by-point response to the concerns raised.

      Weaknesses:

      That said, I have some concerns with the manuscript in its current form, largely related to the applicability of the proposed methods for problems of importance in computational cognitive neuroscience. This concern stems from the fact that the toy problem explored in the manuscript is somewhat simple, and the theoretical problem addressed in it could have been identified through other means (for example through the use of posterior predictive checking for model validation), and the actual behavioral data analyzed were interpreted as a null result (failure to reject the behavioral stochasticity hypothesis), rather than actual identification of model misspecification. I expand on these primary concerns and raise several smaller points below.

      A primary question I have about this work is whether the method described would actually provide any advantage for real cognitive modeling problems beyond what is typically done to minimize the chance of model misspecification (in particular, posterior predictive checking). The toy problem examined in the manuscript is pretty extreme (two of the three synthetic agents are very far from what a human would do on the task, and the models deviate from one another to a degree that detecting the difference should not be difficult for any method). The issue posed in the toy data would easily be identified by following good modeling practices, which include using posterior predictive checking over summary measures to identify model insufficiencies, which in turn would call for a broader set of models (See Wilson & Collins 2019). Thus, I am left wondering whether this method could actually identify model misspecification in real world data, particularly in situations where standard posterior predictive checking would fall short. The conclusions from the main empirical data set rest largely on a null result, and the utility of a method for detecting model misspecification seems like it should depend on its ability to detect its presence, not just its absence, in real data.

      Beyond the question of its advantage above and beyond data- and hypothesis-informed methods for identifying model misspecification, I am also concerned that if the method does identify a model insufficiency, then you still would need to use these other methods in order to understand what aspect of behavior deviated from model predictions in order to design a better model. In general, it seems that the authors should be clear that this is a tool that might be helpful in some situations, but that it will need to be used in combination with other well-described modeling techniques (posterior predictive checking for model validation and guiding cognitive model extensions to capture unexplained features of the data). A general stylistic concern I have with this manuscript is that it presents and characterizes a new tool to help with cognitive computational modeling, but it does not really adhere to best modeling practices (see Collins & Wilson, eLife), which involve looking at data to identify core behavioral features and simulating data from best-fitting models to confirm that these features are reproduced. One could take away from this paper that you would be better off fitting a neural network to your behavioral data rather than carefully comparing the predictions of your cognitive model to your actual data, but I think that would be a highly misleading takeaway since summary measures of behavior would just as easily have diagnosed the model misspecification in the toy problem, and have the added advantage that they provide information about which cognitive processes are missing in such cases.

      As a more minor point, it is also worth noting that this method could not distinguish behavioral stochasticity from the deterministic structure that is not repeated across training/test sets (for example, because a specific sequence is present in the training set but not the test set). This should be included in the discussion of method limitations. It was also not entirely clear to me whether the method could be applied to real behavioral data without extensive pretraining (on >500 participants) which would certainly limit its applicability for standard cases.

      The authors focus on model misspecification, but in reality, all of our models are misspecified to some degree since the true process generating behavior almost certainly deviates from our simple models (i.e., as George Box is frequently quoted, "all models are wrong, but some of them are useful"). It would be useful to have some more nuanced discussion of situations in which misspecification is and is not problematic.

      We thank the reviewer for these comments and have made changes to the manuscript to better describe these limitations. We agree with the reviewer and accept that fitting a neural network is by no means a substitute for careful and dedicated cognitive modeling. Cognitive modeling is aimed at describing the latent processes that are assumed to generate the observed data, and we agree that careful description of the data-generating mechanisms, including posterior predictive checks, is always required. However, even a well-defined cognitive model might still have little predictive accuracy, and it is difficult to know how many resources should be invested in testing and developing new cognitive models to describe the data. We argue that RNNs can provide some insight into this question, and we highlight the following limitations that were raised by the reviewer:

      First, we accept that it is important to provide positive evidence for the existence of model misspecification. In that sense, a result where the network shows dramatic improvement over the best-fitting theoretical model is easier to interpret than one where the network shows no (or very little) improvement in predictive accuracy. This is because there is always a possibility that the network, for some reason, was not flexible enough to learn the data-generating model, or that the data-generating mechanism changed from training to test. We have now stated this more clearly in the limitations section. However, with regard to our empirical results, we would like to emphasize that the network did in fact improve the predictive accuracy for all participants. The result supports a "null" hypothesis in the sense that we find evidence that the change in predictive accuracy between the theoretical model and the RNN is not systematic across levels of IQ. This allows us to quantify evidence (using Bayesian statistics) for the absence of systematic model misspecification as a function of IQ. While it is always possible that a different model might systematically improve the predictive accuracy for low- vs high-IQ individuals' data, this seems less likely given the flexibility of the current results.

      Second, we agree that our current study only applies to the RL models that we tested. In the context of RL, we have used a well-established and frequently applied paradigm and models. We emphasize in the discussion that simulations are required to further validate other uses for this method with other paradigms.  

      Third, we agree that posterior predictive checks should be used whenever possible, which we now emphasize in the discussion. However, these checks are not always easy to interpret meaningfully and may not reveal the kinds of model insufficiency described by the reviewer. It is very hard to determine what counts as a good prediction: since the generative model is always unknown, even very low predictive accuracy may be at the ceiling of achievable performance. This is because the data might be generated by a very noisy process that caps the achievable predictive accuracy at a low level, and with theoretical modeling alone it is very hard to know what predictive accuracy to expect. Predictive checks are also not always easy to interpret, visually or otherwise. For example, in two-armed bandit tasks, where there are only two actions, we find choice predictions easier to understand when summarized as a confusion matrix of the model's ability to predict the empirical behavior (which becomes similar to the predictive estimate we describe in Eq. 22).  
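      As a rough illustration of the confusion-matrix summary mentioned above, the sketch below tabulates observed versus predicted choices in a two-action task. It is a minimal sketch, not the manuscript's Eq. 22: it assumes the predicted action is simply the one the model assigns a probability of at least 0.5, and the function name `choice_confusion_matrix` is hypothetical.

```python
import numpy as np

def choice_confusion_matrix(p_choose_1, choices, threshold=0.5):
    """Row-normalized 2x2 confusion matrix for a two-action task.

    p_choose_1: model probability of choosing action 1 on each trial.
    choices:    observed choices, coded 0 or 1.
    Rows index the observed action; columns index the predicted action.
    """
    p = np.asarray(p_choose_1, dtype=float)
    observed = np.asarray(choices, dtype=int)
    # Assumption: the predicted action is the one with probability >= threshold.
    predicted = (p >= threshold).astype(int)
    counts = np.zeros((2, 2))
    for obs, pred in zip(observed, predicted):
        counts[obs, pred] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Normalize each row; rows with no trials stay at zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)

# Usage with made-up numbers:
print(choice_confusion_matrix([0.9, 0.8, 0.2, 0.3], [1, 1, 0, 1]))
```

      Because each row is normalized, the diagonal gives the proportion of trials on which each observed action was correctly predicted.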

      Finally, this approach indeed requires a large dataset, with at least three sessions for each participant (training, validation, and test). Further studies might shed more light on the use of optimal epochs as a proxy for noise/complexity that can be used with less data (i.e., training and validation, without a test set).

      Please see our changes at the end of this document.  

      Reviewer #2 (Public Review):

      SUMMARY:

      In this manuscript, Ger and colleagues propose two complementary analytical methods aimed at quantifying the model misspecification and irreducible stochasticity in human choice behavior. The first method involves fitting recurrent neural networks (RNNs) and theoretical models to human choices and interpreting the better performance of RNNs as providing evidence of the misspecifications of theoretical models. The second method involves estimating the number of training iterations for which the fitted RNN achieves the best prediction of human choice behavior in a separate, validation data set, following an approach known as "early stopping". This number is then interpreted as a proxy for the amount of explainable variability in behavior, such that fewer iterations (earlier stopping) correspond to a higher amount of irreducible stochasticity in the data. The authors validate the two methods using simulations of choice behavior in a two-stage task, where the simulated behavior is generated by different known models. Finally, the authors use their approach in a real data set of human choices in the two-stage task, concluding that low-IQ subjects exhibit greater levels of stochasticity than high-IQ subjects.
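      The early-stopping idea described above can be sketched in a few lines. The example below is a simplified stand-in, not the authors' pipeline: it uses a logistic-regression choice model trained by full-batch gradient descent instead of a GRU-based RNN, and the function names, learning rate, epoch count, and synthetic data are all hypothetical. The "optimal epoch" is taken as the 1-indexed epoch with the lowest validation negative log-likelihood.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_nll(w, X, y):
    """Mean negative log-likelihood of binary choices y under weights w."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_with_early_stopping(X_tr, y_tr, X_val, y_val, n_epochs=200, lr=0.1):
    """Full-batch gradient descent that tracks validation loss after every epoch.

    Returns the optimal (1-indexed) epoch, the weights at that epoch,
    and the full validation-loss curve.
    """
    w = np.zeros(X_tr.shape[1])
    best_epoch, best_w, best_loss = 0, w.copy(), np.inf
    curve = []
    for epoch in range(1, n_epochs + 1):
        grad = X_tr.T @ (sigmoid(X_tr @ w) - y_tr) / len(y_tr)
        w = w - lr * grad
        val_loss = mean_nll(w, X_val, y_val)
        curve.append(val_loss)
        if val_loss < best_loss:  # keep the first epoch with the lowest loss
            best_epoch, best_w, best_loss = epoch, w.copy(), val_loss
    return best_epoch, best_w, np.array(curve)

# Hypothetical data: 30 input features, choices drawn from a noisy logistic agent.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
true_w = rng.normal(size=30)
y = (rng.random(80) < sigmoid(0.5 * X @ true_w)).astype(float)
best_epoch, w_best, curve = train_with_early_stopping(X[:40], y[:40], X[40:], y[40:])
print("optimal epoch:", best_epoch)
```

      Comparing the optimal epoch across datasets of equal size fitted with the same model is the kind of comparison the manuscript proposes; whether this toy model reproduces the reported dependence on behavioral noise has not been checked here.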

      STRENGTHS:

      The manuscript explores an extremely important topic to scientists interested in characterizing human decision-making. While it is generally acknowledged that any computational model of behavior will be limited in its ability to describe a particular data set, one should hope to understand whether these limitations arise due to model misspecification or due to irreducible stochasticity in the data. Evidence for the former suggests that better models ought to exist; evidence for the latter suggests they might not.

      To address this important topic, the authors elaborate carefully on the rationale of their proposed approach. They describe a variety of simulations - for which the ground truth models and the amount of behavioral stochasticity are known - to validate their approaches. This enables the reader to understand the benefits (and limitations) of these approaches when applied to the two-stage task, a task paradigm commonly used in the field. Through a set of convincing analyses, the authors demonstrate that their approach is capable of identifying situations where an alternative, untested computational model can outperform the set of tested models, before applying these techniques to a realistic data set.

      Thank you for reviewing our work and for the positive tone. Please find below a point-by-point response to the concerns you have raised.

      WEAKNESSES:

      The most significant weakness is that the paper rests on the implicit assumption that the fitted RNNs explain as much variance as possible, an assumption that is likely incorrect and which can result in incorrect conclusions. While in low-dimensional tasks RNNs can predict behavior as well as the data-generating models, this is not *always* the case, and the paper itself illustrates (in Figure 3) several cases where the fitted RNNs fall short of the ground-truth model. In such cases, we cannot conclude that a subject exhibiting a relatively poor RNN fit necessarily has a relatively high degree of behavioral stochasticity. Instead, it is at least conceivable that this subject's behavior is generated precisely (i.e., with low noise) by an alternative model that is poorly fit by an RNN - e.g., a model with long-term sequential dependencies, which RNNs are known to have difficulties in capturing.

      These situations could lead to incorrect conclusions for both of the proposed methods. First, the model misspecification analysis might show equal predictive performance for a particular theoretical model and for the RNN. While a scientist might be inclined to conclude that the theoretical model explains the maximum amount of explainable variance and therefore that no better model should exist, the scenario in the previous paragraph suggests that a superior model might nonetheless exist. Second, in the early-stopping analysis, a particular subject may achieve optimal validation performance with fewer epochs than another, leading the scientist to conclude that this subject exhibits higher behavioral noise. However, as before, this could again result from the fact that this subject's behavior is produced with little noise by a different model. Admittedly, the existence of such scenarios *in principle* does not mean that such scenarios are common, and the conclusions drawn in the paper are likely appropriate for the particular examples analyzed. However, it is much less obvious that the RNNs will provide optimal fits in other types of tasks, particularly those with more complex rules and long-term sequential dependencies, and in such scenarios, an ill-advised scientist might end up drawing incorrect conclusions from the application of the proposed approaches.

      Yes, we understand and agree. A negative result, in which the RNN fails to outperform the best-fitting theoretical model, always leaves room for doubt, since a different approach might yield better results. In contrast, a dramatic improvement in predictive accuracy for the RNN is easier to interpret, since it implies that the theoretical model can be improved. We have made this issue clearer and more explicit in the discussion, where we state directly that “Equating RNN performance with the generative model should be avoided”.   

      However, we note that our empirical results present a more nuanced scenario: the RNN generally improved predictive accuracy for most participants, and this improvement was equal across participants, with no systematic benefit for low- vs. high-IQ participants. We acknowledge that another model might still show a systematic benefit for low- vs. high-IQ participants; however, we suggest that this is less likely given the current evidence. We now note these issues clearly in the discussion.  

      In addition to this general limitation, the paper also makes a few additional claims that are not fully supported by the provided evidence. For example, Figure 4 highlights the relationship between the optimal epochs and agent noise. Yet, it is nonetheless possible that the optimal epoch is influenced by model parameters other than inverse temperature (e.g., learning rate). This could again lead to invalid conclusions, such as concluding that low-IQ is associated with optimal epoch when an alternative account might be that low-IQ is associated with low learning rate, which in turn is associated with optimal epoch. Additional factors, such as deep double descent (Nakkiran et al., ICLR 2020), can also influence the optimal epoch value computed by the authors.

      An additional issue is that Figure 4 reports an association between optimal epoch and noise, but noise is normalized by the true minimal/maximal inverse-temperature of hybrid agents (Eq. 23). It is thus possible that the relationship does not hold for more extreme values of inverse-temperature such as beta=0 (extremely noisy behavior) or beta=inf (deterministic behavior), two important special cases that should be incorporated in the current study. Finally, even taking the association in Figure 4 at face value, there are potential issues with inferring noise from the optimal epoch when their correlation is only r~=0.7. As shown in the figures, upon finding a very low optimal epoch for a particular subject, one might be compelled to infer high amounts of noise, even though several agents may exhibit a low optimal epoch despite having very little noise.

      Thank you for these comments. Indeed, there is much we do not yet fully understand about the factors that influence optimal epochs. Currently, it is clear to us that the number of optimal epochs is influenced by a variety of factors, including network size, the data size, and other cognitive parameters, such as the learning rate. We hope that our work serves as a proof-of-concept, suggesting that, in certain scenarios, the number of epochs can be utilized as an empirical estimate. Moreover, we maintain that, at least within the context of the current paradigm, the number of optimal epochs is primarily sensitive to the amount of true underlying noise, assuming the number of trials and network size are constant. We are therefore hopeful that this proof-of-concept will encourage research that will further examine the factors that influence the optimal epochs in different behavioral paradigms.  

      To address the reviewer's justified concerns, we have made several amendments to the manuscript. First, we added an additional version of Figure 4 in the Supplementary Information, in which the noise parameter values are not scaled. We hope this clarifies that the parameters were tested across a broad range of values (e.g., 0 to 10 for the hybrid model), spanning the two extremes of complete randomness and high determinism. Second, we included a linear regression analysis of the association between all model parameters (including noise) and the optimal number of epochs. As anticipated by the reviewer, the learning rate was also associated with the number of optimal epochs. Nonetheless, the noise parameter appears to have the strongest association with the number of optimal epochs. We now explicitly mention these associations in the discussion, to inform readers that the association between the number of optimal epochs and model parameters should be examined using simulations for other paradigms and models. Lastly, we acknowledge in the discussion that the association between the number of optimal epochs and noise warrants further investigation, given other factors that might influence the optimal epoch and the fact that the correlation with noise is strong but not perfect (r ≈ 0.7).
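      A regression of this kind could be set up as below. This is a hypothetical sketch: it assumes ordinary least squares on z-scored predictors so that slopes are comparable across parameters, which may differ from the exact specification used in the revised manuscript, and the arrays in the usage lines are random placeholders rather than simulation results.

```python
import numpy as np

def standardized_coefficients(params, optimal_epochs):
    """OLS regression of optimal epochs on z-scored parameters (with intercept).

    params: array (n_agents, n_parameters); optimal_epochs: array (n_agents,).
    Returns one standardized slope per parameter column.
    """
    P = np.asarray(params, dtype=float)
    y = np.asarray(optimal_epochs, dtype=float)
    Z = (P - P.mean(axis=0)) / P.std(axis=0)
    y_z = (y - y.mean()) / y.std()
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, y_z, rcond=None)
    return coef[1:]  # drop the intercept

# Random placeholders (not simulation results): inverse temperature in [0, 10],
# learning rate in [0, 1], and arbitrary optimal-epoch values.
rng = np.random.default_rng(0)
params = np.column_stack([rng.uniform(0, 10, 200), rng.uniform(0, 1, 200)])
optimal_epochs = rng.integers(1, 100, 200)
print(standardized_coefficients(params, optimal_epochs))
```

      Standardizing both sides makes each slope the expected change in optimal epochs (in standard deviations) per standard deviation of a parameter, which is one way to ask which parameter carries the strongest association.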

      The discussion now includes the following:

      “Several limitations should be considered in our proposed approach. First, fitting a data-driven neural network is evidently not enough to produce a comprehensive theoretical description of the data generation mechanisms. Currently, best practices for cognitive modeling \citep{wilson2019ten} require identifying under what conditions the model struggles to predict the data (e.g., using posterior predictive checks), and describing a different theoretical model that could account for these shortcomings in prediction. However, identifying conditions where the model's shortcomings in predictive accuracy are due to model misspecification rather than noisier behavior is a challenging task. We propose leveraging data-driven RNNs as a supplementary tool, particularly when they significantly outperform existing theoretical models, followed by refined theoretical modeling to provide insights into what processes were misspecified in the initial modeling effort.

      Second, although we observed a robust association between the optimal number of epochs and true noise across varying network sizes and dataset sizes (see Fig.~\ref{figS2}), additional factors such as network architecture and other model parameters (e.g., learning rate; see Fig.~\ref{figS7}) might influence this estimation. Further research is required to better understand how and why different factors change the number of optimal epochs for a given dataset before it can be applied with confidence to empirical investigations. 

      Third, the empirical dataset used in our study consisted of data collected from human participants at a single time point, serving as the training set for our RNN. The test set data, collected after intervals of $\sim6$ and $\sim18$ months, introduced the possibility of changes in participants' decision-making strategies over time. In our analysis, we neglected any possible changes in participants' decision-making strategies during that time, changes that may lead to poorer generalization performance of our approach. Thus, further studies are needed to rule out this possibility.

      Fourth, our simulations, albeit illustrative, were confined to known models, necessitating in-silico validation before extrapolating the efficacy of our approach to other model classes and tasks. Our aim was to showcase the potential benefits of using a data-driven approach, particularly when faced with unknown models. However, whether RNNs will provide optimal fits for tasks with more complex rules and long-term sequential dependencies remains uncertain.

      Finally, while positive outcomes where RNNs surpass theoretical models can prompt insightful model refinement, caution is warranted in directly equating RNN performance with that of the generative model, as seen in our simulations (e.g., Figure 3). We highlight that our empirical findings depict a more complex scenario, wherein the RNN enhanced the predictive accuracy for all participants uniformly. Notably, we also provide evidence supporting a null effect among individuals, with no consistent difference in RNN improvement over the theoretical model based on IQ. Although it remains conceivable that a different data-driven model could systematically heighten the predictive accuracy for individuals with lower IQs in this task, such a possibility seems less probable in light of the current findings.”

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Is the t that gets fed as input to RNN just timestep?

      No. Here t denotes the type of the last transition (rare or common), not the time step.

      Line 378: what does "optimal epochs" mean here?

      It is the number of training epochs that best balances underfitting and overfitting, i.e., the epoch at which prediction on the validation set is best (defined around line 300).

      Line 443: I don't think "identical" is the right word here - surely the authors just mean that there is not an obvious systematic difference in the distributions.

      Fixed

      I was expecting to see ~500 points in Figure 7a, but there seem to be only 50... why weren't all datasets with at least 2 sessions used for this analysis?

      We used the ~500 subjects with only two datasets to pre-train the RNN, and then fine-tuned the pre-trained RNN on the remaining 54 subjects, who had three datasets. The correlation between IQ and optimal epoch also holds for the ~500 subjects, as shown below. 

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      Figure 3b: despite spending a long time trying to understand the meaning of each cell of the confusion matrix, I'm still unsure what they represent. Would be great if you could spell out the meaning of each cell individually, at least for the first matrix in the paper.

      We added a clarification to the Figure caption. 

      Figure 5: Why didn't the authors show this exact scenario using simulated data? It would be much easier to understand the predictions of this figure if they had been demonstrated in simulated data, such as individuals with different amounts of behavioral noise or different levels of model misspecifications.

      In Figure 5 the x-axis represents IQ; replacing it with true noise would yield what we currently present as Figure 4. We have clarified the meaning of the axes in the caption. 

      Line 195 ("...in the action selection. Where"). Typo? No period is needed before "where".

      Fixed

      Line 213 ("K dominated-hand model"). I was intrigued by this model, but wasn't sure whether it has been used previously in the literature, or whether this is the first time it has been proposed.

      To our knowledge, this is the first time this model has been used.  

      Line 345 ("This suggests that RNN is flexible enough to approximate a wide range of different behavioral models"): Worth explaining why (i.e., because the GRUs are able to capture dependencies across longer delays than a k-order Logistic Regression model).

      Line 356 ("We were interested to test"): Suggestion: "We were interested in testing".

      Fixed

      Line 389 ("However, as long as the number of observations and the size of the network is the same between two datasets, the number of optimal epochs can be used to estimate whether the dataset of one participant is noisier compared with a second dataset."): This is an important claim that should ideally be demonstrated directly. The paper only illustrates this effect through a correlation and a scatter plot, where higher noise tends to predict a lower optimal epoch. However, is the claim here that, in some circumstances, optimal epoch can be used to *deterministically* estimate noise? If so, this would be a strong result and should ideally be included in the paper.

      We have now removed this sentence and toned down our claims: while we found a strong association between noise and optimal epochs, future research is required to establish to what extent this association can be differentiated from other factors (e.g., network size, number of observations).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important new insights into how multisensory information is processed in the lateral cortex of the inferior colliculus, a poorly understood part of the auditory midbrain. By developing new imaging techniques that provide the first optical access to the lateral cortex in a living animal, the authors provide convincing in vivo evidence that this region contains separate subregions that can be distinguished by their sensory inputs and neurochemical profiles, as suggested by previous anatomical and in vitro studies. Additional information and analyses are needed, however, to allow readers to fully appreciate what was done, and the comparison of multisensory interactions between awake and anesthetized mice would benefit from being explored in more detail.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a characterisation of auditory responses (tones, noise, and amplitude-modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher-order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher-order dorsal cortex (DC) of the IC, in awake and anaesthetised mice. Dan Llano's group has previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from the auditory cortex. They here use 2P calcium imaging combined with an implanted prism to, for the first time, get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC of both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude-modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low-frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice, somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little (possibly not at all) in the matrix. However, bimodal interactions may be different under awake and anesthesia in LC, which warrants deeper investigation by the authors: They find, under anesthesia, more bimodal enhancement in modules of LC compared to the matrix of LC and bimodal suppression dominating the matrix of LC. In contrast, under awake conditions bimodal enhancement is almost exclusively found in the matrix of LC, and bimodal suppression dominates both matrix and modules of LC.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher-order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      Strengths:

      The major strength of this study is undoubtedly the fact that the authors for the first time provide optical access to a subcortical region (the lateral cortex of the inferior colliculus, i.e., the higher-order auditory midbrain), which, as we know from previous work by the same group, has optically identifiable subdivisions with unique inputs and neurotransmitter release, and plays a central role in auditory and multisensory processing. A description of basic auditory and multisensory properties of this structure is therefore very useful for understanding auditory processing and multisensory interactions in subcortical circuits.

      Weaknesses:

      I have divided my comments about weaknesses and improvements into major and minor comments. I believe all of them are addressable by the authors and would provide a clearer picture of their characterisation of the higher-order auditory midbrain.

      Major comment:

      (1) The differences between multisensory interactions in LC in anaesthetised and awake preparations appear to be qualitatively different, though the authors claim they are similar (see also minor comment related to figure 10H for further explanation of what I mean). However, the findings in awake and anaesthetised conditions are summarised differently, and plotting of similar findings in the awake figures and anaesthetised figures are different - and different statistics are used for the same comparisons. This makes it very difficult to assess how multisensory integration in LC is different under awake and anaesthetised conditions. I suggest that the authors plot (and test with similar statistics) the summary plots in Figure 8 (i.e. Figure 8H-K) for awake data in Figure 10, and also make similar plots to Figures 10G-H for anaesthetised data. This will help the readers understand the differences between bimodal stimulation effects on awake and anaesthetised preparations - which in its current form, looks very distinct. In general, it is unclear to me why the awake data related to Figures 9 and 10 is presented in a different way for similar comparisons. Please streamline the presentation of results for anaesthetised and awake results to aid the comparison of results in different states, and explicitly state and discuss differences under awake and anaesthetised conditions.

      We thank the reviewer for the valuable suggestion. We highlighted only the similarities between the data obtained from anesthetized and awake preparations to show that the technique can be reproduced in awake animals in future work. These similarities were identified by comparing modules vs. matrix, or LC vs. DC, within each experimental setup (awake vs. anesthetized). The statistical tests were therefore chosen separately for each setup based on the sample size (n) of each preparation. However, we agree with the reviewer that there are differences between the anesthetized and awake data. To examine these differences, we ran the same statistics for Figure 5 (tonotopy of LC vs. DC, anesthetized animals) and Figure 9 (tonotopy of LC vs. DC, awake animals). In addition, we added a new figure after Figure 9 to separate the statistical analysis from the maps. Accordingly, Figures 4 and 5 (maps and analysis, respectively; anesthetized animals) now match Figures 9 and 10 (maps and analysis, respectively; awake animals). We did the same for Figure 7 (microprism imaging of the LC, anesthetized animals), Figure 8 (imaging of the LC from the dorsal surface, anesthetized animals), and Figure 11, formerly Figure 10 (microprism imaging of the LC, awake animals), to address the similarities and differences in the multisensory data between awake and anesthetized animals. We edited the Results and Discussion sections accordingly.

      (2) The claim about the degree of tonotopy in LC and DC should be aided by summary statistics to understand the degree to which tonotopy is actually present. For example, the authors could demonstrate that it is not possible/or is possible to predict above chance a cell's BF based on the group of other cells in the area. This will help understand to what degree the tonotopy is topographic vs salt and pepper. Also, it would be good to know if the GABAergic modules have a higher propensity of particular BFs or tonotopic structure compared to matrix regions in LC, and also if general tuning properties (e.g. tuning width) are different from the matrix cells and the ones in DC.

      We thank the reviewer for this suggestion. We examined the tonotopy of the LC and DC using two regression models (linear and quadratic polynomial) relating the cells' BFs to their location along an anatomical axis; tonotopy is indicated by a significant regression fit with a high R² between the cells' BFs and their locations within each structure. For the DC, there was a significant regression fit between the cells' BFs and their locations along the rostromedial-to-caudolateral axis. Additionally, the R² of the quadratic fit was higher than that of the linear fit, indicating a nonlinear distribution of BFs, consistent with a high-low-high tuning pattern across the DC surface. Because the microprism cannot image the whole LC and images a slightly different area in each animal, it was very difficult to obtain a consistent LC map or to draw a solid conclusion about LC tonotopy. However, we examined the regression fit between the cells' BFs and their locations along the four main anatomical axes of the field of view in each animal: dorsal to ventral, rostral to caudal, dorsocaudal to ventrorostral, and dorsorostral to ventrocaudal. Unlike the DC, the LC imaged via microprism showed lower R² values for both linear and quadratic regressions, mostly along the dorsoventral axis. We show these regression fits in Figure 4-figure supplement 1 (anesthetized data) and Figure 9-figure supplement 1 (awake data). Despite the inconsistent tonotopy of the LC imaged via microprism, the modules had a higher median BF (10 kHz) than the matrix (7.1 kHz), a finding that was consistent across anesthetized and awake animals. We have added these results to the corresponding parts of the Results section (lines 193-197 and 361-364). 
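      The linear-versus-quadratic comparison described above can be sketched with NumPy's `polyfit`. The sketch assumes BFs are expressed on a log2 (octave) scale and that cell positions have already been projected onto one anatomical axis; it reports only R² and omits the significance test of the fit. Function names and the example values are hypothetical.

```python
import numpy as np

def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_tonotopy_fits(position, log2_bf):
    """Compare linear and quadratic fits of log2(BF) against position on one axis."""
    position = np.asarray(position, dtype=float)
    log2_bf = np.asarray(log2_bf, dtype=float)
    return {"linear": r_squared(position, log2_bf, 1),
            "quadratic": r_squared(position, log2_bf, 2)}

# Hypothetical cells: positions in micrometers along one axis, BFs in kHz.
position_um = np.array([0, 50, 100, 150, 200, 250, 300])
bf_khz = np.array([20.0, 10.0, 7.1, 5.0, 7.1, 10.0, 20.0])
print(compare_tonotopy_fits(position_um, np.log2(bf_khz)))
```

      A U-shaped (high-low-high) BF gradient along the axis gives a quadratic R² well above the linear R², which is the pattern described for the DC.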
      We examined tuning width using the binarized receptive field sum (RFS) method, in which each neuron is assigned a value of 1 if it responds to a single frequency (narrow RF), with the value increasing as the neuron responds to more neighboring frequencies (wide RF). We performed this calculation across all sound levels. Both the DC and LC of anesthetized animals had a higher mean and median RFS than those of awake animals, consistent with the known RF-broadening effect of ketamine. However, in both preparations (anesthetized and awake), the DC had a higher mean RFS than the LC, which could be consistent with the finding that the DC had a relatively lower SMI than the LC. To show these new data, we made a new Figure 10-figure supplement 1 and edited the text accordingly [lines 372-379 & 527-531].
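      One reading of the binarized RFS measure is sketched below: responses are thresholded into a binary frequency-by-level map for each neuron, and the RFS is the number of responsive bins summed over all sound levels. The threshold, the array layout, and the function name are assumptions for illustration, not the criterion used in the manuscript.

```python
import numpy as np

def receptive_field_sum(responses, threshold):
    """Binarized receptive field sum (RFS) per neuron.

    responses: array (n_neurons, n_levels, n_frequencies) of response amplitudes.
    A bin counts as responsive when its amplitude exceeds `threshold`; the RFS is
    the number of responsive bins summed over all sound levels, so a neuron that
    responds to a single frequency at a single level gets an RFS of 1.
    """
    binary = np.asarray(responses) > threshold
    return binary.reshape(binary.shape[0], -1).sum(axis=1)

# Hypothetical responses for 100 neurons, 4 sound levels and 9 frequencies.
rng = np.random.default_rng(0)
rfs = receptive_field_sum(rng.normal(size=(100, 4, 9)), threshold=2.0)
print("mean RFS:", rfs.mean(), "median RFS:", np.median(rfs))
```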

      (3) Throughout the paper more information needs to be given about the number of cells, sessions, and animals used in each panel, and what level was used as n in the statistical tests. For example, in Figure 4 I can not tell if the 4 mice shown for LC imaging are the only 4 mice imaged, and used in the Figure 4E summary or if these are just examples. In general, throughout the paper, it is currently not possible to assess how many cells, sessions, and animals the data shown comes from.

      We thank the reviewer for this comment and apologize for omitting this information. We have now added the sample size (number of cells or number of animals) for every test outcome. To preserve the flow of the text, we report the details of the statistical tests in the figure legends.

      (4) Throughout the paper, to better understand the summary maps and plots, it would be helpful to see example responses of the different components investigated. For example, given that module cells appear to have more auditory offset responses, it would be helpful to see what the bimodal, sound-only, and somatosensory responses look like in example cells in LC modules. This also goes for just general examples of what the responses to auditory and somatosensory inputs look like in DC vs LC. In general example plots of what the responses actually look like are needed to better understand what is being summarised.

      Thank you for the reviewer’s comment and suggestion. We modified Figure 6 and the text accordingly to include all the significant examples of cells discussed throughout the work.

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      The main achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons), and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it. The writing is not as precise as it could be. Consequently, the manuscript is unclear in some places. For instance, the text is somewhat confusing as to whether there is a difference in the pattern (modules vs matrix) of somatosensory-auditory suppression between anesthetized and awake animals. Furthermore, there are aspects of the results which are potentially very interesting but have not been explored. For example, there is a remarkable degree of clustering of response properties evident in many of the maps included in the paper. Taking Figure 7 for instance, rather than a salt and pepper organization we can see auditory responsive neurons clumped together and non-responsive neurons clumped together and in the panels below we can see off-responsive neurons forming clusters (although it is not easy to make out the magenta dots against the black background). This degree of clustering seems much stronger than expected and deserves further attention.

      Thank you for the reviewer’s comment. We apologize if some parts of the manuscript were imprecisely written. In the original submission, we emphasized only the similarities between the anesthetized and awake preparations, to demonstrate that the microprism can be used in awake animals in future studies. To highlight the differences between anesthetized and awake animals, we have now applied uniform statistics to all the data collected from both preparations. Accordingly, we have edited Figures 4 and 5 (tonotopy, anesthetized) to match Figure 9 and the new Figure 10 (tonotopy, awake). We also edited Figures 7 and 8 (multisensory, anesthetized) to match Figure 11 (formerly Figure 10; multisensory, awake). We edited the Results section accordingly and discuss the possible differences between the anesthetized and awake data in the Discussion [lines 521-553].

      We agree with the reviewer’s comment that the cells were topographically clustered based on their responses. Some of these clusters include the somatosensory responsive cells, which were located mostly in the modules (Figures 7D and 8E). Also, the auditory responsive cells with offset responses were clustered mostly in the modules (Figures 7C and 8F). Accordingly, we have edited the text to emphasize this finding.

      We also noticed that some cells responsive to the tested stimuli were surrounded by nonresponsive cells. Figures 7 and 11 (old Figure 10) show only the responses to auditory stimulation (unmodulated broadband noise at 80 dB) and somatosensory stimulation (whisker deflection). By comparing the responses of the cells to different stimuli, we found that some cells that did not respond to these specific stimuli did respond to pure tones of different frequencies and amplitudes. As an indicator of cell viability, we additionally examined the spontaneous activity of the nonresponsive cells across different data sets. We note that spontaneous activity was rare for all cells, even among cells that responded to sound or somatosensory stimulation. This could be because two-photon calcium imaging may not be sensitive enough to detect spontaneous activity that consists of single spikes. However, in some data sets, cells that did not respond to any tested stimulus showed spontaneous activity in the absence of stimulation, indicating that these cells were viable. We have addressed the activity of the nonresponsive cells in the text and in a new Figure 11-figure supplement 1.

      We changed the magenta to green so that these cells are visible against the dark background. We have also completely changed the color palette of all our images to be suitable for color-blind readers, as suggested by Reviewer #1.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were far more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was reversed in the awake prep, where modular neurons became more responsive to somatosensory stimuli than auditory stimuli. Thus, to this reviewer, the most intriguing result of the present study is the dramatic extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggest that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, and the limitations of two-photon imaging for tracking neural activity are acknowledged. Appropriate statistical tests were used. There are three main issues the authors should address, but otherwise, this study represents an important advance in the field.

      (1) Please address whether the Thy1 mouse evenly expresses jRGECO1a in all LC neurons. It is known that these mice express jRGECO1a in subsets of neurons in the cerebral cortex, and similar biases in the LC could have biased the results here.

      Thank you for the reviewer’s comment. In the work published by Dana et al., the expression of jRGECO1a in all Thy1 mouse lines was determined by the brightness of jRGECO1a in the soma. Because some cells do not show a detectable level of jRGECO1a fluorescence until activated, the differences in expression reported across brain regions could reflect the level of neuronal activity at the time of sample processing rather than the expression level of the indicator itself. To the best of our knowledge, there is no antibody against jRGECO1a that could be used to detect the indicator independently of neuronal activity. To test whether DC and LC have different levels of jRGECO1a, we examined jRGECO1a expression after perfusing the mice with high-potassium saline to elicit general neuronal depolarization throughout the brain. We then immunostained against NeuN (a neuronal marker) to quantify the percentage of neurons (NeuN-positive cells) that expressed jRGECO1a. For a fair comparison, we restricted the analysis to the areas imaged by 2P, because some regions, such as the deep ventral LC, were not accessible via the microprism. A similar percentage of cells expressed jRGECO1a in DC and LC. As expected, jRGECO1a was expressed only in non-GABAergic cells. We present these findings in the new Figure 3-figure supplement 1 and the corresponding text in the Results [lines 178-184] and Methods [lines 878-892].

      (2) I suggest adding a paragraph or two to the discussion to address the large differences observed between the anesthetized and awake preparations. For example, somatosensory responses in the modules increased dramatically from 14.4% in the anesthetized prep to 63.6% in the awake prep. At the same time, auditory responses decreased from 52.1% to 22%. (Numbers for the anesthetized prep include auditory responses and somatosensory + auditory responses.) In addition, the tonotopy of the DC shifted in the awake condition. These are intriguing changes that are not entirely expected from the switch to an awake prep and therefore warrant discussion.

      Thank you for the reviewer’s comment. To determine whether differences exist between the anesthetized and awake data, we now use the same statistics for both and have edited Figures 4, 5, 7, 8, 9, and 10, as well as added a new Figure 11. Accordingly, we have edited the Results section and added a paragraph on the possible differences between the two preparations to the Discussion [lines 521-553].

      (3) For somatosensory stimuli, the authors used whisker deflection, but based on the anatomy, this is presumably not the only somatosensory stimulus that affects LC. The authors could help readers place the present results in a broader context by discussing how other somatosensory stimuli might come into play. For example, might a larger percentage of modular neurons be activated by somatosensory stimuli if more diverse stimuli were used?

      We agree with the reviewer’s point. The modules receive inputs from several somatosensory sources, such as the somatosensory cortex and the dorsal column nuclei, which suggests that activity in the modules could be evoked by different types of somatosensory stimulation; this is an open question for future studies. We have discussed this point in the revised Discussion [lines 516-520].

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3H: The lateral surface seems quite damaged by the prism. An example slice of the imaging area of each mouse would help the reader better understand the extent of damage the prism leaves in the area of interest.

      Thank you for the reviewer’s comment. We have already included such images in Figures 4A, 7A, and 9A to show the field of view of all prism experiments. However, we would like to clarify the point about tissue damage. Inserting the microprism may cause some tissue damage when the pocket for the microprism is made, but it is not possible to obtain neuronal signals from a damaged field of view. Therefore, we do not believe that the parts of the LC imaged via the microprism are damaged. There may, however, be some areas where the microprism is not in direct contact with the LC surface. These areas lie mostly at the periphery of the field of view and appear completely black because they are out of focus (e.g., the left side of Figure 3B). The black areas on the right side of Figure 3B and in Figure 3A correspond to blood vessels, which show no red signal because jRGECO1a is not expressed there.

      (2) In relation to the data shown in Figure 4E it is claimed that LC is tuned to higher frequencies (lines 195-196). However, the majority of cells appear to be tuned to frequencies below 14kHz (with a median of 7.5 kHz), which is quite low for the mouse. I assume that the authors mean frequencies that are relatively higher than the DC, but it is worth mentioning in the text that the BFs found in the LC are quite low-frequency responses for the mouse.

      Thank you for the reviewer’s comment, with which we agree. We have revised this part to acknowledge that around 50% of the LC cells were tuned to low frequencies (5 and 7.1 kHz). We then state that most LC cells are tuned to relatively higher frequencies than DC cells [lines 215-218].

      (3) Figure 5A-C: Is it the tone-responsive cells plus an additional ~22% of cells that respond to AM, or are there also cells that respond to tones but not to AM? Please break down the degree to which the tone- and AM-responsive cells overlap.

      Thank you for the reviewer’s comment and suggestion. We broke down the responsive cells into cells responsive only to pure tones (tone-selective cells, Tone-sel), only to AM-noise (noise-selective cells, Noise-sel), or to both sounds (nonselective cells, Non-sel). We examined the fractions of these categories among all responsive neurons in both LC and DC. Accordingly, we have edited Figure 5A-C and the text [lines 229-243].

      (4) Figure 5D. It is unclear to me how a cell is classified as SMI or TMI responsive after computing the SMI or TMI for each cell. What statistic was used to determine if the cell was responsive or not?

      Thank you for the reviewer’s comment. We apologize for the confusion caused by Figures 5D and E. These figures do not show the values of SMI or TMI. Rather, they show the percentages of spectrally and temporally modulated cells, respectively. At each sound level, the cells were categorized into two main types. Spectrally modulated cells are those responsive to pure tones or unmodulated noise, so they can detect the spectral features of sound (old Figure 5D, new Figure 5E). Temporally modulated cells are those responsive to AM-noise, so they can detect the temporal features of sounds with complex spectra, such as broadband noise (old Figure 5E, new Figure 5F). To avoid this confusion, we removed the terms SMI and TMI from the figures and renamed the x-axis labels to “% of spectrally modulated cells” and “% of temporally modulated cells” for Figures 5D (new 5E) and 5E (new 5F), respectively.

      (5) Figure 5 D, E: Is the decrease in SMI and TMI modulated cells in the modules a result of simply lower sensitivity to sounds (i.e. higher response thresholds)? If a cell responds to neither tone, AM, or noise it will have a low SMI and TMI index. If this is the case that affects the interpretation, as it is then not a decrease in sensitivity to spectral or temporal modulation, but instead a difference in overall sound sensitivity.

      Thank you for the reviewer’s comment. We apologize for the confusion about Figures 5D and E, which did not show SMI and TMI values. Rather, as explained in our previous response, they show the percentages of spectrally and temporally modulated cells, respectively. Figure 5D shows the percentage of cells that can detect the spectral features of sound, while Figure 5E shows the percentage of cells that can detect the temporal features of sounds with complex spectra, such as broadband noise. Accordingly, Figures 5D and E reflect sensitivity to different features of sound rather than overall sound sensitivity.

      (6) Figure 7 and 8: What is the false positive rate expected of the responsive cells using the correlation cell flagging criteria? Especially given that the fraction of cells responsive to somatosensory stimulation in LC (matrix) is 0.88% and 1.3% in DC, it is important to know what the expected false positive rate is in order to be able to state that there are actually somatosensory responses there or if this is what you would expect from false positives given the inclusion test used. Please provide an estimate of the false positive rate given your inclusion test and show that the rate found is statistically significantly above that level - and show this rate with a line in Figure 7 H, I.

      Thank you for the reviewer’s comment. To test how well the correlation method identifies responsive cells, we initially performed an ROC analysis comparing the automated method with a blinded human classification. The area under the ROC curve (AUC) was 0.88, meaning that a randomly chosen responsive cell receives a higher correlation score than a randomly chosen nonresponsive cell 88% of the time. At the correlation-coefficient cutoff used to identify cells responsive to somatosensory stimulation (0.4), the specificity was 87%, the sensitivity 72%, the positive predictive value 73%, and the negative predictive value 86%. Although these values indicate that the correlation method is reliable, we additionally excluded all falsely flagged responsive cells from the analysis. Therefore, the fractions of cells in the graphs represent true responsive cells, with no contamination by nonresponsive cells. We also modified Figures 7H and I to match the other data sets obtained from awake animals. Figures 7H and I therefore no longer show averages of responsive cells across animals; instead, they show the percentages of the different categories of responsive cells within each cellular motif (modules and matrix). Accordingly, we believe that there is no need to add a rate line to the graph. We added a description of this validation to the Methods [lines 808-815].
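      As an illustration of how the validation figures above relate to each other, the following is a minimal Python sketch, assuming each cell has one correlation score and one blinded human label (responsive or not). The function names and the eight-cell toy data are hypothetical; only the 0.4 cutoff comes from the text above, and the AUC is estimated with the pairwise-ranking (Mann-Whitney) formulation rather than by integrating the ROC curve.

```python
from typing import Dict, Sequence


def confusion_metrics(scores: Sequence[float], labels: Sequence[bool],
                      cutoff: float = 0.4) -> Dict[str, float]:
    """Compare an automated 'responsive' call (score >= cutoff) with human labels.

    scores: per-cell correlation coefficients (hypothetical input format).
    labels: True where a blinded human rater judged the cell responsive.
    """
    tp = fp = tn = fn = 0
    for score, truly_responsive in zip(scores, labels):
        flagged = score >= cutoff
        if flagged and truly_responsive:
            tp += 1
        elif flagged:
            fp += 1
        elif truly_responsive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # responsive cells that were flagged
        "specificity": tn / (tn + fp),  # nonresponsive cells that were not flagged
        "ppv": tp / (tp + fp),          # flagged cells that are truly responsive
        "npv": tn / (tn + fn),          # unflagged cells that are truly nonresponsive
    }


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a random responsive cell scores higher than a
    random nonresponsive cell (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 cells, the first 4 judged responsive by a human rater.
scores = [0.8, 0.6, 0.45, 0.3, 0.5, 0.2, 0.1, 0.35]
labels = [True, True, True, True, False, False, False, False]
metrics = confusion_metrics(scores, labels, cutoff=0.4)
auc = roc_auc(scores, labels)
print(metrics, round(auc, 4))
```

      The sketch makes explicit that sensitivity, specificity, PPV, and NPV all depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all possible cutoffs.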

      (7) Figure 7: Please clarify what is meant by a cell responding to 'both responding to somatosensory and auditory stimulation'. Does it mean that the cell has responses to both auditory and somatosensory stimulation when presented individually or if it responds to both presented together? If it is the former, I don't understand how the number to both can be higher than the number of somatosensory alone (as both requires it also to respond to somatosensory alone). If it is the latter (combined auditory and somatosensory) then it seems that somatosensory inputs remove the responsiveness of most cells that were otherwise responsive to auditory alone (e.g. in the module while 42% respond to sound alone, combined stimulation would leave only 10% of cells responsive). Please clarify what exactly the authors are plotting and stating here.

      Thank you for the reviewer’s comment. The responsive cells in Figure 7 are divided into three mutually exclusive categories. The first comprises cells that respond only to auditory stimulation (auditory-selective cells, Aud-sel). The second comprises cells that respond only to somatosensory stimulation (somatosensory-selective cells, Som-sel). The third comprises cells that respond to both auditory and somatosensory stimulation when each is presented individually (auditory/somatosensory nonselective cells, Aud/Som-nonsel). Because these categories do not overlap, the number of Aud/Som-nonsel cells can exceed the number of Som-sel cells. We have clarified this point in the text [lines 299-303]. We have also modified Figures 7, 8, and 11 (old Figure 10) so that the anesthetized and awake data are presented in the same way; Figures 7H and I now show the pooled percentages of cells from all animals within modules vs matrix.
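      The non-overlapping categories can be sketched with set operations. This is a minimal illustration, assuming cells are identified by integer IDs and that percentages are taken over all imaged cells in a region; the IDs, counts, and function names are hypothetical.

```python
from typing import Dict, Set


def categorize(aud_responsive: Set[int], som_responsive: Set[int]) -> Dict[str, Set[int]]:
    """Split responsive cells into three mutually exclusive groups."""
    return {
        "Aud-sel": aud_responsive - som_responsive,         # auditory only
        "Som-sel": som_responsive - aud_responsive,         # somatosensory only
        "Aud/Som-nonsel": aud_responsive & som_responsive,  # responds to each stimulus alone
    }


def percentages(groups: Dict[str, Set[int]], n_cells: int) -> Dict[str, float]:
    """Percentage of all imaged cells in a region that fall into each group."""
    return {name: 100.0 * len(ids) / n_cells for name, ids in groups.items()}


# Hypothetical cell IDs from one region with 10 imaged cells.
aud = {1, 2, 3, 4, 5}
som = {4, 5, 6}
groups = categorize(aud, som)
print(percentages(groups, n_cells=10))
```

      In this toy example the Aud/Som-nonsel group (2 cells) is larger than the Som-sel group (1 cell), because a cell counted in one category is never counted in another.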

      (8) Why are the inferential statistics used in Figure 9F (chi-square test) and Figure 5A-C (t-test) when it tests the same thing (the only difference is one is anaesthetised data and the other awake)? Indeed, all Figure 9 and 10 (awake data figures) plots use chi-square tests to test differences in percentages instead of t-tests used in earlier (anaesthetised data figures) plots to test differences in percentages between groups. Please clarify the reason for this change in statistics used for similar comparisons.

      Thank you for the reviewer’s comment. Imaging the LC via the microprism in awake animals confirmed that the technique can be used without interfering with the animals' ambulatory function. Our main goal was therefore to highlight the similarities between the data obtained from the awake and anesthetized preparations by comparing LC with DC, or modules with matrix, within each preparation. The statistics for these comparisons were chosen based on the number of animals tested in each preparation (7 anesthetized and 3 awake animals with prism insertion). Because of the small number of awake animals, we pooled cells across all animals instead of using animals as the unit of analysis, and therefore used the chi-square test to examine differences in percentages.
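      For readers who want to see what such a pooled comparison involves, the sketch below implements a Pearson chi-square test on a 2×2 table of pooled cell counts (modules vs matrix, responsive vs nonresponsive). The counts and function name are hypothetical, and because the use of a continuity correction is not stated here, the sketch assumes none.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test without continuity correction on a 2x2 table.

        rows:    modules (a, b), matrix (c, d)
        columns: responsive, nonresponsive

    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 equals z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical pooled counts: 20/100 module cells vs 40/100 matrix cells responsive.
chi2, p = chi_square_2x2(20, 80, 40, 60)
print(round(chi2, 3), round(p, 4))
```

      If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic; by default that function applies Yates' continuity correction to 2×2 tables.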

      (9) Figure 10H: The main text describes the results shown here as similar to what was seen in anaesthetised animals. But it looks to me like the results in awake animals are qualitatively different from the multisensory interaction seen in anaesthetised animals. In anaesthetised animals the authors find that there is a higher chance of auditory responses being enhanced by somatosensory inputs when cells are in the modules compared to in the matrix. However, in awake data, this relationship is flipped, with more bimodal enhancement found in the matrix compared to the modules. Furthermore, almost all cells in the modules are suppressed by combined somatosensory input, which looks like it is different from what is found in anaesthetised mice and what is described in the discussion: 'we observed that combined auditory-somatosensory stimulation generally suppressed neural responses to auditory stimuli and that this suppression was most prominent in the LC matrix'.

      Thank you for the reviewer’s comment. Our statement was meant to convey that the data obtained from awake and anesthetized animals were generally similar. However, we agree that the statement may not be appropriate given the possible differences between awake and anesthetized animals. To allow a fair comparison between the anesthetized and awake preparations, we used the same statistics and graph formats for Figures 7, 8, and 11 (old Figure 10). Because the areas occupied by modules and matrix differ across animals owing to the irregular shape of the modules, we used a chi-square test for all the data to compare the pooled percentages of responsive cells within modules vs matrix from all tested animals in each preparation (anesthetized vs awake). Both anesthetized and awake animals showed higher fractions of auditory-responsive cells in modules and matrix. However, the matrix had more cells responding to auditory stimulation than the modules, while the modules had more cells responding to somatosensory stimulation than the matrix. By contrast, whereas anesthetized animals showed higher fractions of offset auditory-responsive cells, which were mostly clustered in the modules, offset auditory-responsive cells were very rare in awake animals (6 cells, from one animal).

      Based on the fractions of cells whose auditory responses were suppressed or enhanced by bimodal stimulation, both anesthetized and awake animals showed that auditory responses in the matrix were more often suppressed than enhanced. In contrast, the modules showed different profiles depending on the preparation and location. For instance, modules imaged via the microprism in anesthetized and awake animals showed more suppressed than enhanced auditory responses, but modules imaged from the dorsal surface in anesthetized animals showed more enhanced than suppressed auditory responses. Additionally, in anesthetized animals, modules had fewer suppressed and more enhanced auditory responses than the matrix, regardless of module location (microprism or dorsal surface). In awake animals, however, modules had more suppressed and fewer enhanced auditory responses than the matrix. We have addressed these differences in the Results and Discussion sections.

      Additional minor comments that I think the authors could use to aid their manuscript clarity:

      (1) The figure colour selection - especially in Figures 7 and 8 - is really hard to tell apart. Please choose more distinct colours, and a colour scheme that is appropriate for colour blind readers.

      Thank you for the reviewer’s suggestion. We have noticed that the magenta color assigned for the cells with offset responses was very difficult to distinguish from the black background. We have changed the magenta color to green to be different from the color of other cells. Using Photoshop, we chose a color scheme that is suitable for color-blind readers in all our maps.

      (2) The sentence in lines 331-334 should be rephrased for clarity.

      Thank you for the reviewer’s suggestion. We have rephrased the statement for clarity [lines 364-371].

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review, the strong clustering evident in some of the maps (some of which may be related to module/matrix differences but certainly not all of it) seems worth scrutinizing further. Would we expect such a strong spatial segregation of auditory responsive and non-responsive neurons? Would we expect response properties (e.g. off-responsiveness) other than frequency tuning to show evidence of a topographic arrangement in the IC? In addressing this it would, of course, be important to rule out that this clustering is down to some trivial experimental variables rather than truly reflecting functional organization. For instance, are the patches of non-responsive neurons found in parts of the field of view with poor visibility, poor labelling, etc., which may explain why it is difficult to pick up responses there? Are the neurons in non-responsive areas otherwise active (i.e. do they show spontaneous activity) or could they be 'dead'? Could the way neuropil signals are dealt with play a role here (it is weighted by 0.4, which strikes me as quite low)? In relation to this, I am also wondering how the extreme overrepresentation (Figure 4) of neurons with a BF of 5 kHz (some of this is, of course, down to the fact that the lower end of the frequency range was 5 kHz and that the step size was 0.5 octaves), especially in the DC, is to be interpreted.

      Thank you for the reviewer’s comment. Before analysis, the ROIs of all cells were drawn around the cell bodies using the jRGECO1a signal as a reference, so all cells (responsive and nonresponsive) were taken from areas with good jRGECO1a visibility. In other words, no cells were taken from regions with poor jRGECO1a signal. Figures 7, 8, and 11 (old Figure 10) show responses only to unmodulated broadband noise at 80 dB as the auditory stimulus and to whisker deflection at a specific speed and power as the somatosensory stimulus. Because these two stimuli had specific parameters, the remaining nonresponsive cells may respond to auditory or somatosensory stimuli with other features. For instance, some cells that did not respond to the unmodulated broadband noise responded to pure tones of different amplitudes and frequencies, or to AM-noise of different amplitudes and modulation frequencies. Alternatively, these nonresponsive cells may not respond to any of our tested stimuli but to other sensory stimuli. Some of the nonresponsive cells showed spontaneous activity when no stimulus was presented. However, we cannot rule out the possibility that some of these nonresponsive cells were not viable. We have addressed the clustering properties in the corresponding parts of the Results and Discussion sections of the revised manuscript. We have also added a new supplementary figure (Figure 11-figure supplement 1) showing that some cells that did not respond to unmodulated noise responded to other types of sound, and showing the spontaneous activity of some nonresponsive cells.

      For the neuropil correction, previous reports used a contamination factor (r) in the range of 0.3-0.7 (these studies are referenced in the Methods [line 776]), depending on the tissue or cells imaged, the vasculature, and the objective used for imaging. We therefore optimized the contamination factor to r = 0.4 in a preliminary analysis based on the tissue we imaged (LC) and the objective used (16×, NA 0.8, 3 mm working distance).
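      A contamination factor of this kind is commonly applied as F_corrected(t) = F_ROI(t) - r × F_neuropil(t). A minimal sketch with r = 0.4 follows; how the neuropil region is defined and the choice of the first frames as the ΔF/F baseline are assumptions for illustration, and the traces and function names are hypothetical.

```python
from typing import List, Sequence


def neuropil_correct(f_roi: Sequence[float], f_neuropil: Sequence[float],
                     r: float = 0.4) -> List[float]:
    """F_corrected(t) = F_roi(t) - r * F_neuropil(t), frame by frame."""
    return [fr - r * fn for fr, fn in zip(f_roi, f_neuropil)]


def delta_f_over_f(trace: Sequence[float], n_baseline: int) -> List[float]:
    """dF/F relative to the mean of the first n_baseline frames (assumed pre-stimulus)."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]


# Hypothetical 3-frame example: the soma brightens on frame 3, neuropil stays flat.
corrected = neuropil_correct([100.0, 100.0, 150.0], [50.0, 50.0, 50.0], r=0.4)
print(corrected)                     # [80.0, 80.0, 130.0]
print(delta_f_over_f(corrected, 2))  # [0.0, 0.0, 0.625]
```

      A smaller r removes less of the surrounding signal; the appropriate value depends on how much out-of-focus neuropil fluorescence reaches the somatic ROI with a given objective and tissue.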

      We agree that 5 kHz is overrepresented as the best frequency of DC cells. A previous report (A. B. Wong & Borst, 2019) described a large zone of the DC in which cells were tuned to 2-8 kHz. Given that 5 kHz was the lowest frequency tested in our experiment, we think that the low-frequency bias of the DC surface is consistent between studies. This finding is also consistent with electrophysiological data obtained by advancing recording electrodes through the IC along the dorsoventral axis, in which cells at the dorsal surface of the IC were tuned to lower frequencies.
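      To illustrate the floor effect raised by the reviewer, the sketch below builds a half-octave frequency grid starting at 5 kHz (which reproduces the 7.1 kHz value mentioned earlier) and assigns a preferred frequency to the nearest tested tone on a log scale. The number of steps, the nearest-tone rule, and the function names are assumptions for illustration, not a description of how best frequencies were computed in the study.

```python
import math
from typing import List


def half_octave_grid(f_min_khz: float = 5.0, n_steps: int = 7) -> List[float]:
    """Tone frequencies spaced 0.5 octave apart, starting at f_min_khz."""
    return [f_min_khz * 2 ** (k / 2) for k in range(n_steps)]


def nearest_tested(bf_khz: float, grid: List[float]) -> float:
    """Tested frequency closest to bf_khz on a log2 (octave) scale."""
    return min(grid, key=lambda f: abs(math.log2(f / bf_khz)))


grid = half_octave_grid()
print([round(f, 1) for f in grid])  # [5.0, 7.1, 10.0, 14.1, 20.0, 28.3, 40.0]
print(nearest_tested(3.0, grid))    # 5.0: a 3 kHz-preferring cell can only be labelled 5 kHz
```

      Under these assumptions, every cell whose true preference lies below the tested range is assigned to the lowest bin, which would inflate the 5 kHz count on a low-frequency-biased surface such as the DC.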

      We have changed the magenta-colored cells to green so that they are easier to identify. As requested by Reviewer #1, we also changed the color palettes of some images and cellular maps to be suitable for color-blind readers.

      The manuscript would benefit from more precise language in a number of places, especially in the results section.

      Line 220/221, for instance: "... a significant fraction of cells that did not respond to pure tones did respond to AM-noise" Strictly speaking, this sentence suggests that you considered here only the subset of neurons that did not respond to pure tones and then ran a test on that subset. The test that was done seems to suggest though that the authors tested whether the percentage of responsive cells was greater for pure tones or for AM noise.

      Thank you for the reviewer’s comment. We apologize for the confusion. In the revised manuscript, we categorized the cells according to their responses into cells responding to pure tones only (tone-selective cells, Tone-sel), to AM-noise only (noise-selective cells, Noise-sel), and to both pure tones and AM-noise (nonselective cells, Non-sel). We have modified Figure 5 accordingly. We performed the same analysis for the data obtained from awake animals and present it in a new figure (Figure 10) that matches the analysis of the anesthetized animals.

      Please refer to the figure panels in the text in consecutive order. 2B, for instance, is mentioned after 2H.

      Thank you for the reviewer’s comment. Throughout the paper, we referred to the panels of each figure in consecutive order so that they flow smoothly with the text. Figure 2 is the only exception, for a good reason. It is a complex figure with many panels that present a parallel comparison between the LC imaged via the microprism and the DC, using one-photon imaging, two-photon imaging, validation by laser lesioning, and histology. We therefore move between its panels to highlight each aspect of this comparison efficiently. We prefer to keep Figure 2 as a single figure in its current format to preserve this parallel comparison between LC and DC.

      The legend for Figure 2 could be clearer. For instance, there are two descriptions for panel D. Line 1009: "(C-E)" [i.e. C, D, E] and line 1010: "(D and F)".

      Thank you for the reviewer’s comment. It should be C and E, not C-E. We have fixed the mistake [line 1224]

      Line 275: What does 'with no preference' mean?

      Thank you for the reviewer’s comment. We apologize for the confusion. There are three categories of cells: some respond only to auditory stimulation, others respond only to somatosensory stimulation, and a third group responds nonselectively to both auditory and somatosensory stimulation (Aud/Som-nonsel cells). We edited the sentence to make this clearer [lines 303-304].

      Line 281 (and other places): What does 'normalized against modules' mean?

      Thank you for the reviewer’s comment. This normalization was done by dividing the number of responsive cells of a given response type in the matrix by the number of cells of the same type in the modules. The modules therefore always take a value of 1, and the matrix takes a value around 1: greater than 1 if the matrix had more cells than the modules, and less than 1 if it had fewer. In the revised version, we used this normalization method in the revised Figures 5C and 10C, which describe the fractions of cells responding to pure tones only, AM-noise only, or both stimuli in the matrix vs. the modules.

      Sentence starting on line 288. I don't find that point to be as obvious from the figures as the sentences seem to suggest. Are we to compare magenta points (auditory off cells) from 7C with green points in 7F?

      Thank you for the reviewer’s comment. We came to this conclusion by visually comparing the magenta points (now green in the revised version to increase visibility) representing auditory offset cells in Figure 7C with the green points in Figure 7F representing cells responding to both somatosensory and auditory stimulation. In the revised manuscript, we statistically tested whether the percentages of onset and offset auditory responses differ between the modules and the matrix among cells responding to both somatosensory and auditory stimulation. We found that most cells inside the modules that responded to both somatosensory and auditory stimulation had offset auditory responses, which could indicate a degree of multisensory integration between somatosensory input and offset auditory responses in these cells. We have added these statistical results to the revised manuscript [lines 312-317].

      Lines 300-302: "These data suggest that the module/matrix system permits preservation of distinct multimodal response properties in the face of massive integration of inputs in the LC". First, I'm not quite sure what that sentence means. Second, it would be more appropriate for the discussion. Third, the fact that we are more likely to find response enhancement in the modules than in the matrix is nicely consistent with the idea (supported by work from the senior author's lab and others) that excitatory somatosensory input predominantly targets neurons in the modules (which is why we see mostly response enhancement in the modules) and that this input targets GABAergic neurons which then project to and inhibit neurons both outside and inside of their module. Therefore, I would recommend that the authors replace the aforementioned sentence with one that interprets these results in light of what we know about this somatosensory-auditory circuitry.

      Thank you for the reviewer’s comment. Despite the massive multimodal inputs that the LC receives from auditory and nonauditory regions, the module/matrix system provides a platform for distinct multimodal responses, as indicated by more somatosensory-responsive cells in the modules and more auditory-responsive cells in the matrix, which matches previously reported anatomical differences. We edited the sentence in light of the comparison between the data obtained from awake and anesthetized animals and moved it to the discussion section [lines 503-506].

      The term 'LC imaged via microprism' is used dozens of times throughout the manuscript. Replacing it with a suitable acronym or initialism could improve the flow of the text and would make some of the sentences less cumbersome.

      Thank you for the reviewer’s suggestion. We changed the term “LC imaged via microprism” into LC (microprism) throughout the revised manuscript.

      5A-C: It is unclear what is being compared here. What are the Ns? Different animals?

      Thank you for the reviewer’s comment. We do apologize for this missing information. We have added the number of subjects used in every statistical test in each corresponding figure legend.

      5G: minus symbol missing on the y-axis.

      Thank you for the reviewer’s comment. We have fixed this.

      Figure 6: Are these examples or population averages?

      Thank you for the reviewer’s question. Each panel of the old Figure 6 showed a single trace from an example cell. Following another reviewer’s suggestion, we modified Figure 6 to include more example cells showing different responses. Each panel of the new Figure 6 shows the average response to five presentations of the corresponding stimulus type. We chose to show the average signal because it was the signal used in the subsequent analysis.

      How are module borders defined?

      Thank you for the reviewer’s question. The modules were defined based on the intensity of the green channel, which shows GFP expression. Module boundaries were drawn at the transition between high and low GFP signal. This step was done before data analysis to avoid bias.

      7JKL: How are these to be interpreted? Does panel 7K, for instance, indicate that the fraction of neurons showing 'on' responses was roughly twice as large in the matrix than in the modules and that the fraction of neurons showing 'off' responses was roughly 10 times larger in the modules than in the matrix (the mean seems to be at about 1/10).

      Thank you for the reviewer’s comment. Figures 7J-L show the number of cells of each response type in the matrix normalized to the number in the modules. This normalization was done per animal, and the matrix values were then plotted against a normalization line at 1 representing the modules. The matrix was considered to have more cells than the modules if the median of the matrix values was > 1, and fewer cells if the median was < 1; a median of 1 means there is no difference between the matrix and the modules. However, to match the data obtained from anesthetized animals (Figures 7 and 8) with those obtained from awake animals (Figure 11, old Figure 10), we analyzed all data with the chi-square test in the revised manuscript. Therefore, the format of Figures 7K-L was changed in the revised manuscript, and they became the new Figures 7I-K.
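      To make the per-animal normalization described above concrete, here is a minimal Python sketch. It assumes the input is a list of (matrix count, module count) pairs, one per animal, for a single response type. The function names are hypothetical, and skipping animals with zero module cells is our own assumption, since the response does not say how such cases were handled.

```python
from statistics import median


def matrix_to_module_ratios(counts):
    """Per-animal ratio of matrix to module cell counts for one response type.

    counts: list of (matrix_count, module_count) tuples, one per animal.
    Modules equal 1 by construction, so each ratio is the matrix value.
    Animals with zero module cells are skipped (an assumption: the ratio
    is undefined for them).
    """
    return [m / mod for m, mod in counts if mod > 0]


def compare_to_modules(ratios):
    """Compare the median matrix ratio against the module line at 1."""
    med = median(ratios)
    if med > 1:
        return "matrix > modules"
    if med < 1:
        return "matrix < modules"
    return "no difference"


# Example with made-up counts for three animals
ratios = matrix_to_module_ratios([(20, 10), (9, 10), (30, 15)])
print(ratios, compare_to_modules(ratios))  # -> [2.0, 0.9, 2.0] matrix > modules
```

      This median rule is descriptive only, not a significance test; as stated above, the revised analysis uses a chi-square test instead.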

      10A suggests that significantly more than half the neurons shown here are not auditory responsive. Perhaps I am misinterpreting something here but isn't that in contrast to what is shown in panel 9F?

      Thank you for the reviewer’s comment. The data shown in Figure 10A (revised Figure 11A) represent the cellular response to only one stimulus (broadband noise at 80 dB with no modulation), whereas Figure 9F (revised 10B) represents cells responding to a variety of auditory stimuli: pure tones of different frequencies and amplitudes, and AM-noise of different amplitudes and modulation frequencies. Accordingly, old Figure 9F (revised Figure 10B) shows different cell types based on their responses; for instance, some cells respond only to pure tones, while others respond only to AM-noise or to both pure tones and AM-noise. This also suggests that cells classified as nonresponsive in Figure 10A (revised 11A) may respond to other sound features.

      The way I understood panels 7L and 8K there were more suppressed neurons in the matrix than in the modules (line 296: "cells in the modules had a higher odds of having an enhancement response to bimodal stimulation than matrix, while cells in the matrix had a higher odds of having a suppressive response to bimodal stimulation"). Now, panel 10F indicates that in awake mice there is a greater proportion of suppressed neurons in the modules than in the matrix. I may very well have overlooked or misread something but I may not be the only reader confused by this so please clarify.

      Thank you for the reviewer’s comment. We apologize for this confusion. The apparent discrepancy between Figures 7 and 8 (anesthetized animals) and Figure 10 (awake animals) arose because different statistics were used for each preparation. In the revised version, we have fixed this by running the same statistics on all data, and we revised Figures 7, 8, and 10 (new Figure 11) accordingly. In brief, in all conditions (anesthetized and awake), the matrix contains a higher percentage of cells whose auditory responses are suppressed by bimodal stimulation than of cells whose responses are enhanced. In contrast, the modules behave differently across conditions: they had more cells with bimodally enhanced auditory responses in anesthetized animals but more cells with suppressed responses in awake animals, indicating that the modules may be more sensitive to anesthesia than the matrix. We address this effect in the discussion of the revised manuscript [lines 521-553].

      Line 438: ...as early AS...

      Thank you for the reviewer’s comment. We gladly fixed that [line 512].  

      Reviewer #3 (Recommendations For The Authors):

      My minor recommendations for the authors are as follows:

      (1) The text can be a bit difficult to follow in places. This is partly due to the convoluted nature of the results, but I suggest a careful read-through to look for opportunities to improve the prose. In particular, there is a tendency to use long sentences and long paragraphs. For example, the third paragraph of the introduction runs for almost fifty lines.

      Thank you for the reviewer’s comment and suggestion. We have fixed that.

      (2) This might be due to journal compression, but some of the bar graphs in the figures are difficult to read. For example, the individual data points, especially when filled with striped background colors get lost. Axes can become invisible, like the y-axis in 7L, and portions of bars, like in 7F, are not always rendered correctly. Error bars are sometimes hidden behind data points, as in 5C. Increasing line thickness and shifting individual data points away from error bars might help with this.

      Thank you for the reviewer’s comment and suggestion. We now plot all data points as filled black circles to make them visible, and we placed them behind the main columns so that they do not block the error bars. We have fixed Figures 5 and 7 accordingly.

      (3) Throughout the manuscript, the authors use a higher SMI to indicate a preference of cells for auditory stimuli with "greater spectral... complexity" (e.g., lines 219 and 372). I find this interpretation a bit challenging since SMI compares a neuron's preference for tones over noise, and to me, tones seem like the least spectrally complex of all auditory stimuli. Perhaps some clarification of what the authors mean by this would help. For example, is the assumption that a neuron that prefers tones over noise is, either directly or indirectly, receiving input sculpted by inhibitory processes?

      Thank you for the reviewer’s comment. In general, higher SMI values indicate a stronger preference of cells for pure tones over noise with no modulation (less spectral complexity). We will clarify this statement throughout the manuscript. However, SMI values were not mentioned in lines 219 and 372. The statement in line 219 describes revised Figure 5C (old 5B), where more cells in the matrix than in the modules respond specifically to AM-noise, indicating that the matrix preferentially responds to sounds of greater spectral and temporal complexity. The statement in line 372 (discussion section) refers to revised Figures 5E and F (old 5D and E). Revised Figure 5E (old 5D) shows that the matrix has more cells responding to pure tones or unmodulated noise than the modules, so the matrix has a lower threshold for detecting the spectral features of sound. Revised Figure 5F (old 5E) shows that the matrix has more cells responding to AM-noise than the modules, indicating that the matrix is more involved in processing the temporal features of sound. As explained above, all of these findings concern the percentage of cells responding to specific sound stimuli, not the exact SMI values. We have therefore removed the terms SMI and TMI from the revised figures and clarified this in the text.

      (4) Lines 250-253: How does a decrease in SMI correspond to "an increase in pure tone responsiveness?" Doesn't a decrease suggest the opposite?

      Thank you for the reviewer’s comment, which we agree with. We do apologize for that. We have fixed this statement [lines 275-277] and any related findings accordingly.

      (5) Line 304: Add "imaged via microprism" or similar after "response profiles with the LC.".

      Thank you for the reviewer’s suggestion. We have fixed this. However, as suggested by another reviewer, we changed the term “LC imaged via microprism” to “LC (microprism)” for simplicity [line 330].

      (6) Figure 5A and C: Both plots show that more neurons responded to AM-noise than tones, but it would be interesting to know how much the tone-responsive and AM-noise responsive populations overlapped. Were all tone-responsive neurons also responsive to AM-noise?

      Thank you for the reviewer’s comment. We have categorized the cells based on their responses to pure tones only, AM-noise only, or both pure tones and AM-noise when each stimulus is presented individually. We have modified Figures 5A and C, which are now Figures 5B and D.

      (7) Figure 5G: Missing negative sign before "0.5.".

      Thank you for the reviewer’s suggestion. We have fixed this. Note that old Figure 5G is now revised Figure 5H.

      (8) Figure 7 legend, Line 1102: Missing period after "(C and E)".

      Thank you for the reviewer’s suggestion. We think that the period should be placed before (C and E) at the end of “respectively”. The parentheses refer to the statements after them. We gladly fixed that. [line 1394]

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory. 

      Public Reviews: 

      Summary: 

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 larvae show neither a sleep rhythm nor long-term memory (LTM), whereas L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in third instar and not in second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae. 

      Strengths: 

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors show that exposure to low-quality food and enhanced feeding during the larval stage of Drosophila affect the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM is only possible under well-fed, balanced nutrition in fly larvae. The non-sleeping larvae were fed under low-sugar conditions, and indeed the authors also find that glucose metabolic genes are required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons. 

      Weaknesses: 

      The authors tried to induce sleep in younger L2 larvae, however the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae. Thus, they cannot show that sleep during L2 stage would be sufficient to form LTM. 

      We agree that the experiments with Gaboxadol feeding in L2 did not perfectly mimic L3 sleep behaviors. However, genetic induction of sleep in L2 was effective in increasing sleep duration and depth similar to that observed in normal L3. As noted below in response to specific reviewer comments, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the gaboxadol manipulation did cause a significant decrease in arousal threshold compared to control larvae. Together these approaches support the hypothesis that sleeping more/more deeply is not sufficient to promote LTM in L2.

      The authors suggest that larval Dh44 neurons may integrate "information about the nutritional environment through the direct sensing of glucose levels to modulate sleep-wake rhythm development". They identify glucose metabolism genes (e.g., Glut1) in the downstream DH44 neurons as being required for the organization of the sleep-wake-feeding rhythm, and they show that DN1a neurons signal to the DH44 cells via CCHa and its receptor. However, how these findings are connected is not well explained. Do the authors think that nutrient sensing occurs only in the DH44 neurons and not in DN1a or other neurons? Would knocking down glucose metabolism in any neuron not lead to a functional defect? What is the evidence that Dh44 neurons are specific sensors of nutritional state? For example, do the authors think that overexpression of Glut1 in Dh44 neurons, a manipulation that can increase glucose transport into cells, would rescue the effects of low-sugar food? 

      We thank the reviewer for these suggestions and have added the proposed experiment. We found that knockdown of Hex-C in DN1a neurons did not disrupt sleep-wake rhythms (Fig. S4G-I), suggesting that Dh44 neurons are specialized in requiring glucose metabolism to drive sleep-wake rhythms. We have also clarified in the text the existing evidence that Dh44 neurons act as nutrient sensors.

      Some of the genetic controls seem to be inconsistent suggesting some genetic background effects. In Figure 2B, npf-gal4 flies without the UAS show no significant circadian change in sleep duration, whereas UAS-TrpA flies do. The genetic control data in Figure 2D are also inconsistent. Npf-Gal4 seems to have some effect by itself without the UAS. The same is not seen with R76G11-Gal4. Suppl Fig 2: Naïve OCT and AM preference in L3 expressing various combinations of the transgenes show significant differences. npf-Gal4 alone seems to influence preference. 

      The sleep duration and bout number/length data are highly variable. 

      All experiments are performed in an isogenized background, so variability in genetic controls likely reflects the stochastic nature of behavioral experiments. Indeed, adult sleep data also show a great deal of variability within the same genetic background (PMID: 29228366). We agree this is an important point, and we attempt to minimize variability as much as possible by backcrossing flies and tightly controlling environmental conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Low sugar exposure and activation of NPF neurons might not induce the same behavioral changes. LS exposure does not enhance mouth hook movements, but overall food intake. NPF activation seems to enhance mouth hook movements, but the data for food intake is not shown. This information would be necessary to compare the two different manipulations. 

      We thank the reviewer for this suggestion. However, we elected not to perform food intake experiments with the NPF activation experiments. Since we are not directly comparing the low sugar and NPF manipulations to each other, we think that both experiments together support the conclusion that immature food acquisition strategies (whether food intake or feeding rate) limit LTM performance. 

      The authors write that the larval feeding assays run for 4 hours, can they explain why that long? Larvae should already have processed food within 4 hours, so that the measurement would not include all eaten food.

      We clarified the rationale for the 4-hour feeding assays in the results section. We used 4 hours on blue-dyed food because initial 1-hour experiments with control L3 at CT1-4 were difficult to interpret. The measurement does not include all of the food eaten in the 4 hours, but it does reflect longer-term changes in food intake.

      Sleep induction with Gaboxadol seems to not really work - sleep duration, bout number and length are not enhanced, and arousal threshold is only slightly lower. Thus, the authors should not use this data as an example for inducing sleep behaviour. 

      We agree this approach did not have a large effect in larvae. However, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the Gaboxadol manipulation did cause a mild (but significant) decrease in arousal threshold compared to control larvae. Gaboxadol feeding also caused a significant decrease in total body weight compared to control larvae indicating that even slightly deeper sleep could be detrimental to younger animals.

      Activation of R76G11 with TrpA1 seems to work better for inducing sleep like behaviour. However, the authors describe that they permanently activated neurons. To induce a "normal" sleep pattern, the authors might try to only activate these neurons during the normal enhanced sleep time in L3 (CT13?) and not during the whole day. This might also allow larvae to eat during day time and gain more weight. 

      We apologize that this point was not clearer, but we did do acute activation of R76G11(+) neurons, as proposed by the reviewer. We have clarified the text to make this point.

      It would be interesting to see how larvae fed with high sucrose and low protein diet would behave in this assay. Do the authors suggest that sugar is most important for the development of sleep behaviour or that it is a combination of sugar and protein that might be required? 

      We agree that feeding larvae a high sucrose and low protein diet would be interesting. However, we initially tried a low protein diet and observed significant developmental delays. Therefore, we are concerned that developmental defects on a high sucrose and low protein diet would confound behavioral results. Additionally, the Dh44 manipulations (glucose & GCN2 signaling) suggest that sugar is the most important for the development of sleep behaviors.

      Reviewer #3 (Recommendations For The Authors): 

      The authors could discuss whether the interaction between DN1a clock neurons and Dh44 neurons is mediated synaptically or by volume transmission following extracellular release of the CCHa1 neuropeptide. They write that "the development of Dh44 neuronal competency to receive clock-driven cues" and that "DN1a clock neurons anatomically and functionally connect to Dh44", but a discussion of volume vs. synaptic signalling would be of interest. 

      We thank the reviewer for this suggestion. We revised the discussion to address this point.

      line 223 " demonstrating that post-synaptic processes likely". It would be interesting to read a discussion on whether it is known if these are postsynaptic or peptide-mediated volume effects? 

      We added additional text to the discussion to address these points.

      - The authors may want to include a schematic of the circuit and its position in the general anatomy of the fly larva. 

      We thank the reviewer for this suggestion. We have added a model figure to Fig. S6.

      "Dh44 neurons act through glucose metabolic genes" - consider rewording e.g. require glucose metabolic genes 

      We revised the text.

      - line 45 "Early in development, young animals must obtain enough nutrients to ensure proper growth" - this is too general; many animals do not feed in early life-cycle stages (e.g. lecithotrophic development), consider rewording 

      We revised the text to be more specific.

      - line 90 "however, L3 at CT1 consume more than L3 at CT12 (Figure S1A)" - typo CT13, also consider rewording to match the structure of the sentence before 'however, L3 consumed more at CT1 than at CT13' 

      We revised the text to fix this error.

      - Line 111 "and loss of deep sleep" - how is deep sleep defined and measured in the larvae? It is not clear from the data or the text. 

      We revised the text to define deep sleep in the results section. We also have a description of how arousal threshold is calculated in the methods.

      - In Figure 3B and G the individual data points are not shown 

      We did not show individual data points for those graphs because we are plotting the average percentage of 4 biological replicates.

      Typo: 

      Figure 1 legend "F, n= n=100-172 " 

      We revised the text to fix this typo.

    1. Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviation from the null expectation that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE measures whether the "relative advantage" that species have in the mixture is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).
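      For reference, the Loreau & Hector (2001) partition can be written, for N species with monoculture yields M_i, observed mixture yields Y_O,i and expected relative yields RY_E,i (e.g. planting proportions), as ΔY = N·mean(ΔRY)·mean(M) + N·cov(ΔRY, M), where ΔRY_i = Y_O,i/M_i − RY_E,i and the covariance uses the population (divide-by-N) convention. The Python sketch below is our own illustration, not code from the manuscript; it shows that CE and SE are an exact arithmetic decomposition of the net effect, which is the "statistical structure" point made in the next paragraph.

```python
from statistics import fmean


def additive_partition(mono, mix, expected_ry):
    """Return (net_effect, CE, SE) for the Loreau & Hector additive partition.

    mono: monoculture yields M_i
    mix: observed mixture yields Y_O,i of each species
    expected_ry: expected relative yields RY_E,i (e.g. planting proportions)
    """
    n = len(mono)
    # Observed relative yield of each species in the mixture
    ry_obs = [y / m for y, m in zip(mix, mono)]
    # Deviation of observed from expected relative yield
    d_ry = [o - e for o, e in zip(ry_obs, expected_ry)]
    mean_d_ry = fmean(d_ry)
    mean_m = fmean(mono)
    # Complementarity effect: N * mean(dRY) * mean(M)
    ce = n * mean_d_ry * mean_m
    # Selection effect: N * population covariance of dRY and M
    cov = fmean([(d - mean_d_ry) * (m - mean_m) for d, m in zip(d_ry, mono)])
    se = n * cov
    # Net biodiversity effect: observed minus expected mixture yield
    net = sum(mix) - sum(e * m for e, m in zip(expected_ry, mono))
    return net, ce, se


net, ce, se = additive_partition([10, 20], [4, 14], [0.5, 0.5])
print(round(net, 6), round(ce, 6), round(se, 6))  # -> 3.0 1.5 1.5
```

      In this made-up two-species example, CE + SE equals the net effect by construction; neither term identifies which biological mechanism produced the gain.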

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition is simply wrong, although this is sometimes done. The criticism of the AP method thus seems in large part unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      The authors then define relative competitive ability (RC), this time simply using monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they take its difference from the mean of the other species and divide it by the maximum monoculture biomass across all species.
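      The two indices summarized above (MG and RC) can be sketched in Python as follows. This is only our reading of the reviewer's summary, not the authors' code: MG is taken as (Y_partial / Y_full) / d, with d the species' relative planting density in the mixture, and RC as (M_i − mean of the other species' M) / max(M). All function names are hypothetical.

```python
from statistics import fmean


def competitive_growth_response(y_partial, y_full, density):
    """Hypothetical MG: partial- vs full-density monoculture yield, per unit density.

    density is the species' relative planting density in the mixture (e.g. 0.5
    in a two-species 50:50 mixture). MG = 1 means yield scales with density;
    MG > 1 means individuals grow more when intraspecific competition is reduced.
    """
    return (y_partial / y_full) / density


def relative_competitive_ability(mono):
    """Hypothetical RC_i = (M_i - mean of the other species' M) / max(M).

    mono: monoculture biomasses of at least two species.
    """
    m_max = max(mono)
    rc = []
    for i, m_i in enumerate(mono):
        others = mono[:i] + mono[i + 1:]
        rc.append((m_i - fmean(others)) / m_max)
    return rc


print(competitive_growth_response(12, 15, 0.5))   # -> 1.6
print(relative_competitive_ability([10, 20, 40]))  # -> [-0.5, -0.125, 0.625]
```

      Because every RC value is divided by the same maximum, adding a single very productive species rescales the RC of all other species, which illustrates the sensitivity to a single value raised in the following paragraph.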

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather ad hoc. It is also much less of a partitioning method than the AP method, because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties: when does complementarity occur? Would it capture both complementarity and facilitation?

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but an interaction can clearly have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning", "facilitation", and "species interference", as the authors do. Also, these categories of mechanisms are still simplistic. For example, how would biotic interactions with enemies (see above) be classified?

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      Other points:

      - line 66: community productivity, not ecosystem productivity.
      - line 68: community average responses are with respect to relative yields - this is important!
      - line 64: what are "species effects of species interactions"?
      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.
      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

    1. Reviewer #1 (Public Review):

      Summary:

      This paper uses a model of binge alcohol consumption in mice to examine how the behaviour and its control by a pathway between the anterior insular cortex (AIC) to the dorsolateral striatum (DLS) may differ between males and females. Photometry is used to measure the activity of AIC terminals in the DLS when animals are drinking and this activity seems to correspond to drink bouts in males but not females. The effects appear to be lateralized with inputs to the left DLS being of particular interest.

      Strengths:

      Increasing alcohol intake in females is of concern and the consequences for substance use disorder and brain health are not fully understood, so this is an area that needs further study. The attempt to link fine-grained drinking behaviour with neural activity has the potential to enrich our understanding of the neural basis of behaviour, beyond what can be gleaned from coarser measures of volumes consumed etc.

      Weaknesses:

      The introduction to the drinking in the dark (DID) paradigm (starting on line 47) is rather narrow in scope. It would be improved if the authors framed DID in the context of other common intermittent access paradigms and gave due credit to the studies and authors responsible for innovation in this area (particularly Wise, 1973, whose approach was returned to popular use by Simms et al., 2010, and related papers; e.g., Wise RA (1973). Voluntary ethanol intake in rats following exposure to ethanol on various schedules. Psychopharmacologia 29: 203-210; Simms, J., Bito-Onon, J., Chatterjee, S. et al. Long-Evans Rats Acquire Operant Self-Administration of 20% Ethanol Without Sucrose Fading. Neuropsychopharmacol 35, 1453-1463 (2010).) The original drinking in the dark demonstration should also be referenced (Rhodes et al., 2005). On line 154, Thiele & Navarro 2014 is cited, but it is a review and not the original demonstration.

      When sex differences in alcohol intake are described, more care should be taken to make clear whether they are expressed in terms of volume (e.g. ml) or blood alcohol levels (BAC, or at least g/kg as a proxy measure). This distinction was often lost when lick responses were considered. If licking is similar (assuming that a single lick brings in a similar volume in males and females), males and females may consume similar volumes, but females, owing to their smaller body size, would receive a higher dose and become more intoxicated; the implications of these details need far closer consideration. What is described as identical on one measure is not identical on another.
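      The volume-versus-dose distinction can be illustrated with simple arithmetic. The sketch below converts a consumed volume of ethanol solution into a dose in g/kg body weight, using the density of ethanol (about 0.789 g/ml). The 20% (v/v) concentration and the body weights of 25 g and 20 g are illustrative assumptions, not values from the manuscript, and g/kg is only a proxy for blood alcohol levels.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol at about 20 °C


def ethanol_dose_g_per_kg(volume_ml: float, concentration_v_v: float,
                          body_weight_g: float) -> float:
    """Convert a consumed volume of ethanol solution to g ethanol per kg body weight."""
    grams_ethanol = volume_ml * concentration_v_v * ETHANOL_DENSITY_G_PER_ML
    return grams_ethanol / (body_weight_g / 1000.0)


# Same volume (1 ml of 20% v/v ethanol) for two hypothetical mice:
male = ethanol_dose_g_per_kg(1.0, 0.20, 25.0)
female = ethanol_dose_g_per_kg(1.0, 0.20, 20.0)
print(f"male: {male:.2f} g/kg, female: {female:.2f} g/kg")
# male: 6.31 g/kg, female: 7.89 g/kg
```

Identical volumes therefore translate into a roughly 25% higher dose for the lighter animal in this example.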

      No conclusions regarding the photometry results can be drawn based on the histology provided. Localization and quantification of viral expression are required at a minimum to verify the efficacy of the dual virus approach (the panel in Supplementary Figure 1 is very small and doesn't allow terminals to be seen, and there is no quantification). Whether expression differs by sex must also be established before we can be confident about any sex differences in neural activity.

      While the authors have some previous data on the AIC to DLS pathway, there are many brain regions and pathways impacted by alcohol and so the focus on this one in particular was not strongly justified. Since photometry is really an observational method, it's important to note that no causal link between activity in the pathway and drinking has been established here.

      It would be helpful if the authors could further explain whether their modified lickometers actually measure individual licks. While in some systems contact with the tongue closes a circuit that is recorded, here the interruption of a photobeam was used. It is not clear to me whether the nose being close to the spout would be sufficient to interrupt that beam, or whether a tongue protrusion is required. This detail is important for understanding how the photometry data are linked to behaviour. The temporal resolution of the GCaMP signal is likely not good enough to capture individual licks, but I think more caution or detail is required in the discussion of the correspondence between these events.

      Even if the pattern of drinking differs between males and females, the use of the word "strategy" implies a cognitive process that was never described or measured.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      #1) Summary: The transport of effector proteins across membranes from the producing bacterium into a target cell is at the core of bacterial secretion systems. How an additional layer in the form of a capsule affects effector export and susceptibility to effector import is not fully understood. Here, Flaugnatti and colleagues combined bacterial genetics with phenotypic assays and electron microscopy to demonstrate a dual role of a bacterial capsule: impeding T6SS-mediated effector export and protecting from effector import by another bacterium's T6SS. The wide variety of methods used, the complementation of the mutants, and the validation of the findings across strains strengthen the authors' conclusions. Although the main conclusions seem straightforward, the authors unravel the unexpected complexity underlying these phenotypes with strong mechanistic work. In brief, a capsule-deficient mutant (∆itrA) is shown to assemble its T6SS similarly to the WT, yet it secretes more Hcp than the WT and is better at T6SS-mediated killing of other bacteria. A capsule-overproducing mutant (∆bfmS) shows both a partial deficiency in T6SS assembly and an additional reduction in exported Hcp, and is worse at T6SS-mediated killing than the WT. A mutant with a capsule similar to the WT but deficient in cell sensing (∆tslA) forms the fewest T6SS apparatuses, yet is better at T6SS-mediated killing than the overcapsulated mutant. Together, these data show an effect of the capsule on (i) T6SS apparatus assembly, (ii) effector export, (iii) effector import, and (iv) the need for clearance of accumulating non-secreted Hcp by ClpXP. The work on a clinical isolate of Acinetobacter baumannii and the data showing impaired T6SS activity against other cells through antibiotic-induced capsulation are a strong demonstration of the work's clinical relevance, in addition to the conceptual novelty of the findings.

      In my view, the manuscript is outstanding, with very high-quality experimental data, very well-written text, and a very clear presentation of the data in the figures. Below are a few minor comments and suggestions that I think would strengthen the manuscript.

      Authors’ reply #1: We thank the reviewer for their enthusiasm.

      Major comment:

      #2) OPTIONAL: Fig. 4c/l. 320: An indirect effect of an antibiotic on T6SS activity through antibiotic-induced capsule formation is very intriguing and contributes to the clinical relevance of the overall findings. When I saw the data in Fig. 4c, the graph instantly reminded me of the panel in Fig. 2a, where a similar phenotype is observed by changing the predator:prey ratio in the absence of any antibiotic. The authors themselves comment on the possibility of antibiotic-induced, reduced predator growth (and thereby a change in the predator:prey ratio) as one factor affecting the phenotype here. I am wondering if these data could be strengthened or better disentangled to test more precisely whether it is the antibiotic-induced capsule formation per se that affects T6SS-mediated killing by A. baumannii in the presence of antibiotics. Would it help to include the bfmS mutant as a control for direct comparison, to see whether antibiotic-induced capsule formation in the WT to levels similar to the mutant results in the same killing phenotype? Would it help to test T6SS-mediated killing in the presence and absence of antibiotics at multiple predator:prey ratios? Could the effect of the antibiotic on A. baumannii growth be measured and considered when choosing the ratio at which the bacteria are mixed?

      Authors’ reply #2: The point raised by the reviewer is very important. As we have stated in the manuscript, antibiotic-induced capsule production impacts the growth of A. baumannii and could therefore change the predator-prey ratio, potentially affecting the observed phenotype. However, the antibiotic is expected to impact the non-encapsulated ΔitrA strain equally, yet this strain maintains very strong T6SS killing activity in the presence of chloramphenicol. Thus, we do not believe the predator-prey ratio is causing the observed effect. To address this point more directly, we nonetheless propose to: i) repeat the experiments with different predator-prey ratios (1:1, 2:1, and 5:1), and ii) include a bfmS mutant as a control.

      Minor comments:

      #3) Figure 1D, l. 155: I might have missed this, but do the authors happen to have the numbers of E. cloacae as well? This would strengthen the claim that A. baumannii survives because E. cloacae is being killed.

      Authors’ reply #3: The reviewer is correct; we did not include the survival of E. cloacae in the initial manuscript due to technical reasons (counter-selection of E. cloacae). However, we propose to repeat the experiment using an E. cloacae strain carrying a plasmid conferring kanamycin resistance. This will allow us to counter-select E. cloacae after contact with the A. baumannii predator to determine if E. cloacae is killed by A. baumannii in a T6SS-dependent manner.


      #4) Figure 2: I suggest writing out the species name of the prey in the box with the ratio. With E. cloacae being referred to in the previous figure and sharing its initials with E. coli, I wasn't sure at first sight what E. c. refers to.

      Authors’ reply #4: We appreciate the comment and will revise the figure as suggested.

      #5) Use of the term "T6SS activity" throughout the manuscript (e.g. l. 182, l. 187). I leave this up to the authors. To me, it seems like an umbrella term for the initial observation, and I see that such a term can be very handy for the writing. I would just like to mention that the use of the term was not always intuitive to me and was sometimes even a bit misleading. For example, l. 182 refers to "increased T6SS activity". As a reader, at this point I only know about 'T6SS activity on other cells' or 'a T6SS-mediated effect on other cells'. T6SS apparatus assembly/firing activity is tested specifically later, and it turns out to differ between mutants. By the time the term is used in the discussion, it captures multiple nuanced phenotypes described by then. The more precise definition of the term in l. 200 helped me understand what exactly the authors mean.

      Authors’ reply #5: We propose rephrasing the sentences to include the term "T6SS-secretion activity" when referring to Hcp secretion assays and "T6SS-mediated killing activity" when referring to killing experiments.

      #6) l. 198-199: "Collectively, our findings indicate that CPS does not hinder the secretion process of the T6SS or the consequent elimination of competing cells". I might be missing something, but I cannot entirely follow this sentence. Didn't the authors just show that the CPS does hinder T6SS-mediated elimination of competing cells (panel 2A) and that less Hcp is secreted by the encapsulated WT than by the non-encapsulated mutant (panel 2B)?

      Authors’ reply #6: We thank the reviewer for this comment. We realize that the sentence wasn’t well phrased, resulting in confusion. What we meant was that the T6SS is functional regarding its T6SS-mediated killing and secretion in the WT strain, while we also showed that the non-capsulated strain kills and secretes more T6SS material in the supernatant. Thus, there seems to be a balance between capsule production and T6SS activity in the WT. We will revise the sentence to better reflect this meaning.

      #7) l. 224, typo, "in"

      Authors’ reply #7: We will correct this typo. Thank you.

      #8) Two connected comments. First, l. 338: just a thought, I am wondering about the title of the section. After reading it a second time, I think it is technically correct. When first reading it, however, I was a bit confused when I got to the data, because apparatus assembly is impaired in the capsule-overproducing strain, even if "preserved". Don't the data indicate that there is less T6SS assembly in the bfmS mutant, possibly because of reduced cell sensing? And isn't a main point that apparatus assembly in the capsule-overproducing strain differs from the WT (whereas there is no difference in the strain without capsule)? To me, the finding that fewer cells of the bfmS mutant have a T6SS apparatus does not seem fully acknowledged in the interpretation of the data. Isn't that interesting? A title along the lines of "Capsule-overproducing strain has preserved sensory function and assembles fewer T6SS apparatuses" would have been more intuitive for me. Second, l. 352: unless I missed an earlier reference to these data in the manuscript, I am wondering whether it would be worth mentioning the reduced apparatus assembly of the bfmS mutant earlier, together with Figure 3, or at least adding a sentence there indicating that more is coming later. When I got to this line in the manuscript and read the findings on apparatus assembly, I first needed to go back to Figure 3 and look at the data again in light of this finding. It is mentioned here in passing, but I think it is very important for the interpretation of the phenotypic data of the bfmS mutant shown earlier, isn't it? The tslA mutant is used beautifully here.

      Authors’ reply #8: We thank the reviewer for the suggestion and the kind comment about the beautiful usage of the tslA mutant. We will modify the title of the corresponding paragraph as suggested to make it more intuitive.

              Regarding the comment about mentioning the T6SS apparatus assembly defect in the *bfmS* mutant earlier, we respectfully disagree. While we agree that this point is important and can partially explain the difference in killing activity, we believe that showing it together with the *tslA* mutant (Figure 5) makes more sense and is easier for the reader to understand.
      

      #9) Discussion: optional comment. On the one hand, I like the concise discussion. On the other hand, I see more potential here for bringing it all together (potentially at the expense of shortening some of the introduction). I think the subtleties of the findings are complex. For example, I could envision a graphical summary with a working model on all the effects of a capsule on the T6SS and its potential clinical relevance making the study accessible to even more readers.

      Authors’ reply #9: In the revised manuscript, we will include a graphical summary/model.


      Significance

      #10) General assessment: I consider the story very strong in terms of novelty, experimental approaches used, quality of the data, quality of the writing and figures of the manuscript. In my view, the aspects that could be improved are optional/minor and concern only one figure and some phrasing.

      • Advance: I see major advances in the findings (i, mechanistic) on how the capsule interferes with the T6SS, (ii, fundamental) on the discovery of ClpXP degrading Hcp, and (iii, clinical) on the implications of antibiotic treatment for the T6SS of this clinically relevant and often multi-drug-resistant bacterial species, which strongly complement existing work on the T6SS and antibiotics in A. baumannii (e.g. from the Feldman group). As the authors write themselves, the starting points of the study, namely that a capsule protects from a T6SS and that the effect of a T6SS on other cells is negatively impacted by a capsule, were known, although they had not been studied in one species and were not understood mechanistically.

      • Audience: I expect the results to be of interest to a broad audience in the fields of bacteria-bacteria interactions, Acinetobacter baumannii, type VI secretion, antimicrobial resistance, and bacterial capsules.

      Authors’ reply #10: We once again thank the reviewer and highly appreciate their positive and constructive feedback on our work. We hope the reviewer will be satisfied with the revised version of our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      #11) In the manuscript by Flaugnatti et al., the authors provide clear evidence of the interplay between capsule outer coat production and the Type VI secretion system (T6SS) in Acinetobacter baumannii. The authors demonstrate that the presence of the capsule or the activity of the T6SS enhances survival against attacking bacteria. However, they also show that in their model bacterium, the (over)production of the capsule likely hinders T6SS dynamics, thereby reducing overall killing efficiency. Additionally, they reveal that the amount of the T6SS component Hcp is regulated in cells that can no longer assemble and/or secrete via the T6SS, presumably by the ClpXP protease. Overall, the experiments are well designed, and most conclusions are supported by the data and appropriate controls. I have however some suggestions that could further strengthen the manuscript prior to publication.

      Authors’ reply #11: We are grateful for the reviewer’s enthusiasm and will implement their comments and suggestions in the revised version of the manuscript.


      Major comments:

      #12) Line 164. The authors use E. coli as prey to test the T6SS activity of A. baumannii. Why not directly use the E. cloacae strain (with or without T6SS) for this purpose? This would provide direct evidence that A. baumannii uses its T6SS to kill E. cloacae, thus confirming the authors conclusions in this section.

      Authors’ reply #12: We thank the reviewer for this comment. We used E. coli to assess the functionality of the T6SS in different strains of A. baumannii, as it is commonly done in the T6SS field. However, as suggested by reviewer 1 (see comment #3) and in response to this query, we will also provide survival data of E. cloacae in the revised manuscript using a plasmid-carrying E. cloacae derivative that allows direct selection.

      #13) In Figure 2, the authors show that a non-capsulated strain kills more effectively and secretes more than a WT, but has a similar number of T6SS. They suggest in their conclusion that "the observed increase in T6SS activity in the non-capsulated strain suggests a compensatory mechanism for the absence of the protective capsule layer." This conclusion implies the presence of an "active" regulatory mechanism that would increase the number of successful T6SS firing events, which has not been demonstrated. Could it not simply be that the capsule blocks some shots that cannot penetrate and are therefore ineffective? This hypothesis is mentioned in lines 204-208. The authors should clarify the conclusion of this section. Given the challenge this may pose in A. baumannii, I suggest that the authors quantify the assembly/firing dynamics of the T6SS under WT and ΔitrA conditions. This would help distinguish between the two hypotheses explaining better firing in non-capsulated cells: i.e., if the number of assembled T6SS is the same in both strains (Fig 2C & 2D), do non-capsulated cells assemble/fire faster, indicating an adaptation in regulation, or do we observe the same dynamics, suggesting a simple physical barrier blocking the passage of certain T6SS firing events?

      Authors’ reply #13: We realize that the sentence, and more specifically the word "compensatory," might have been misleading and thank the reviewer for bringing this to our attention. What we meant to convey is that there is a balance between capsule production and T6SS activity; if disturbed, the balance shifts in one direction or the other. Specifically, there is more protection through the production of a thicker capsule (e.g., in the ∆bfmS mutant or under sub-MIC conditions of antibiotics, regulated by the Bfm system, as mentioned in the text) or more T6SS activity when less capsule is present (e.g., in the ΔitrA mutant, which we propose is caused by the lack of steric hindrance). We will rephrase this sentence in the revised manuscript to better convey this message.

              Regarding the quantification of T6SS dynamic assembly/firing events between the capsulated (WT) and non-capsulated (ΔitrA) strains, we do not think this is required for this study, as the amount of secreted Hcp reflects the overall activity of the system. Importantly, we also do not have the technical means to quantify assembly/firing rates under Biosafety Level 2 conditions, as this requires specialized microscopes with very fast acquisition options (see, for instance, Basler, Pilhofer *et al.*, 2012, *Nature*). Indeed, very few labs in the T6SS field have been able to measure such rates.
      

      #14) Lines 428-429. It is mentioned that the deletion of lon does not have a notable effect. However, I observe that the absence of Lon alone causes a more rapid degradation of Hcp in the cells compared to the WT strain (Fig 7B). How do the authors explain that the absence of this protease (whether under conditions of Hcp accumulation or not) increases the degradation of this protein in the cell? This explanation should be included in the manuscript.

      Authors’ reply #14: That’s a fair point. We didn’t address this point further, as the deletion of lon didn’t resolve the issue of why Hcp is degraded. In fact, the opposite seems to be the case, as there is less Hcp in the ∆lon strain compared to the WT. While this observation is not directly relevant to the question of why Hcp is degraded late during growth in secretion-impaired strains, we will properly mention it in the revised manuscript.

              Please also note that a strong growth defect of a Δ*lon*Δ*clpXP* double mutant impaired further investigation in this direction.
      
      Minor comments:

      #15) Throughout the manuscript, the authors use the term "predator" to refer to A. baumannii. Predation is a specific phenomenon that involves killing for nourishment. To my knowledge, the T6SS has never been shown to be a predation weapon but rather a weapon for interbacterial competition, which is a different concept. If this has not been demonstrated in A. baumannii, the authors should replace the term "predator" with "attacker" (or an equivalent term) to clarify the context.

      Authors’ reply #15: We thank the reviewer for this comment. The term “predator,” as highlighted by the reviewer, typically implies killing for nourishment/cellular products. In the context of T6SS, it facilitates the killing of competitors, releasing DNA into the environment that can subsequently be acquired through natural competence for transformation, as observed in species like Vibrio cholerae (our work by Borgeaud et al., 2015, Science) or other Acinetobacter species such as Acinetobacter baylyi (Ringel et al., 2017, Cell Rep.; Cooper et al., 2017, eLife). The acquisition of DNA reflects the killing for cellular products of the prey. As most A. baumannii strains are also naturally competent, this justifies the usage of the predator and prey nomenclature.

              Apart from this fact, it seems to be a matter of nomenclature, with many papers in the field using one term or the other. Yet, ultimately, this doesn’t change any of the scientific findings. Therefore, to satisfy the reviewer, we will change “predator” to “attacker” throughout the revised manuscript.
      

      #16) Line 274. Since the authors stated that in the Wzc mutant, the capsule is "predominantly found in the supernatant and only loosely attached to the cell," this result is not unexpected. This finding is also consistent with the previous results from Fig. 3A & B, which show sensitivity to complement-mediated killing and the weak amount of (ab)normal CPS produced in that strain, further confirmed by Fig. 3E.

      Authors’ reply #16: We fully agree with the reviewer’s suggestion and will remove the statement.

      #17) Line 299. The authors speculate that "... T6SS may deploy through gaps akin to arrow-slits in the capsule's mesh...". However, this is very unlikely since a WT strain kills (Fig. 3C) and secretes (Fig. 2B & 3D) less effectively than the itrA mutant, suggesting that the T6SS is not assembled in the "right places" devoid of CPS; otherwise, we would expect similar T6SS activity. Based on the results in Fig. 2 (and my earlier comment), this implies that A. baumannii assembles its T6SS randomly, and in the presence of the capsule, its shots would need to be in the right place to penetrate the envelope and reach the target. Could the authors comment on this point and provide a model figure to better visualize the interplay between the capsule and T6SS under the three major conditions: WT, non-capsulated, and capsule overproduction?

      Authors’ reply #17: We thank the reviewer and agree with their comment. We discussed the hypothesis of T6SS deployment through gaps, drawing a parallel to what was proposed for biofilm and T6SS in V. cholerae (Toska et al., 2018, PNAS). However, as mentioned earlier, we believe that the effect of the capsule on T6SS activity is primarily due to steric hindrance, which increases the distance between the T6SS apparatus and the prey cell. To clarify our findings further, we will include a model summarizing our results, as requested by reviewer 1 (see comment #9).


      #18) In Fig. 5A, the microscopy panels should be adjusted to the same dynamic range as the WT (which represents a true signal); this does not appear to be the case for the tslA mutant panel, for instance. The image gives the impression of a large amount of free TssB-msfGFP in the cytoplasm. However, this effect is due to the dynamic range being adjusted to display a signal. This observation is consistent with the fact that the amount of TssB-msfGFP protein is identical across all strains (Fig. S2F).

      Authors’ reply #18: We will adjust the images to the range of the WT in the revised manuscript, as suggested. However, regardless of how these images are presented, the enumeration of T6SS structures will remain unchanged, which was the sole point of this experiment.

      #19) Unless I am mistaken, the authors do not comment on the fact that in a ΔbfmS strain, the number of T6SS is halved compared to a WT or ΔitrA strain. If capsule overproduction only partially limits the TslA-dependent T6SS assembly, how can this result be explained? Is it related to the degradation of Hcp in this background, which ultimately limits the formation of T6SS? If so, it would be interesting to mention this connection in the section "Prolonged secretion inhibition triggers Hcp degradation".

      Authors’ reply #19: We did mention that the T6SS assembly of the ΔbfmS mutant is reduced compared to the WT (or ΔitrA), likely due to the defect in sensing the prey (lines 369-374 and 468-472 of the initial manuscript). However, we will revise the sentence to improve clarity in the revised version of the manuscript.

      Significance

      #20) This work is highly intriguing as it not only delves into the specific mechanisms involved but also connects fundamental elements in bacterial competition, i.e., the necessity for self-protection and aggression for survival. The manuscript offers valuable insights into cellular dynamics at a microscale level and prompts new inquiries into the regulation of these systems on a population scale. The work is well-done and the writing is also clear. I am convinced that this work represents another significant step towards understanding bacterial mechanisms and will undoubtedly spark considerable interest in the field.

      Authors’ reply #20: We sincerely thank reviewer #2 for their constructive inputs, which will improve our manuscript.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      #21) The manuscript by Flaugnatti et al. investigates the relationship between functions of the T6SS in A. baumannii and production of capsular polysaccharide. The manuscript argues that (1) capsule protects A. baumannii against T6SS-mediated attack by other bacteria, (2) capsule also interferes with the bacterium's own T6SS activity, and (3) the T6SS inner tube protein Hcp is regulated by degradation by ClpXP. The main critiques concern the first two conclusions, which seem to be based solely on a mutant with a confounding effect (described below), and the third claim, which could be strengthened by further exploring the results of overexpressing Hcp and by determining whether Hcp regulation confers a fitness benefit.

      __Authors’ reply #21:__ We thank reviewer #3 for their relevant input. We will conduct additional experiments based on their comments, and these will be incorporated into the revised manuscript.


      __Main items:__

      #22) Throughout the paper, an itrA deletion mutant is used as the capsule-deficient strain, and conclusions about the role of capsule are drawn based on this mutant. However, itrA deletion also eliminates the protein O-glycosylation pathway (Lees-Miller et al. 2013), a potential confounder. Analysis of mutants specifically deficient in the high-molecular-weight capsule but not protein glycosylation, and/or mutants in the protein O-glycosylation enzyme, should be incorporated into the study to enhance the ability to draw conclusions about the role of the capsule.

      __Authors’ reply #22:__ Fair point. We thank the reviewer for this important suggestion. To distinguish between the O-glycosylation pathway and capsule production, we will generate a ∆pglL strain (specific to O-glycosylation), as suggested, and will repeat the key experiments (similar to Fig. 2A and 2B). We are almost done with the engineering of this mutant strain and therefore don’t expect any major delays.

      #23) Evidence could be provided to support the idea raised in lines 482-483 that T6SS component accumulation is toxic ("degradation [of T6SS components] could serve as a strategy to alleviate proteotoxic stress..."). For example, growth curves of ∆clpXP strains with and without hcp could be analyzed, to determine how degrading Hcp is helping the bacteria.

      __Authors’ reply #23:__ We will perform growth curves of ΔclpXP strains with and without hcp, as suggested by the reviewer. However, we are uncertain whether we will be able to observe differences between these strains, as the conditions under which such degradation is significant may be challenging to replicate under standard laboratory conditions.

      #24) The possible ClpXP recognition sequence identified at the C terminus of Hcp is interesting: does overexpression of an Hcp variant lacking or altered in this motif change its protein levels compared to WT Hcp?

      __Authors’ reply #24:__ We thank the reviewer for this suggestion. We are in the process of performing the suggested experiment and will include the data in the revised manuscript.

      __Minor items:__

      #25) *A better explanation could be provided for why overexpressing hcp in WT but not in ∆hcp leads to increased Hcp protein levels. There is a statement about Hcp being regulated post-transcriptionally, possibly by degradation (lines 422-423), but would that not also result in regulation in the WT strain?*

      __Authors’ reply #25:__ The reviewer is absolutely correct here. Despite careful genetic engineering, we believe that the hcp mutant used may have a polar effect, causing Hcp accumulation only in the ∆hcp + p-hcp strain but not in the WT + p-hcp strain, which remains capable of secretion. The ∆hcp strain therefore mimics the secretion-impaired tssB mutant. We will clarify this in the revised manuscript.

      #26) *An untreated control is needed in Fig. 4B.*

      __Authors’ reply #26:__ The untreated samples were shown in all previous figures. However, we understand the reviewer's point and will repeat the experiment with the untreated control included in the same experiment.

      #27) *line 179: please clarify "reflecting better invading bacteria"*

      __Authors’ reply #27:__ We appreciate the reviewer mentioning this oversight. We meant to compare this to a situation where a bacterium invades an already existing community, resulting in a predator-prey ratio below 1. We will clarify this further in the revised manuscript.

      #28) *line 351: consider rewording the statement that ∆tslA results in decreased T6SS assembly and activity using the tssB-msfGFP microscopy assay; it is not clear that activity is measured in this assay.*

      __Authors’ reply #28:__ The reviewer is correct. We will revise the sentence so that it refers only to T6SS assembly.

      #29) *lines 260-265: This experiment could use clarifying, but it would seem that it requires analysis of the secreted capsule levels in the tssB mutant to show it does not produce extracellular capsule to the same extent that ∆bfmS does.*

      __Authors’ reply #29:__ We thank the reviewer for the suggestion and will include these experimental data in the revised manuscript.

      #30) *Fig. 6C and 7A labelling could be improved to avoid potential confusion that the bar graphs are quantifying the western blot. E.g., could add a corresponding vertical label to the Western data, or consider changing "relative expression of hcp" to something reflecting analysis of transcript levels.*

      __Authors’ reply #30:__ We will improve this figure by splitting the qPCR and Western blot data into independent panels. This will eliminate any confusion.


      #31) lines 416-417 and Fig. 7A: states that "hcp mRNA levels increased significantly", but more careful wording could be used because the WT's transcript change is not significant after overexpression (though it is significant in ∆hcp).

      __Authors’ reply #31:__ Point well taken. We will improve the sentence (and Figure) to make its meaning unambiguous.


      #32) lines 479-480 states that in secretion-impaired strains accumulation of Hcp is mitigated by ClpXP; while this was shown for ∆tssB, was this also the case for ∆bfmS?

      __Authors’ reply #32:__ This is indeed an interesting suggestion. We are in the process of generating the double mutant ∆bfmSclpXP and will include the experimental results in the revised manuscript.


      Significance

      #33) *The strengths of the study are the focus on a clinically significant pathogen, the potential novel roles for the important capsule virulence factor of A. baumannii, and the identification of novel points of control of the T6SS. The analyses of T6SS function are thorough and carefully performed.*

      __Authors’ reply #33:__ We thank the reviewer for their comments, which we believe will significantly strengthen our work, particularly regarding the capsule aspect.

    1. Author response:

      eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim are incomplete and lack quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      Cluster 13 (Obp2a+) cells identified in our study share gene expression markers with the “putative secretory” cells identified in the Hills et al. manuscript. When that manuscript became publicly available, our manuscript had already been finalized and submitted. We welcome the suggestion to integrate the datasets, which we will attempt and address in our revision.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

      We will re-plot the UMAPs to make the developmental trajectory clearer. How neuron clusters are affected by the inclusion or exclusion of receptors is an interesting question that we will address in our revision, along with showing markers of each neuronal cluster, as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The authors analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research provides a conceptually novel perspective on the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from scRNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

      We would like to point out that the connection between scRNA-seq and EM was made in our experiments investigating the localization of ER proteins via IHC (Figure 5). The intriguing observation that the levels of a number of ER luminal and membrane proteins were higher in Gnao1 than in Gnai2 neurons led us to hypothesize a difference in ER content or ultrastructure, which was verified by EM. Quantification of the ER phenotype would strengthen our observations, and we will add it in our revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed.

      The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris.

      Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      scRNA-seq data analysis methods: We agree with the reviewer and will elaborate on the various criteria, parameters, and methods in our revision. As described above, our revised manuscript will include analysis of how inclusion / exclusion of VRs affects cell clusters, as well as quantification of the ER phenotype. We will address the reviewer’s concern of over-clustering.

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      a) Cells expressing no V1Rs are not necessarily debris, because they express other neuronal markers at the same level as cells that express one or two V1Rs. The higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. We will modify Figure 7-supplement 3c to add another group showing the Gnai2 level in cells expressing zero V1Rs.

      b) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary Table 8. Among 134 cells that express two V1Rs, 44 express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally a random combination of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to occur by random chance. Some of these co-expression combinations were identified earlier and verified experimentally in Lee et al., 2019 and Hills et al. Furthermore, Figure 7-supplement 3c shows that the level of Gnai2 expression is comparable across cells expressing one or two V1Rs. If the cells expressing two V1Rs were doublets, we would expect their Gnai2 level to be higher than that of cells expressing a single V1R. We will elaborate on this in our revised manuscript.
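      As a rough illustration of the randomness argument in (b), the Python sketch below computes how many cells a single specific V1R pair would be expected to account for if every two-V1R cell were a random doublet. It is an illustrative null model, not the analysis reported in the manuscript, and it rests on stated assumptions: all 134 two-V1R cells are treated as random doublets, the number of detected V1R genes is a hypothetical placeholder of 100, and every V1R is assumed to be equally frequent. Under these assumptions each doublet carries one of C(K, 2) equally likely pairs, so the count for one particular pair is approximately Poisson with mean 134 / C(K, 2).

      ```python
      from math import comb, exp, lgamma, log


      def expected_pair_count(n_two_receptor_cells: int, n_receptor_genes: int) -> float:
          """Expected number of cells carrying one specific receptor pair under a
          null model in which every two-receptor cell is a random doublet and all
          receptors are equally frequent (a simplifying assumption)."""
          return n_two_receptor_cells / comb(n_receptor_genes, 2)


      def poisson_pmf(k: int, lam: float) -> float:
          """P(X = k) for X ~ Poisson(lam), computed in log space so that a large k
          does not overflow k!. When lam << 1, this term dominates P(X >= k)."""
          return exp(k * log(lam) - lam - lgamma(k + 1))


      if __name__ == "__main__":
          N_TWO_V1R_CELLS = 134   # cells expressing two V1Rs (Supplementary Table 8)
          N_V1R_GENES = 100       # hypothetical placeholder, not a measured value
          OBSERVED_TOP_PAIR = 44  # cells expressing Vmn1r85+Vmn1r86

          lam = expected_pair_count(N_TWO_V1R_CELLS, N_V1R_GENES)
          p = poisson_pmf(OBSERVED_TOP_PAIR, lam)
          print(f"expected cells per specific pair: {lam:.3f}")
          print(f"approximate probability of >= {OBSERVED_TOP_PAIR} such cells: {p:.1e}")
      ```

      Relaxing the equal-frequency assumption raises the expected count for pairs of common receptors, but even a mean several-fold higher leaves 44 cells for a single pair far outside what random pairing would predict under this model.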

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank all three reviewers for their insightful comments. Based on this feedback, we have performed additional experiments, and revised our manuscript. Below, we address each comment and describe the revisions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Ponomarova et al. showed that a neomorphic idh-1 mutation results in increased levels of cellular D-2HG. The authors compared the high-D-2HG phenotypes with those of a D-2HG dehydrogenase mutant and identified differences in vitamin B12-dependent vulnerability. Downregulating genes of the glycine cleavage system, which is involved in producing one-carbon donor units, exacerbates the phenotypes, while adding one-carbon donors suppresses them. They concluded that the idh-1neo mutation imposes a dependency on the one-carbon pool. The manuscript is very interesting, but I think it should be modified to be clearer for broad audiences.

      Concerns: The authors mention a number of examples of metabolic changes involving D-2HG in the first paragraph of the introduction. I think that a metabolic map explaining these changes would help readers understand the questions posed by the authors.

      Thank you for this suggestion. A figure illustrating the contributing factors in D-2HG metabolism has been added to the manuscript (Figure 1A).

      The authors say that D-2HG affects carcinogenesis in many ways, citing previous works. If they assume that concentration is what matters most for carcinogenesis, they should state whether a higher concentration of D-2HG affects carcinogenesis in dhgd loss of function.

      Thank you for pointing this out. We have added this information in lines 70-72 of the revised manuscript: "Increased levels of D-2HG caused by the inhibition of D-2-hydroxyglutarate dehydrogenase activity have also been associated with different cancers (PMID: 29339485, PMID: 34296423, PMID: 35007759)."

      Line 110: "mode" should probably read "model".

      Thank you - we have corrected this error.

      In Figure 4C, formate concentrations are shown as 0, 20, 40, 80, and 160 mM. Is this correct? High substrate concentrations change the osmotic pressure of the medium. Also, high concentrations of formic acid are toxic to animals. Considering that the concentration of vitamin B12 was 64 nM, I wonder whether the concentration unit of formate should also be nM.

      We confirm that we supplemented the media with formate in the millimolar range. The highest doses of supplemented formate somewhat slowed the development of P0 animals, but they consistently produced viable progeny. To clarify this, we have added the following line to the text on lines 184-187: "The highest doses of supplemented formate somewhat slowed the development of P0 animals, but restored the survival of idh-1neo embryos to wild-type levels on a regular diet of E. coli OP50 as well as the diet of RNAi-competent E. coli HT115."

      Additionally, the use of sodium formate ensured that the pH of the media remained unchanged.

      I could not understand how embryonic and larval lethality relate mechanistically to carcinogenesis in animals. Could you explain the logical link between the lethal mutation and carcinogenesis? Or do the two phenotypes share only part of the metabolic changes?

      Thank you for this suggestion. We have added this in lines 242-246 of the Discussion:

      "While our results have focused on how the neomorphic idh-1 mutation affects the developing embryo, proliferating cancer cells also have been shown to have increased demand for 1C units, for instance, to synthesize nucleosides (33)(PMID: 24657017). Thus, we can speculate that cancers with mutated IDH1 may be increasingly sensitive to depletion of the 1C pool, also."

      Vitamin B12 is an essential substance, and deficiency in humans results in severe diseases. Is the lethal phenotype of vitamin B12-treated idh-1neo mutants comparable to anything in humans? Is the concentration of vitamin B12 used similar to that in humans?

      The daily dose of human vitamin B12 (cobalamin) in supplements can reach 12.5 µg per kg (PMID: 18606874), while we supplement the media fed to worms with approximately 55 µg cobalamin per kg (64 nM adenosylcobalamin). No known adverse effects are associated with excessive intake of vitamin B12 by healthy individuals; therefore, no tolerable upper intake level has been set (PMID: 23193625). However, the impact of vitamin B12 on patients with IDH1neo-positive cancers has not been studied.

      Reviewer #1 (Significance (Required)):

      I think that the manuscript is interesting and may lead to important progress in this field. However, in general, metabolic disorders are difficult to understand for people outside the specialty. The authors should carefully explain the structure/properties, pathways, enzyme functions, and concentration effects of the substances of interest.

      See above, we hope these edits are sufficient.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Increased levels of the metabolite D-2HG (derived from alpha-KG) are associated with multiple disorders. In a previous study, the authors showed that in C. elegans dhgd-1 deletion mutants, embryonic lethality resulting from the accumulation of D-2HG is caused by a lack of ketone bodies. In this study, the authors generated a new model of D-2HG accumulation in C. elegans, idh-1neo, in order to further understand how D-2HG exerts its toxic effects in different contexts. This allele mimics neomorphic mutations of human IDH1 that lead to abnormal D-2HG production from alpha-KG. Interestingly, the authors find that idh-1neo mutants are distinct from the previously reported animals lacking the D-2HG dehydrogenase dhgd-1. Specifically, while vitamin B12 rescues the embryonic lethality in dhgd-1 deletion animals, it enhances the lethality of idh-1neo animals. Through an elegant genetic screen and complementation studies with specific metabolites, they provide compelling evidence that this vitamin B12-dependent enhancement is due to depletion of the 1C pool. Specifically, a reverse genetic screen revealed that inactivation of components of the 1C-producing glycine cleavage system (GCS) results in embryonic lethality in idh-1neo, but not wild-type, animals. Complementation studies with specific metabolites show that replenishing 1C groups is sufficient to reverse embryonic lethality.

      This is a very clear, well written paper. Experiments are well controlled and executed, figures are of the highest quality and conclusions are convincing. Prior studies are appropriately referenced. No additional experiments are required by this reviewer.

      Minor points: 1) In Figure 2A, could the authors explain how beta-alanine (increased) differs from alanine (decreased)? As a non-specialist, this is not clear to me.

      Thank you for pointing this out. We added this explanation to the figure legend (lines 510-512).

      2) Did the authors test whether inactivation of the lipoamide dehydrogenase (dld-1) has the same effect as that of the other identified components of the GCS?

      The dld-1 RNAi clone was present in the metabolic library that we screened but was not identified as a "hit." We have added the following in lines 164-168 of the revised manuscript: "Two other GCS genes, gcsh-2 and dld-1 were not identified as 'hits'. gcsh-2 is associated with the same reaction as gcsh-1, indicating that the latter encodes an active enzyme (30). dld-1 functions in other metabolic processes, particularly in lactate/pyruvate metabolism, and confers embryonic lethality when knocked down in wild type animals (31)".

      **Referees cross-commenting**

      Comments to Reviewer #3: 1/ The authors treat the idh-1neo worms with vitamin B12 to reduce 3HP concentrations. The authors should consider conducting experiments to reduce 3HP by other means also. This would help establish a causal relationship between the D-2HG accumulation and observed phenotypes.

      The authors show that adding vitamin B12 to the diet of idh-1neo animals significantly increased their D-2HG levels. Furthermore, dhgd-1 RNAi drove a further increase in D-2HG in idh-1neo animals and led to 100% penetrant embryonic lethality among the F1 generation of idh-1neo animals. Together, I think this provides strong evidence for a causal relationship between D-2HG accumulation and the observed phenotypes. Further characterizing these phenotypes would be interesting but is beyond the scope of this paper.

      4/ The authors should clarify whether it is really vitamin B12 or any other metabolite from the bacteria (like methionine) that is bringing about the phenotypes. Have they tested metabolically inactive bacteria?

      The authors show that supplementing B12-treated idh-1neo animals with formate (another 1C donor) restored the survival of idh-1neo embryos, supporting a role for B12 in depletion of the 1C pool. They also show that suppressing Met/SAM cycle genes in idh-1neo animals prevents 1C depletion and restores the availability of 1C units. So the evidence that 1C unit depletion is at the core of the observed phenotypes is pretty convincing.

      7/ The authors should conduct metabolomic profiling to examine changes in metabolic pathways, including 1C, glycine metabolism, glucose metabolism etc, in idh-1neo animals subjected to GCS gene knockdown, and vitamin B12 supplementation.

      It is not clear how these experiments would add to this story; they would open up another line of research.

      8/ The audience will be limited to the field although the study pertains to an oncometabolite. The study value would have improved if the authors had included cancer cell data. Also, the phenotype studied has not been mechanistically linked to the oncometabolite function, making the study academic in nature.

      The interest of this study is that it is carried out in an organismal context.

      Reviewer #2 (Significance (Required)):

      As a geneticist with a general interest in metabolomics, I find this an elegant study that offers new insight into how IDH-1 and -2 neomorphic mutations affect metabolic rewiring in the context of a whole animal. Although similarities are observed between idh-1neo mutants and animals lacking the D-2HG dehydrogenase dhgd-1, both of which have increased levels of the metabolite D-2HG, specific metabolic differences are observed. The identification of 1C unit deficiency as a driver of lethality in idh-1neo mutants is highly significant given the central importance of 1C metabolism. This study should therefore be of interest to a wide audience.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Ponomarova et al. present a short follow-up to their previous study to elucidate the role of an oncogenic variant of idh-1 that increases 3HP levels, similar to the Δdhgd-1 mutant. Using a combination of metabolomics and genetics, they show that the defect in idh-1neo worms on a high vitamin B12 diet is the draining of the 1C pool, distinct from the mechanism of lethality observed in the Δdhgd-1 mutant. While the findings are interesting, there is a lack of mechanistic understanding of the basis of the observed phenotype. Moreover, the authors do not establish a link between the oncometabolite, which should support uncontrolled cell division, and the observed phenotype. Some control experiments are missing and should be included in the revised manuscript. The comments on the manuscript are as follows, in no particular order:

      1. The authors treat the idh-1neo worms with vitamin B12 to reduce 3HP concentrations. The authors should consider conducting experiments to reduce 3HP by other means also. This would help establish a causal relationship between the D-2HG accumulation and observed phenotypes.

      To further examine the link between 3HP and idh-1neo embryonic lethality, we targeted hphd-1 by RNAi, which increases 3HP levels (Ponomarova et al., 2023). Hphd-1 knockdown did not induce lethality in the wild-type or exacerbate lethality in idh-1neo animals (Figure S3), further demonstrating that lack of 3HP degradation is not linked to this phenotype (lines 143-145).

      Also, see cross-comments from Reviewer #2 above.

      The authors should investigate the functional impact of HPHD-1 inhibition on 3-hydroxypropionate levels and D-2HG accumulation by RNAi knockdown of HPHD-1 in idh-1neo animals.

      We have now performed the suggested experiment; please see our response to comment 1 above.

      The authors do not clearly state which diet was used in some of their experiments. This is important because the two diets used (OP50 and HT115) differ in their vitamin B12 content, and thus could have different consequences.

      We added this information in figures, figure legends, and lines 259-260 of the revised manuscript.

      The authors should clarify whether it is really vitamin B12 or any other metabolite from the bacteria (like methionine) that is bringing about the phenotypes. Have they tested metabolically inactive bacteria?

      The reviewer correctly points out that bacterial metabolism may play a role in the effects exerted by vitamin B12. We have not tested metabolically inactivated bacteria, however, our RNAi experiments (Figure 4E) demonstrate that supplemented vitamin B12 acts through the Met/SAM cycle in idh-1neo animals. Please also see cross-comments from Reviewer #2.

      The authors consistently use 64 nM of Vitamin B12. Will the hphd-1 mutant and the idh-1neo mutant have different vitamin B12 thresholds for the observed phenotypes?

      Thank you for raising this interesting point. While 64 nM vitamin B12 virtually eliminates 3HP accumulation in idh-1 animals (Figure 2D), we have not tested whether this dose is sufficient to eliminate 3HP accumulation in the hphd-1 mutant. However, potential differences in 3HP levels in idh-1neo and hphd-1 animals treated with vitamin B12 would not contradict our conclusion that 3HP is not the cause of embryonic lethality in idh-1neo mutant animals.

      Figure 3b: HT115 has inherently high levels of vitamin B12 so the RNAi effect of genes should be seen on the OP50 diet supplemented with B12.

      Despite reports of elevated B12 levels in E. coli HT115, vitamin B12-induced embryonic lethality of idh-1neo on a diet of OP50 is more severe than on a diet of HT115 bacteria (Figure 4C). Therefore, it may be harder to quantify the synthetic lethal interaction of idh-1neo with GCS RNAi knockdown using OP50 strains (which would need to be created).

      The authors should conduct metabolomic profiling to examine changes in metabolic pathways, including 1C, glycine metabolism, glucose metabolism etc, in idh-1neo animals subjected to GCS gene knockdown, and vitamin B12 supplementation.

      While these results would be interesting and further our understanding of metabolic changes that occur in idh-1neo mutant animals we think they are beyond the scope of the manuscript. Also, please see cross-comments from Reviewer #2.

      Perform rescue experiments using different one-carbon donors (e.g., formate, serine) to restore embryonic viability in idh-1neo mutants under conditions of vitamin B12-induced stress. Quantify the efficacy of these interventions using developmental assays.

      In addition to formate rescue experiments (Figure 4C), we supplemented idh-1neo animals with serine (Figure 4D and S7). Similar to formate, serine supplementation resulted in the rescue of idh-1neo embryonic lethality on an E. coli OP50 diet (lines 187-189). The lack of rescue on an HT115 diet could be due to HT115 bacteria containing more glycine (Gao et al., 2017), which might limit the efficiency of serine conversion to glycine needed for 1C unit production.

      Provide experimental evidence to show that idh-1neo animals possess an alternative source of energy.

      We have previously found that diminished production of ketone bodies in ∆dhgd-1 mutants causes embryonic lethality that can be rescued by exogenous supplementation of ketone body 3-hydroxybutyrate (Ponomarova et al., 2023). In contrast to dhgd-1 mutants, idh-1neo embryonic lethality fails to respond to supplemented 3-hydroxybutyrate (Figure S4), indicating the lethality associated with the idh-1neo mutation is caused by a different mechanism, i.e., a depletion in 1C-units.

      The authors use vitamin B12 to inhibit the shunt pathway (line 127). They should explore alternate strategies to do the same, like gene knockdown.

      Please see our response to comment 1 above where we discuss RNAi knock-down of the shunt pathway gene, hphd-1.

      It is not clear why the authors did not follow up on the other phenotypes of the idh-1neo mutant that were visible without vitamin B12 supplementation. They should follow up on these and other phenotypes to explore the broader physiological consequences of D-2HG accumulation.

      We agree that the other physiological consequences of D-2HG accumulation are interesting, and we plan to investigate them in our future studies.

      The authors should include control experiments without supplementation of vitamin B12, ketone bodies etc. in each of their figures.

      We thank the reviewer for this suggestion. We have added these data (Figures S5, 6, 7, and 8).

      The authors posit that the idh-1neo depletes the 1C pool leading to the observed lethality. So, when they supply formate to replenish it, they rescue the lethality of the B12-treated worms. Similar results are obtained by knocking down the enzymes. So where are the 1C units going? Understanding this will provide the much-needed mechanistic understanding to this study.

      We appreciate this insightful comment and have expanded our discussion to elaborate on this issue (lines 224-227). "We propose that a lack of 1C units in idh-1neo can impede pyrimidine biosynthesis via thymidylate synthase tyms-1, which uses 1C units to generate dTMP. Supporting this hypothesis, RNAi of tyms-1 causes embryonic lethality (36-38)."

      It may be important to measure the D-2HG levels in the mitochondria vs the cytosol.

      While this is an interesting point, we think that this line of inquiry is beyond the scope of this work (and is technically challenging).

      The idh-1neo mutation produces an oncometabolite. The authors do not show any data to indicate whether this mutant has any defect in cell division/cell cycle in the somatic tissue or germline.

      In this study we primarily focused on the molecular changes in the metabolic network that occur in idh-1neo mutant animals, which we think is an important advance in understanding the basis for how this mutation affects IDH function. Additional phenotypic outcomes of these perturbed metabolic processes will be the basis of future studies.

      Reviewer #3 (Significance (Required)):

      The audience will be limited to the field although the study pertains to an oncometabolite. The study value would have improved if the authors had included cancer cell data. Also, the phenotype studied has not been mechanistically linked to the oncometabolite function, making the study academic in nature.

      While we agree that the link between idh-1neo, 2HG production, and oncometabolite function has not been directly shown, we think that our study adds important molecular understanding of metabolic changes that occur in relation to idh-1neo function, which is important for future studies of how this mutation affects carcinogenesis. Also, please see cross-comments from Reviewer #2.

      In addition, we specified statistical significance in Figure 2, described statistical tests used (lines 361-363) and corrected a few grammatical errors throughout the text.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. The authors conclude that the "translation ramp" model is not supported by a new set of reporters and bioinformatic analyses. The strength of the bioinformatic evidence and of the experimental analyses (albeit very limited) of rare codon insertion in the reporter make a compelling case for the authors claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      We are glad the reviewer thinks our evidence makes “a compelling case for the authors claims”. This was our main aim, and we are satisfied with this.

      The reviewer believes the major weakness of the manuscript is that we do not take into account other models and do not (see below) cite numerous other relevant papers. The reviewer made essentially the same criticism at the first review, at which time we looked quite hard for papers generally meeting the reviewer’s description. We found a few, which we incorporated here. Still, we did not find the body of evidence whose existence the reviewer implies. We are citing every study we know to be relevant, though of course we will have inadvertently missed some, given the huge body of literature. After the first round of review, we wrote “the reviewer did not give specific references, and, though we looked, we weren’t always sure which papers the reviewer had in mind.” We hoped the reviewer would provide citations. But only two citations are provided here, both to A. Kochetov, and these don’t seem central to the reviewer’s points.

      The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data. Moreover, several studies have used bioinformatic analyses to point out the evolution of N-terminal sequences in multiple model organisms, including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention many of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies.

      Mostly, we do not know to what papers the reviewer is referring. This may be our failing, but it would have helped if the reviewer had cited one of them. There are papers discussing the evolution of N-terminal sequences, but as far as we know, these do not discuss translation speed or codon usage. Of course, we may have missed some papers.

      As such, the authors' approach to data presentation, writing, and data discussion makes the manuscript rather biased, focused on criticizing the Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of protein synthesis. Altogether, this makes the manuscript very limited.

      We think the reviewer may be considering our paper as being generally about translation speeds, whereas in our minds, it is not. This difference in views as to what the paper is “about” is perhaps causing friction. To us, it is indeed a limited paper. We are narrowly focused on the finding of Tuller that there is an enrichment of rare, slow codons at the 5’ end of genes, and we have sought an explanation of this particular fact. This is not a paper about rates of translation generally—it is a limited paper about the reason for the 5’ enrichment of rare, slow codons.

      To expand on this, the encoded slow 5’ translation due to rare, slow codons (of Tuller et al.) is a small effect (1% to 3%). The possible unencoded slow 5’ translation of unknown mechanism discussed by some other papers (e.g., Weinberg et al. 2016, Shah et al. 2013) is a much larger effect (50% or more). Just from the different magnitudes, it seems likely these are different phenomena. And yet, despite the small size of the encoded effect, it is for some reason this paper by Tuller et al. that has captured the attention of the literature: as we point out below, Tuller et al. has been cited over 900 times. Partly because of the wide and continuing influence of this paper, it is worth specifically and narrowly addressing its findings.

      Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

      We thank the reviewer for his/her attention to the manuscript, and for his/her comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, the major weaknesses of the manuscript are the lack of analyses for confounding effects, overstatement of the results (using a reporter with a single amino acid sequence), and the lack of discussion of previous work that argues against the Tuller et al. model. In my previous review I mentioned multiple other studies that addressed the "slow codons" model in more detail.

      No, the reviewer did not cite any specific studies.

      While some of these studies are mentioned in the revised manuscript, the authors are still rather biased and selective in their discussions. I should also point out that previous studies, which the authors again fail to mention, were focused on either translation initiation, the initiation-to-elongation transition, or early elongation effects in relation to mRNA sequence, structure, and codons as well as amino acid sequence. Additional studies with bioinformatic analyses of N-terminal conservation and of the existence of start sites at the beginning of protein sequences in multiple model organisms were also omitted.

      Again, we do not know to what papers the reviewer is referring. But this sounds like a lot. Our paper is aimed at a specific, narrow topic: Why is there an excess of rare, slow codons in the 5’ region of genes? We are not trying to make general statements about all things affecting and affected by translation speed, we are just trying to explain the excess of rare, slow codons.

      In general, the manuscript seems too focused on discussing Tuller's paper . . .

      Yes, we are focused on the Tuller findings, the excess of rare slow codons in 5’ regions.

      . . . and arguing with the model that was already shown by multiple other studies to be limited and not correct.

      We find it unsatisfactory that the reviewer states in a public review that there are multiple other studies showing that the Tuller model is not correct, and yet does not cite any of them. Furthermore, for the reviewer to say that Tuller et al. is “not correct” is too sweeping. The core finding of Tuller et al. was the excess of rare, slow codons in the 5’ regions of genes. We confirm this; we believe it is correct; we are not aware of any literature disputing this. Then, Tuller interpreted this as an adaptation to promote translational efficiency. On the interpretation, we disagree with Tuller. But if one is to disagree with this interpretation, one needs an alternative explanation of the fact of the excess rare, slow codons. Providing such an alternative explanation, and doing an experiment to distinguish the explanations, is our contribution. We are not aware of any other paper making our interpretation.

      There are of course many papers that discuss various aspects of translation at the 5’ ends of genes, and we do cite quite a few such papers in our manuscript, though certainly not all. But papers of this general kind do not, and cannot, show that Tuller et al. is “not correct”. As far as we know, no paper provides an alternative explanation for the rare slow codons, and no paper does an experiment to modulate translation speed and look at the effect on gene expression. Notably, the slow translation phenomenon associated with the rare codons found by Tuller et al. is a very small effect—a change of about 1% to 3% of translation speed. Some other papers on translation speed are dealing with possible changes in the range of 50% or more. These are presumably some other phenomenon (if indeed they are even real changes in translation speed), and, whether they are true or not, the results and interpretations of Tuller et al. could still be true or not. Of course, if we knew of some previous paper showing the Tuller paper is not correct, we should and would cite it.

      To expand on the current view of Tuller in the literature, Tuller et al. has been cited 956 times according to Google Scholar. This makes it an extremely influential paper. After finding Tuller et al. in Entrez Pubmed, one can look under “Cited by” and see the five most recent papers that cite Tuller et al. The five papers given on May 23 2024 were Bharti . . . Ignatova 2024; Uddin 2024; Khandia . . . Choudhary 2024; Love and Nair 2024; and Oelschlaeger 2024. We went through these five most recent papers that cite Tuller et al., and asked, did these authors cite the Tuller results as fully correct, or did they mention any doubts about the results? All five of the papers cited the Tuller results as fully correct, with no mention of any kind of doubt. For instance, Khandia et al. 2024 state “The slow “ramp” present at 5’ end of mRNA forms an optimal and robust means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression40.”, while Oelschlaeger (2024) states “Slow translation ramps have also been described elsewhere and proposed to prevent traffic jams along the mRNA [51,52,53].” Although Uddin (2024) cited Tuller as fully correct, Uddin seemed to think (it is a little unclear) that Tuller found an enrichment of highly-used codons, opposite to the actual finding. The multiple contrary studies mentioned by the reviewer do not seem to have been very influential.

      There are papers containing skepticism about the Tuller interpretation, and also papers with results that are difficult to reconcile in a common-sense way with the Tuller interpretation. But skepticism, and a difficulty to reconcile with common sense, are far from a demonstration that a paper is incorrect. Indeed, Tuller et al. may have been published in Cell, and may be so highly cited, exactly because the findings are counter-intuitive, colliding with common sense. Our contribution is to find a common-sense interpretation of the surprising but correct underlying fact of the 5’ enrichment of rare, slow codons.

      Having written that in the previous review, I have to admit that the main text of the Sejour et al. manuscript has a minimal amount of novelty in its experimental evidence; the conclusions are based on three reporters, with and without a stalling/collision sequence, that share the same amino acid sequence and differ in codons. Some more novelty is seen in the bioinformatic analyses of multiple yeast sequences and of sequence conservation at the N-termini of proteins. However, even this part of the manuscript is not discussed fully and with correct comparison to previous studies. Based on my previous comments, the authors discuss further experimental shortcomings in their new and "expanded" discussion, but a single reporter cannot capture all the differences that may arise from ORFs across the complete yeast transcriptome. There are multiple studies that used more reporters, with more than one amino acid and mRNA sequence, as well as with similar variation of rare or common codons. The handwaving argument about the influence of all other mechanisms that can arise from different start sites, RNA structure, peptide interaction with the exit channel, peptidyl-tRNA drop-off, eIF3 complex initiation-elongation association, etc., only points to a manuscript that is more about bashing Tuller's model and old paper than about telling a concise story about the authors' own results and discussing their study in the context of the plethora of studies that indicated multiple other models for slow early elongation.

      We don’t understand why the reviewer is so grudging.

      Discussion of ribosome collisions and the potential impact of such a scenario in the authors' manuscript is left completely without citation, even though such work has results relevant to the authors' conclusions and to Tuller's model.

      This is not true. We cite Dao Duc and Song (2018) “The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation.” PLoS Genet 14, and Tesina, . . . and Green (2020) “Molecular mechanism of translational stalling by inhibitory codon combinations and Poly(A) tracts.” EMBO J., which are two excellent papers on this subject. We also cite Gamble et al. (2016), who found the underlying result, but at that time did not attribute it to ribosome collisions.

      Previous studies (not cited), for example, clearly indicate how the distance from the stalling sequence to the start codon is related to ribosome collisions. Moreover, such studies point out differences in initiation vs. elongation rates that may impact ribosome collisions and protein expression. Both of these topics would be very valuable in discussing evolutionary changes in current yeast ORFs. Furthermore, the authors do not discuss possible differences in 5'UTRs and uORFs in relation to downstream ORF sequence and codon composition.

      It is not clear to us that such papers are highly relevant to the issue on which we are working.

      The argument about whether or not cycloheximide causes the 5' ribosome slowdown (lines 425-443) is just rambling about Weinberg's paper from 2016 without any real conclusion. In this section, the authors simply list hypotheses that were more clearly explained in Weinberg's manuscript or shown experimentally in studies done after the Weinberg et al. paper was published.

      Earlier, the reviewer had the criticism that “The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data.” The main study we know of dealing with issues like these is that of Weinberg et al. 2016. In our opinion, this is a thoughtful paper on these issues. But now, at this point, the reviewer seems to criticize the fact that we do extensively cite results from Weinberg et al. It is true that there is no ultimate conclusion, but why there is no conclusion is a little bit interesting. Weinberg et al show that even in studies that do not use cycloheximide as the first step in ribosome profiling, there is some left-over high density of ribosomes near 5’ ends. But, all these ribosome profiling experiments do use cycloheximide at a later step in the procedure. Until someone does a ribosome profiling experiment without the use of any cycloheximide at any step, there will be no firm conclusion. This is not our fault—and also not the issue we are writing about. And, the reason this paragraph is in the manuscript at all is that the reviewer (we thought) had asked for something like this in the first review.

      Finally, even within the limited novelty of their evolutionary arguments about the absence of N-terminal conservation of codons or amino acids, the authors fail to cite and discuss previous work by Kochetov (BioEssays, 2008 and NAR, 2011), which offers additional explanations for the evolution of N-terminal sequences in yeast, human, and Drosophila.

      These two papers of Dr. Kochetov’s have some relevance and we now cite them. These are the only papers cited by the reviewer in his/her two reviews.

      Probably the reviewer would have preferred a paper on a different subject.


      The following is the authors’ response to the original reviews.

      Response to Reviewers:

      We thank the reviewers for their comments, and their evident close reading of the manuscript. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. Our revised manuscript has a more extensive discussion of alternative explanations for initial high ribosome density as seen by ribosome profiling, and which more specifically points out the limitations of our work.

      As a preface to specific responses to the reviewers, we will say that we could divide observations of slow initial translation into two categories, which we will call “encoded slow codons”, and “increased ribosome density”. With respect to the first category, Tuller et al. documented initial “encoded slow codons”, that is, there is a statistical excess of rare, slowly-translated codons at the 5’ ends of genes. Although the size of this effect is small, statistical significance is extremely high, and the existence of this enrichment is not in any doubt. At first sight, this appears to be a strong indication of a preference for slow initial translation. In our opinion, our main contribution is to show that there is an alternative explanation for this initial enrichment of rare, slow codons—that they are a spandrel, a consequence of sequence plasticity at the 5’ (and 3’) ends of genes. The reviewers seem to generally agree with this, and we are not aware that any other work has provided an explanation for the 5’ enrichment of rare codons.

      The second category of observations pertaining to slow initial translation is “increased ribosome density”. Early ribosome profiling studies used cycloheximide to arrest cell growth, and these studies showed a higher density of ribosomes near the 5’ end of genes than elsewhere. This high initial ribosome density helped motivate the paper of Tuller et al., though their finding of “encoded slow codons” could explain only a very small part of the increased ribosome density. More modern ribosome profiling studies do not use cycloheximide as the first step in arresting translation, and in these studies, the density of ribosomes near the 5’ end of genes is greatly reduced. And yet, there remains, even in the absence of cycloheximide at the first step, a significantly increased density of ribosomes near the 5’ end (e.g., Weinberg et al., 2016). (However, most or all of these studies do use cycloheximide at a later step in the protocol, and the possibility of a cycloheximide artefact is difficult to exclude.) Some of the reviewer’s concerns are that we do not explain the increased 5’ ribosome density seen by ribosome profiling. We agree; but we feel it is not the main point of our manuscript. In revision, we more extensively discuss other work on increased ribosome density, and more explicitly point out the limitations of our manuscript in this regard. We also note, though, that increased ribosome density is not a direct measure of translation speed—it can have other causes.

      Specific Responses.

      Reviewer 1 was concerned that we did not more fully discuss other work on possible reasons for slow initial translation. We discuss such work more extensively in our revision. However, as far as we know, none of this work proposes a reason for the 5’ enrichment of rare, slow codons, and this is the main point of our paper. Furthermore, it is not completely clear that there is any slow initial translation. The increase in ribosome density seen in flash-freeze ribosome profiling could be an artefact of the use of cycloheximide at the thaw step of the protocols; or it could be a real measure of high ribosome density that occurs for some other reason than slow translation (e.g., ribosomes might have low processivity at the 5’ end).

      Reviewer 1 was also concerned about confounding effects in our reporter gene analysis of the effects of different codons on efficiency of translation. We have two comments. First, it is important to remember that although we changed codons in our reporters, we did not change any amino acids. We changed codons only to synonymous codons. Thus at least one of the reviewer’s possible confounding effects—interactions of the nascent peptide chain with the exit channel of the ribosome—does not apply. However, of course, the mRNA nucleotide sequence is altered, and this would cause a change in mRNA structure or abundance, which could matter. We agree this is a limitation to our approach. However, to fully address it, we feel it would be necessary to examine a really large number of quite different sequences, which is beyond the scope of this work. Furthermore, mRNAs with low secondary structure at the 5’ end probably have relatively high rates of initiation, and also relatively high rates of elongation, and it might be quite difficult to disentangle these. But in neither case is there an argument that slow initial translation is efficient. Accurate measurement of mRNA levels would be helpful, but would not disentangle rates of initiation from rates of elongation as causes of changes in expression.

      Reviewer 2 was concerned that the conservation scores for the 5’ 40 amino acids, and the 3’ 40 amino acids were similar, but slow translation was only statistically significant for the 5’ 40 amino acids. As we say in the manuscript, we are also puzzled by this. We note that 3’ translation is statistically slow, if one looks over the last 100 amino acids. Our best effort at an explanation is a sort of reverse-Tuller explanation: that in the last 40 amino acids, the new slow codons created by genome plasticity are fairly quickly removed by purifying selection, but that in the first 40 amino acids, for genes that need to be expressed at low levels, purifying selection against slow codons is reduced, because poor translation is actually advantageous for these genes. To expand on this a bit, we feel that the 5000 or so proteins of the proteome have to be expressed in the correct stoichiometric ratios, and that poor translation can be a useful tool to help achieve this. In this explanation, slow translation at the 5’ end is bad for translation (in agreement with our reporter experiments), but can be good for the organism, when it occurs in front of a gene that needs to be expressed poorly. Whereas, in Tuller, slow translation at the 5’ end is good for translation.

      Reviewer 2 wondered whether the N-terminal fusion peptide affects GFP fluorescence in our reporter. This specific reporter, with this N-terminus, has been characterized by Dean and Grayhack (2012), and by Gamble et al. (2016), and the idea that a super-folder GFP reporter is not greatly affected by N-terminal fusions is based on the work of Pedelacq (2006). None of these papers show whether this N-terminal fusion might have some effect, but together, they provide good reason to think that any effect would be small. These citations have been added.

    1. Author response:

      Reviewer #1 (Public Review):

      In this MEG study, Abbasi et al. assess the directed connectivity of both cortical and subcortical regions during continuous speech production and perception. The authors observed bidirectional connectivity patterns between speech-related cortical areas as well as subcortical areas in production and perception. Interestingly, during speaking they found low-frequency connectivity from subcortical (the right cerebellum) to cortical (left superior temporal) areas, while connectivity from the cortical to the subcortical areas was in the high frequencies. During listening, a similar cortico-subcortical connectivity pattern was observed for the low frequencies, but the reverse connectivity in the higher frequencies was absent.

      The work by Abbasi and colleagues addresses a relevant, novel topic, namely understanding the brain dynamics between speaking and listening. This is important because traditionally production and perception of speech and language are investigated in a modality-specific manner. To have a more complete understanding of the neurobiology underlying these different speech behaviors, it is key to also understand their similarities and differences. Furthermore, to do so, the authors utilize state-of-the-art directed connectivity analyses on MEG measurements, providing a quite detailed profile of cortical and subcortical interactions for the production and perception of speech. Importantly, and perhaps most interestingly in my opinion, the authors find evidence for frequency-specific directed connectivity, which is (partially) different between speaking and listening. This could suggest that both speech behaviors rely (to some extent) on similar cortico-cortical and cortico-subcortical networks, but different frequency-specific dynamics.

      The elements mentioned above (the investigation of both production and perception, the consideration of both cortico-cortical and cortico-subcortical connectivity, and the observation of frequency-specific connectivity profiles within and between speech behaviors) make for important novel contributions to the field. Notwithstanding these strengths, I find that they are especially centered on methodology and functional anatomical description, but that precise theoretical contributions for neurobiological and cognitive models of speech are less transparent. This is in part because the study compares speech production and perception in general, but no psychophysical or psycholinguistic manipulations are considered. I also have some critical questions about the design which may pose some confounds in interpreting the data, especially with regard to comparing production and perception.

      (1) While the cortico-cortical and cortico-subcortical connectivity profiles highlighted in this study and the depth of the analyses are impressive, what these data mean for models of speech processing remains on the surface. This is in part due, I believe, to the fact that the authors have decided to explore speaking and listening in general, without targeting specific manipulations that help elucidate which aspects of speech processing are relevant for the particular connectivity profiles they have uncovered. For example, is the frequency-specific directed connectivity driven by low-level psychophysical attributes of the speech or by more cognitive linguistic properties? Does it relate to the monitoring of speech, timing information, and updating of sensory predictions? Without manipulations trying to target one or several of these components, as some of the referenced work has done (e.g., Floegel et al., 2020; Stockert et al., 2021; Todorović et al., 2023), it is difficult to draw concrete conclusions as to which representations and/or processes of speech are reflected by the connectivity profiles. An additional disadvantage of not having manipulations within each speech behavior is that it makes the comparison between listening and speaking harder. That is, speaking and listening have marked input-output differences which likely will dominate any comparison between them. These physically driven differences (or similarities for that matter; see below) can be strongly reduced by instead exploring the same manipulations/variables between speaking and listening. If possible (or otherwise in future work), it may be interesting to score psychophysical (e.g., acoustic properties) or psycholinguistic (e.g., lexical frequency) information of the speech and see whether and how the frequency-specific connectivity profiles are affected by it.

      We thank the reviewer for pointing this out. The current study is indeed part of a larger project investigating the role of the internal forward model in speech perception and production. In the original, more comprehensive study, we also included a masked condition in which participants produced speech as usual, but their auditory perception was masked. This allowed us to examine how the internal forward model behaves when it does not receive the expected sensory consequences of generated speech. However, for the current study, we focused solely on data from the speaking and listening conditions, given its specific research question. We agree that further manipulations would be interesting. However, in this study our focus was on natural speech, and we avoided other manipulations (beyond masked speech) so that we could obtain sufficiently long recordings for the main speaking and listening conditions.

      (2) Recent studies comparing the production and perception of language may be relevant to the current study and add some theoretical weight since their data and interpretations for the comparisons between production and perception fit quite well with the observations in the current work. These studies highlight that language processes between production and perception, specifically lexical and phonetic processing (Fairs et al., 2021), and syntactic processing (Giglio et al., 2024), may rely on the same neural representations, but are differentiated in their (temporal) dynamics upon those shared representations. This is relevant because it dispenses with the classical notion in neurobiological models of language where production and perception rely on (partially) dissociable networks (e.g., Price, 2010). Rather those data suggest shared networks where different language behaviors are dissociated in their dynamics. The speech results in this study nicely fit and extend those studies and their theoretical implications.

      We thank the reviewer for the suggestion and we will include these references and the points made by the reviewer in our revised manuscript.

      (3) The authors align the frequency-selective connectivity between the right cerebellum and left temporal speech areas with recent studies demonstrating a role for the right cerebellum in internal modelling in speech production and monitoring (e.g., Stockert et al., 2021; Todorović et al., 2023). This link is indeed interesting, but it does seem relevant to point out that, at a more specific scale, it does not concern exactly the same regions in those studies and in the current study. That is, in the current study the frequency-specific connectivity with temporal regions concerns lobule VI in the right cerebellum, while in the referenced work it concerns Crus I/II. The distinction seems relevant since Crus I/II has been linked to the internal modelling of more cognitive behavior, while lobule VI seems more motor-related and/or context-related (e.g., D'Mello et al., 2020; Runnqvist et al., 2021; Runnqvist, 2023).

      We thank the reviewer for their insightful comment. The reference was intended to provide evidence for the role of the cerebellum in internal modelling in speech. We do not claim that MEG has sufficient spatial resolution to reliably resolve specific subdivisions of the cerebellum.

      (4) On the methodological side, my main concern is that for the listening condition, the authors have chosen to play back the speech produced by the participants in the production condition. Both the fixed order and the use of participants' own speech in the listening condition may produce confounds in data interpretation, especially with regard to the comparison between speech production and perception. Could order effects impact the observed connectivity profiles, and how would this impact the comparison between speaking and listening? In particular, I am thinking of repetition effects present in the listening condition as well as prediction, which will be much more elevated for the listening condition than for the speaking condition. The fact that it also concerns their own voice furthermore adds to the possible predictability confound (e.g., Heinks-Maldonado et al., 2005). In addition, listening to one's own speech which has just been articulated may, potentially even strategically, enhance inner speech and "mouthing" in the participants, thus engaging the production mechanism. Similarly, during production, the participants already hear their own voice (which serves as input in the subsequent listening condition). Taken together, both similarities and differences between speaking and listening connectivity may have been due to or influenced by these order effects, and by the fact that the different speech behaviors are to some extent present in both conditions.

      This is a valid point raised by the reviewer. By listening to their own previously produced speech, our participants might have anticipated and predicted the sentences more easily. However, when designing our experiment, we took several steps to reduce the chance of such anticipation. First, participants were measured in separate sessions for the speech production and perception tasks, always with an interval of several days between the two conditions. Second, our questions mainly concerned common, general topics. Consequently, participants may not have remembered their answers completely.

      Importantly, using the same stimulus material for speaking and listening guaranteed that there was no difference in the low-level features of the material between the two conditions that could have affected the results of our statistical comparison.

      Because of bone conduction, hearing one’s own unaltered speech from a recording may sound unfamiliar and could lead to unwanted emotional reactions (e.g., embarrassment). Participants were therefore asked whether they had already heard their own voice in a recording (e.g., a self-recorded voice message on WhatsApp), which most of them confirmed. Participants were also informed that they were going to hear themselves during the measurement, to further reduce unwanted psychophysiological responses.

      (5) The ability of the authors to analyze the spatiotemporal dynamics during continuous speech is a potentially important feat of this study, given that one of the reasons speech production is much less investigated than perception concerns motor and movement artifacts due to articulation (e.g., Strijkers et al., 2010). Two questions sprang to mind when reading the authors' articulation artifact correction procedure. If I understood correctly, the approach comes from Abbasi et al. (2021) and is based on signal space projection (SSP) as used for eye movement corrections, which the authors successfully applied to speech production. However, that study concerned the repeated production of three syllables, while here it concerns continuous speech of full words embedded in discourse. The articulatory and muscular variance will be much higher in the current study than for three syllables (or for eye movements, which produce much more stable movement potentials than an entire discourse). Given this, I imagine that the corrections of the signal in the speaking condition were substantial, and one may wonder (1) how much signal relevant to speech production behavior is lost, and (2), given that similar corrections are not necessary for perception, how this marked difference in signal processing affects the comparability between the modalities.

      One of the results of our previous study (Abbasi et al., 2021) was that the artefact correction was not specific to individual syllables but generalised across syllables. Also, the repeated production of syllables was associated with substantial movements of the articulators mimicking those observed during naturalistic speaking. We therefore believe that the artefact rejection is effective during speaking. We also checked this by investigating speech-related coherence in brain parcels in spatial proximity to the articulators. In our previous study we also showed that the correction method retains neural activity to a very large degree. We are therefore confident that the speaking and listening conditions can be compared and that the loss of true signal from correcting the speaking data will be minor.
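For readers unfamiliar with signal space projection, its core step can be sketched as follows. This is a generic illustration under simplifying assumptions (a single fixed artifact topography and an artifact-only calibration segment), not the pipeline of Abbasi et al. (2021); the function name `ssp_projector` and all simulated data are hypothetical. The spatial patterns that dominate the artifact data are estimated by SVD and then projected out with P = I - U_k U_k^T.

```python
import numpy as np

def ssp_projector(artifact_data, n_components):
    """Build an SSP projector that removes the dominant artifact subspace.

    artifact_data: (n_channels, n_samples) array from artifact-dominated periods
    (hypothetical calibration data). Returns P = I - U_k U_k^T, where U_k holds
    the top n_components spatial patterns of the artifact.
    """
    # Left singular vectors are the spatial patterns (topographies) of the data.
    u, _, _ = np.linalg.svd(artifact_data, full_matrices=False)
    u_k = u[:, :n_components]
    return np.eye(artifact_data.shape[0]) - u_k @ u_k.T

rng = np.random.default_rng(0)
n_channels, n_samples = 20, 1000
# Simulated rank-1 artifact: one fixed topography with a random time course.
artifact_topography = rng.standard_normal(n_channels)
artifact_time_course = rng.standard_normal(n_samples)
artifact = np.outer(artifact_topography, artifact_time_course)
# Simulated "neural" signal: weak, spatially unstructured noise.
brain = 0.1 * rng.standard_normal((n_channels, n_samples))

P = ssp_projector(artifact, n_components=1)
cleaned = P @ (brain + artifact)
print(f"fraction of neural signal norm retained: {np.linalg.norm(cleaned) / np.linalg.norm(brain):.3f}")
```

Because the projector removes everything along the artifact patterns, any neural activity with an overlapping topography is attenuated as well; this is the trade-off behind the reviewer's question about how much speech-related signal is lost.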

      References:

      • Abbasi, O., Steingräber, N., & Gross, J. (2021). Correcting MEG artifacts caused by overt speech. Frontiers in Neuroscience, 15, 682419.

      • D'Mello, A. M., Gabrieli, J. D., & Nee, D. E. (2020). Evidence for hierarchical cognitive control in the human cerebellum. Current Biology, 30(10), 1881-1892.

      • Fairs, A., Michelas, A., Dufour, S., & Strijkers, K. (2021). The same ultra-rapid parallel brain dynamics underpin the production and perception of speech. Cerebral Cortex Communications, 2(3), tgab040.

      • Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      • Giglio, L., Ostarek, M., Sharoh, D., & Hagoort, P. (2024). Diverging neural dynamics for syntactic structure building in naturalistic speaking and listening. Proceedings of the National Academy of Sciences, 121(11), e2310766121.

      • Heinks‐Maldonado, T. H., Mathalon, D. H., Gray, M., & Ford, J. M. (2005). Fine‐tuning of auditory cortex during speech production. Psychophysiology, 42(2), 180-190.

      • Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191(1), 62-88.

      • Runnqvist, E., Chanoine, V., Strijkers, K., Pattamadilok, C., Bonnard, M., Nazarian, B., ... & Alario, F. X. (2021). Cerebellar and cortical correlates of internal and external speech error monitoring. Cerebral Cortex Communications, 2(2), tgab038.

      • Runnqvist, E. (2023). Self-monitoring: The neurocognitive basis of error monitoring in language production. In Language production (pp. 168-190). Routledge.

      • Stockert, A., Schwartze, M., Poeppel, D., Anwander, A., & Kotz, S. A. (2021). Temporo-cerebellar connectivity underlies timing constraints in audition. eLife, 10, e67303.

      • Strijkers, K., Costa, A., & Thierry, G. (2010). Tracking lexical access in speech production: electrophysiological correlates of word frequency and cognate effects. Cerebral Cortex, 20(4), 912-928.

      • Todorović, S., Anton, J. L., Sein, J., Nazarian, B., Chanoine, V., Rauchbauer, B., ... & Runnqvist, E. (2023). Cortico-cerebellar monitoring of speech sequence production. Neurobiology of Language, 1-21.

      Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.
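The directed asymmetry index (DAI) mentioned above summarises, per frequency, which direction of a Granger-causal connection dominates. The snippet below is a minimal sketch, not the authors' code: it assumes the normalised-difference form DAI = (GC_AB - GC_BA) / (GC_AB + GC_BA) that is common in the literature, and the function name and GC values are hypothetical.

```python
import numpy as np

def directed_asymmetry_index(gc_ab, gc_ba):
    """DAI per frequency bin: +1 = purely A->B, -1 = purely B->A, 0 = symmetric.

    Assumes the normalised-difference form (GC_ab - GC_ba) / (GC_ab + GC_ba);
    gc_ab and gc_ba are non-negative spectral GC values on the same frequency grid.
    """
    gc_ab = np.asarray(gc_ab, dtype=float)
    gc_ba = np.asarray(gc_ba, dtype=float)
    total = gc_ab + gc_ba
    # Where both directions are exactly zero, leave the DAI at 0 instead of dividing by zero.
    return np.divide(gc_ab - gc_ba, total, out=np.zeros_like(total), where=total > 0)

# Hypothetical spectral GC values at three frequency bins (e.g., 4, 10 and 70 Hz).
gc_a_to_b = np.array([0.30, 0.20, 0.05])
gc_b_to_a = np.array([0.10, 0.20, 0.15])
print(directed_asymmetry_index(gc_a_to_b, gc_b_to_a))
```

Because the DAI is a ratio, it indicates which direction dominates but says nothing about whether either GC value exceeds the estimator's bias level.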

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suited to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness. Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. The reference to Schaum et al. (2021) was not sufficient for me to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I am not sure whether this is what was done here.

      If it was not: how can we account for spurious results based on indirect effects?

      Multivariate refers to the use of the three components per parcel.

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      We performed statistics on DAI and believe that this is a valid approach. We argue that random GC effects would not survive our cluster-corrected statistics.

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      Thanks, we will add this reference in the revised version.

      (2) GC has been discouraged for filtered data because it gives rise to false positives, both through phase distortions and because filtering is ineffective in the information-theoretic setting: reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al., 2017; Pinzuti et al., 2020, who also suggest an approach that would circumvent these filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      This must be a misunderstanding. We are aware of the problem with GC on filtered data. But GC was here computed on broadband data and not in individual frequency bands.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g., L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP, and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5, as well as R-STS, both to R-Crus2, L-CB6, L-Tha) contradicts the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting a difference between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry, 2019). The spectrally resolved connectivity in the two conditions actually looks remarkably similar, and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      We appreciate the reviewer's thoughtful comments. We acknowledge that not all connectivity patterns strictly adhere to the initial observation regarding feedback and feedforward communication. It's true that our primary focus was on interactions between brain regions known to be crucial for speech prediction, including auditory, somatosensory, and cerebellar areas. However, we also presented connectivity patterns across other regions to provide a more comprehensive picture of the speech network. We believe this broader perspective can be valuable for future research directions.

      Regarding the reviewer's observation about the alpha band peak in Figure 4, we agree that a closer examination reveals that the connectivity from the right cerebellum to the left parietal cortex spans a wider low-frequency range. We will refrain from solely emphasizing the alpha band and acknowledge the potential contribution of lower frequencies to cerebellar-parietal communication.

      We also appreciate the reviewer highlighting the need for a more nuanced interpretation of the listening condition connectivity compared to the speaking condition. The reviewer is correct in pointing out that while Figure 4 suggests a high-frequency connectivity from L-A5 to R-CB only in the speaking condition, a direct statistical comparison between conditions might not reveal a significant difference. We will revise the manuscript to clarify this point.

      Finally, a closer examination of Figure 3 revealed that the light purple and dark green edges in the speaking condition for R-CB6 and L-3b suggest outgoing connections at low frequencies, while other colored edges indicate information reception at high frequencies. We acknowledge that exceptions to this directional pattern might exist and warrant further investigation in future studies.
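Point (3)(iii) above turns on a general statistical issue: one effect being significant and another not does not show that the two differ. The sketch below illustrates this with made-up z-statistics, assuming two independent estimates with equal standard errors; it is not a reanalysis of the study's data, and with paired within-subject conditions the appropriate test would instead use the per-participant differences.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical effect estimates (in standard-error units) for one connection
# in two conditions; these numbers are illustrative, not taken from the study.
z_speaking = 2.5
z_listening = 1.5

# For two independent unit-SE estimates, the SE of their difference is sqrt(2).
z_difference = (z_speaking - z_listening) / math.sqrt(2)

for label, z in [("speaking", z_speaking), ("listening", z_listening), ("difference", z_difference)]:
    print(f"{label:>10}: z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

With these illustrative numbers, the speaking effect passes p < .05 and the listening effect does not, yet the difference between them is far from significant (p ≈ 0.48).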

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful to spell out for the reader why this is the case, so that they can better grasp the main concerns. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

      This is indeed an important point. That is why we are discussing our results with great caution and specifically point the reader to the relevant literature. We are indeed thinking about a future study where we investigate this connectivity using other connectivity metrics and a detailed consideration of power.

      Reviewer #3 (Public Review):

      In the current paper, Abbasi et al. aimed to characterize and compare the patterns of functional connectivity across frequency bands (1 Hz - 90 Hz) between regions of a speech network derived from an online meta-analysis tool (Neurosynth.org) during speech production and perception. The authors present evidence for complex neural dynamics from which they highlight directional connectivity from the right cerebellum to left superior temporal areas in lower frequency bands (up to beta) and between the same regions in the opposite direction in the (lower) high gamma range (60-90 Hz). Abbasi et al. interpret their findings within the predictive coding framework, with the cerebellum and other "higher-order" (motor) regions transmitting top-down sensory predictions to "lower-order" (sensory) regions in the lower frequencies and prediction errors flowing in the opposite direction (i.e., bottom-up) from those sensory regions in the gamma band. They also report a negative correlation between the strength of this top-down functional connectivity and the alignment of superior temporal regions to the syllable rate of one's speech.

      Strengths:

      (1) The comprehensive characterization of functional connectivity during speaking and listening to speech may be valuable as a first step toward understanding the neural dynamics involved.

      (2) The inclusion of subcortical regions and connectivity profiles up to 90 Hz using MEG is interesting and relatively novel.

      (3) The analysis pipeline is generally adequate for the exploratory nature of the work.

      Weaknesses:

      (1) The work is framed as a test of the predictive coding theory as it applies to speech production and perception, but the methodological approach is not suited to this endeavor.

      We agree that we cannot provide definite evidence for predictive coding in speech production and perception and we believe that we do not make that claim in the manuscript. However, our results are largely consistent with what can be expected based on predictive coding theory.

      (2) Because of their theoretical framework, the authors readily attribute roles or hierarchy to brain regions (e.g., higher- vs lower-order) and cognitive functions to observed connectivity patterns (e.g., feedforward vs feedback, predictions vs prediction errors) that cannot be determined from the data. Thus, many of the authors' claims are unsupported.

      We will revise the manuscript to more clearly differentiate our results (e.g. directed Granger-Causality from A to B) from their interpretation (potentially indicating feedforward or feedback signals).

      (3) The authors' theoretical stance seems to influence the presentation of the results, which may inadvertently misrepresent the (otherwise perfectly valid; cf. Abbasi et al., 2023) exploratory nature of the study. Thus, results about specific regions are often highlighted in figures (e.g., Figure 2 top row) and text without clear reasons.

      Our connectograms reveal a multitude of results that we hope will interest the community. At the same time, the wealth of findings poses a problem for describing them. We did not see a better way than to highlight specific connections of interest.

      (4) Some of the key findings (e.g., connectivity in opposite directions in distinct frequency bands) feature in a previous publication and are, therefore, interesting but not novel.

      We actually see this as a strength of the current manuscript. The computation of connectivity is here extended to a much larger sample of brain areas. It is reassuring to see that the previously reported results generalise to other brain areas.

      (5) The quantitative comparison between speech production and perception is interesting but insufficiently motivated.

      We thank the reviewer for this comment. We have addressed this in detail in our responses to points (1) and (4) of Reviewer 1.

      (6) Details about the Neurosynth meta-analysis and subsequent selection of brain regions for the functional connectivity analyses are incomplete. Moreover, the use of the term 'Speech' in Neurosynth seems inappropriate (i.e., includes irrelevant works, yielding questionable results). The approach of using separate meta-analyses for 'Speech production' and 'Speech perception' taken by Abbasi et al. (2023) seems more principled. This approach would result, for example, in the inclusion of brain areas such as M1 and the BG that are relevant for speech production.

      We agree that there are inherent limitations in automated meta-analysis tools such as Neurosynth. Papers are used in the meta-analysis that might not be directly relevant. However, Neurosynth has proven its usefulness over many years and has been used in many studies. We also agree that our selection of brain areas is not complete. But Granger Causality analysis of every pair of ROIs leads to complex results and we had to limit our selection of areas.

      (7) The results involving subcortical regions are central to the paper, but no steps are taken to address the challenges involved in the analysis of subcortical activity using MEG. Additional methodological detail and analyses would be required to make these results more compelling. For example, it would be important to know what the coverage of the MEG system is, what head model was used for the source localization of cerebellar activity, and if specific preprocessing or additional analyses were performed to ensure that the localized subcortical activity (in particular) is valid.

      A large body of evidence demonstrates that MEG can record signals from deep brain areas such as the thalamus and cerebellum (e.g., Attal & Schwartz, 2013; Andersen et al., NeuroImage 2020; Piastra et al., 2020; Schnitzler et al., 2009). These and other studies show that state-of-the-art recording (with multichannel SQUID systems) and analysis are sufficient to reconstruct activity in subcortical areas. However, spatial resolution is clearly reduced for these deep areas. We will add a statement to the revised manuscript acknowledging this limitation.

      (8) The results and methods are often detailed with important omissions (a speech-brain coupling analysis section is missing) and imprecisions (e.g., re: Figure 5; the Connectivity Analysis section is copy-pasted from their previous work), which makes it difficult to understand what is being examined and how. (It is also not good practice to refer the reader to previous publications for basic methodological details, for example, about the experimental paradigm and key analyses.) Conversely, some methodological details are given, e.g., the acquisition of EMG data, without further explanation of how those data were used in the current paper.

      We will revise the relevant sections of the manuscript.

      (9) The examination of gamma functional connectivity in the 60 - 90 Hz range could be better motivated. Although some citations involving short-range connectivity in these frequencies are given (e.g., within the visual system), a more compelling argument for looking at this frequency range for longer-range connectivity may be required.

      Given previous evidence of connectivity in the gamma band, we think it would be a weakness to exclude this frequency band from the analysis.

      (10) The choice of source localization method (linearly constrained minimum variance) could be explained, particularly given that other methods (e.g. dynamic imaging of coherent sources) were specifically designed and might potentially be a better alternative for the types of analyses performed in the study.

      Both LCMV and DICS are beamforming methods. We used LCMV because we wanted to use Granger causality, which requires broadband signals. DICS would only provide frequency-specific, band-limited signals.

      (11) The mGC analysis needs to be more comprehensively detailed for the reader to be able to assess what is being reported and the strength of the evidence. Relatedly, first-level statistics (e.g., via estimation of the noise level) would make the mGC and DAI results more compelling.

      We perform group-level cluster-based statistics on mGC, correcting for multiple comparisons across frequency bands and brain parcels, and report only significant results. This is an established approach that is routinely used in studies of this type.

      (12) Considering the exploratory nature of the study, it is essential for other researchers to continue investigating and validating the results presented in the current manuscript. Thus, it is concerning that data and scripts are not fully and openly available. Data need not be in its raw state to be shared and useful, which circumvents the stated data privacy concerns.

      We acknowledge the reviewer's concern regarding the full availability of the dataset. Due to privacy limitations on the collected data, we are unable to share it publicly at this time. However, to promote transparency and enable further exploration, we have provided the script used for data analysis and an example dataset. This example dataset should provide a clear understanding of the data structure and variables used in the analysis. Additionally, we are happy to share the complete dataset upon request from research teams interested in performing in-depth secondary analyses.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses to be addressed: 

      (1) More detail is required to understand the effects of genetic and drug manipulations on heart rate as these are important experiments. At the very least, a discussion on the limitations of these manipulations is needed. 

      - For example, how does one separate the pulsatile versus nutritive effects of blood flow/heartrate reduction? 

      - The conclusion that arterial SMC differentiation is driven by pulsatile blood flow needs to be toned down. Indeed, this conclusion is mainly supported by in vitro cell co-cultures exposed to laminar versus pulsatile flow. In vivo, reducing Tnnt2a expression affects cardiac contractility and blood flow does not selectively affect pulsatility. To make this conclusion, the authors would need an experimental means to selectively dampen the pulsatility of blood flow.

      We understand this concern and have toned down the statements about pulsatile flow in our conclusions by using 'flow' instead of 'pulsatile flow' throughout the text, except in the section on the in vitro co-cultures. We also added a paragraph discussing the limited capacity to qualitatively reduce blood flow in vivo, and acknowledging that the effects of nutrient supply and flow reduction could not be uncoupled in live zebrafish embryos. We propose that, in the future, in vitro 3D vascular culture models could be combined with microfluidics to precisely calibrate nutrient composition in culture media, flow velocity and pulse; these methods would help address these questions more thoroughly. See pages 11-12, lines 312-322.

      (2) Since mural cells are sensitive to transmural pressure, could the authors elaborate on the potential role of raised intravascular pressure in SMC differentiation? This would better parallel rodents and humans. 

      We thank you for this suggestion. We added a paragraph to discuss the potential role of raised intravascular pressure in VSMC differentiation in the discussion section (see page 11 line 296-311).

      (3) The authors use nifedipine to reduce blood flow. Nifedipine is a specific and potent inhibitor of voltage-dependent calcium channels (VDCC) which are expressed in SMCs. Prior studies (PMID: 35588738) showed that VDCC blockers increased rather than inhibited SMC differentiation. Nifedipine is also likely to act upon VSMC calcium handling in the circle of Willis, which may in turn affect cell maturation. Could the authors comment on this seeming discrepancy?

      It is possible that off-target or indirect effects of Nifedipine decrease smooth muscle cell proliferation, or that altered cardiac contractility fundamentally alters aspects of vascular development other than blood flow. 

      - Additionally, it would be helpful to report the quantitative heart rate reduction achieved with Nifedipine. This would clear up concerns that the heart rate reduction is too large for normal vascular development to occur, and thus decrease proliferation rate independent of changes in blood flow pulsatility. 

      We concur with these comments, which is why we reinforced our Nifedipine experiments with an alternative, non-pharmacological strategy to inhibit blood flow: a morpholino against the tnnt2a gene. Both the Nifedipine and the tnnt2a morpholino results show a lack of VSMC maturation. In addition, we now provide the quantitative heart rate reduction achieved with Nifedipine (new Figure S2A-S2C), showing that the drug reduces but does not completely abolish the heartbeat. Moreover, zebrafish embryos can survive and develop a normal blood vascular system without any heartbeat. Hence, we exclude the possibility that the effect on VSMC maturation is caused by non-specific effects of the loss of heartbeat. Nevertheless, we now acknowledge in our discussion the limitation of nifedipine, as it may affect VSMCs through VDCCs (page 12, lines 323-334).

      We also added a paragraph to the discussion comparing nifedipine, an L-type VDCC blocker, with ML218, the selective T-type VDCC inhibitor used in the previous study (Ando et al., 2022). We note that in this previous study, the increase in VSMC differentiation occurred only in anterior metencephalic central arteries (AMCtAs) located more than 40 μm away from the BCA; these AMCtAs are much smaller than CoW arteries and have a different geometry, and hence possibly different kinetics of VSMC maturation (Ando et al., 2022), as our findings would suggest.

      (4) The authors should provide more information on how blood flow velocity and wall shear stress are calculated from the Circle of Willis vascular structure. It is presumed that these values are dependent upon the 3-D morphology of the vessel network, as labeled by intravenous dextran dye, but this is not clear. (a second reviewer similarly comments: I was unclear how flow velocity values were obtained in Fig. 3E. Are they based on computational simulation, or are they experimentally calculated following the dextran injection?) Small local differences in vessel diameter and shape will influence blood flow velocity, but these morphological changes are not clearly articulated. Further, it is unclear how flow input levels to the CaDI and basilar arteries are decided across time points. For instance, is it possible to measure the blood flow speed empirically with line-scanning or high-speed tracking of labeled blood cells or particles? This would provide validation of the modeling results. 

      The computational fluid dynamics simulation was performed following a previous study from our lab (Barak et al., 2021). Blood flow velocity and wall shear stress depend on the 3D morphology of the vessel network labeled by intravascular dextran. Details of how the computational fluid dynamics simulation was performed have been added to the Methods section (page 17, lines 433-449).

      Moreover, to address this concern, we now provide new experimental measurements of blood flow, obtained as red blood cell (RBC) velocity via axial line-scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos at 54 hpf, 3 dpf, and 4 dpf. Using the experimental RBC velocities, we re-ran the computational fluid dynamics simulation. The new findings align with our conclusion and are discussed further in our response to point 6. Details of how RBC velocity was calculated have been added to the Methods section (page 16, lines 414-431).

      (5) Does the cardiac injection of dextran itself affect the diameter of the arteries, given the invasiveness of the procedure? This could be examined in fish with a transgenic endothelial label with and without dextran. 

      We performed an experiment on wildtype zebrafish at 5 days post-fertilization (dpf), with and without Dextran injection, to examine the effects of Dextran injection on vessel diameter. As shown in the representative image below, the XZ panel clearly illustrates a Dextran-filled PCS vessel with no alteration in vessel size. Dextran microangiography, a technique used to obtain vessel geometry with fluorescent microspheres, is well established in zebrafish (Kamei et al., 2010). Our finding that Dextran does not affect vessel size is consistent with previous studies using Dextran microangiography.

      Author response image 1.

      (6) The data from the microangiography experiment in Figure 3 does not fully support the stated results. The authors report that the CaDI had the highest blood flow speed starting from 54 hpf, but it does not appear to be higher than the other arteries at this time point. Additionally, there is not sufficient evidence that wall shear stress coincides with smooth muscle cell differentiation in the CaDI. Wall shear stress appears to be similar between 54 hpf and 3 dpf in the CaDI, only increasing between 3 dpf and 4 dpf, while differentiation is shown to begin at 3 dpf. The authors need to address this and/or soften conclusions. 

      First, in response to this specific concern, we measured red blood cell (RBC) velocity using axial line-scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos (the detailed method has been added to the Methods section of the manuscript). We replaced the computationally simulated blood flow velocity with the measured RBC velocity in new Figure 3E-3G, and re-ran the wall shear stress (WSS) simulation using the RBC velocity (new Figure 3I-3K). We compared RBC velocity and WSS among different vessels at each time point. We confirmed that CaDI has the highest RBC velocity from 54 hpf to 4 dpf (new Figure 3A-3C and 3E-3G) and found an overall increase in average WSS from 54 hpf to 4 dpf (new Figure 3A-3C and 3H). Further, WSS in CaDI was significantly higher than in BCA and PCS at 54 hpf, 3 dpf, and 4 dpf (new Figure 3A-3C and 3I-3K). Altogether, the CFD simulation suggests that CoW arteries experience different hemodynamic WSS, which is associated with the spatiotemporal pattern of VSMC differentiation in CoW arteries (page 6, lines 153-162).

      Second, to assess the correlation between WSS and VSMC differentiation in CaDI, we performed a Pearson correlation analysis. In the image provided here, we plotted a linear regression of the normalized number of acta2+ cells in CaDI against WSS across developmental stages (54 hpf, 3 dpf and 4 dpf), and computed the Pearson correlation coefficient using GraphPad Prism 10.0.3. The correlation coefficient was r = 0.595, suggesting that the two variables (acta2+ cells and WSS) tend to increase together across these developmental stages.

      Author response image 2.
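      As an illustration of the statistic only, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The minimal Python sketch below assumes nothing beyond the standard library; it is not the GraphPad Prism procedure used in the analysis, and the input lists are hypothetical placeholders, not the measured acta2+ counts or WSS values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations from the means (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    # The 1/n factors cancel, so r = cov / sqrt(var_x * var_y).
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, one pair per stage (54 hpf, 3 dpf, 4 dpf);
# these are NOT the data from the study.
acta2_normalized = [0.2, 0.5, 1.0]
wss_placeholder = [1.0, 1.1, 2.0]
print(round(pearson_r(acta2_normalized, wss_placeholder), 3))
```

      A value of r near +1 indicates that the two variables rise together; with only three stages, r alone says little about statistical significance.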

      Third, we softened our conclusion to state that RBC velocity was differentially distributed across CoW arteries while VSMC differentiation occurred in these vessels.

      (7) It is unclear if acta2 expression is conferring vascular tone, as would be expected if the cells are behaving as mature VSMCs. Does arterial diameter decrease with an increase in acta2 expression? Are acta2-positive mural cells associated with more dynamic changes in arteriole diameter under basal or stimulated conditions? 

      We thank the reviewer for this interesting question. VSMC maturation and vasoactivity could be investigated further in the future. Our study focused on the early stage of VSMC differentiation, in which pdgfrb+ progenitors begin to express the VSMC marker acta2. We discussed the onset of transgelin expression and the loss of abcc9 expression as markers of VSMC maturation. In addition, a previous study found that VSMC-covered vessels in the zebrafish brain dilate as early as 4 dpf and constrict at 6 dpf (Bahrami & Childs, 2020). Future studies may examine the association between the expression of different VSMC markers and VSMC functional maturation (page 10, lines 272-279).

      (8) The authors argue that CoW vessels transition from venous to arterial identity (Fig. 1). However, kdrl is not an ideal arterial marker for this experiment as it is expressed in both arteries and veins. While it is true that many arterial beds have stronger kdrl expression than the veins, its expression in both arteries and veins changes with developmental stage, and its expression level may vary depending on the type of vessel. Therefore, showing that kdrl increases from 32 hpf - 4 dpf in CoW vessels is not convincing because its expression may increase in both venous or arterial vasculature as the vessels mature. In addition, flt4 expression is not exclusively venous; for example, it has noticeable expression in the dorsal aorta at 24-32 hpf stages. It would be helpful to confirm this transition by analyzing additional arterial and venous markers. 

      We acknowledge this and have added a paragraph discussing this limitation. We combined the loss of flt4 and the increase in kdrl to establish the temporal sequence of circle of Willis morphogenesis, arterial specification, and VSMC differentiation. We acknowledge that additional arterial and venous markers need to be analyzed for a more thorough characterization of arterial specification in vertebrate brain vascular development. See page 12, lines 335-341.

      (9) The authors show that acta2+ VSMCs are absent in tnnt2a MO embryos, concluding that blood flow is required for their differentiation from pericytes. However, there is no data showing that pericytes are still present in tnnt2a MO embryos. Although this has been previously shown by Ando et al 2016, it would be beneficial to confirm in the current study as this is a critical piece of evidence needed for this conclusion. 

      To determine whether blood flow is dispensable for pdgfrb+ progenitor recruitment, we injected tnnt2a MO (0.35 ng/embryo) into Tg(pdgfrb:egfp, kdrl:ras-mcherry) ncv22/s896 embryos. Loss of blood flow did not affect the emergence of pdgfrb+ progenitors around the CoW at 3 days post fertilization (dpf) (new Figure S2G-S2H). This is consistent with the previous observation of Ando et al. (2016; their Figure S2C).

      (10) The authors show that klf2a MO injected embryos have a reduced number of VSMCs at 3 dpf but a normal number at 4 dpf (Fig. 6), concluding that klf2a is only important to initiate CaDI muscularization. If this is true, it would raise important questions about how VSMCs differentiate at a later stage in the absence of klf2a. For instance, is blood flow not required to differentiate at a later stage, or is there another factor that compensates in the absence of klf2a? The alternative explanation/ caveat is that klf2a MO loses efficacy with development, leading to the recovery of VSMCs at this stage. Therefore, it would be important to confirm this result using a genetic klf2a mutant. 

      Thank you for pointing this out. We note that, based on the klf2a reporter line, klf2a activity in CoW arterial endothelial cells is highly correlated with the number of acta2+ VSMCs in CaDI, BCA and PCS at 3 dpf (r = 0.974, new Figure S5J). Interestingly, however, klf2a activity remained stable from 3 dpf to 4 dpf, well beyond the initiation of VSMC differentiation. We therefore speculate that sustained klf2a expression may support further maturation of VSMCs, as acta2+ VSMCs showed a distinct morphology at 4 dpf compared with 3 dpf (page 10, lines 268-272). As for the observation that klf2a morphants have a normal number of VSMCs at 4 dpf, we think that, in addition to the transient effect of the morpholino, a plausible explanation is compensation by the paralog klf2b in zebrafish. We acknowledge that further characterization of CoW VSMC development in klf2a and klf2b double genetic mutants (Rasouli et al., 2018; Steed et al., 2016) may help determine whether klf2b compensates for klf2a in CoW VSMC differentiation beyond 4 dpf. See pages 10-11, lines 292-295.

      (11) A large part of the discussion focuses on Notch and Wnt signaling, as downstream Klf2 effectors. While these are reasonable hypotheses to propose, there is no data on the involvement of these pathways in the current study. It seems excessive to speculate on detailed mechanisms of how Klf2 activates Notch and Wnt signaling in the absence of data showing that these pathways are affected in CoW vessels. Therefore, the discussion could be shortened here unless additional data can be obtained to demonstrate the involvement of these pathways in VSMCs in CoW.

      We concur and have condensed the discussion on Notch and Wnt signaling as downstream klf2 effectors.

      Minor comments: 

      (1) Line 138 "CaDI is the only vessels in the CoW receiving pulsatile arterial blood low ... ". Adding a reference to support this statement would be useful. 

      We agree and have revised this sentence to read: ‘CaDI receive proximal arterial feed through lateral dorsal aorta from cardiac outflow tract (Isogai et al., 2001)’. The statement is also based on our general observations of zebrafish vascular anatomy and blood flow under a confocal microscope.

      (2) The image insets in Figs. 1A, 2A, 4E-L, 5A, 6A are quite small. Please make them larger to help the reader interpret the findings. 

      We agree. We enlarged the images to help the reader interpret the findings and to allow the confocal images and schematics to be viewed side by side.

      (3) The schematics in Figs. 1-2, and 4-6 are helpful, but the different cell types are difficult to see because they are small and their colors/shapes are not very distinct. 

      We agree. We increased the size and color contrast of the schematics in the new Figures 1-2 and 4-6 to improve visualization.

      (4) It is stated that there are no diameter differences between different arteries, but statistics are not reported. 

      The statistics in Figure 3D were performed by ordinary two-way ANOVA followed by Tukey’s multiple comparisons test, with a single pooled variance. We have now added pairwise comparisons among vessels in the CoW; differences that are not indicated are non-significant.

      (5) Figure 3F would be better visualized on a log scale, as it is difficult to see the differences between each post-fertilization timepoint. 

      We agree. In the new Figure 3H, the average wall shear stress (WSS) in CoW arteries is plotted on a log-scaled y-axis so that the differences between post-fertilization timepoints are visible.

      (6) Please provide more background and validation on the pericyte cell line, and their use for the questions in this study. 

      Thank you for the question. TgBAC(pdgfrb:egfp)ncv22 was generated and described by Ando et al. (2016) to clarify mural cell coverage of the vascular endothelium in zebrafish. We added a description to the Methods section to provide background on, and validation of, this pericyte line (see page 13, lines 368-372).

      (7) Flow velocity and WSS changes are shown in each vessel in Figs. 3E,G. However, the comparison should be made between different types of vessels to see if there is a statistical difference and PCS, for example, which would explain differences in VSMC coverage. 

      We agree. We compared the arteries in the CoW at each developmental timepoint using ordinary one-way ANOVA with Tukey’s multiple comparisons test. Figure 3E has been replaced by new Figure 3E-G, and Figure 3G by new Figure 3I-K.

      (8) Similarly, between CaDI, the number of klf2a cells in Fig. 5B should be compared between different vessels, not between different stages of the same vessel. 

      We agree. In new Figure 5B-E, the number of klf2a+ cells per 100 μm vessel length are compared among different vessels at each developmental stage and analyzed by ordinary one-way ANOVA with Tukey’s multiple comparisons test.

      (9) When quantifying klf2+ cells in Fig. 5, it would be helpful to quantify klf2 expression level between cells in different vessels. This could be done by quantifying GFP expression in existing images. The difference in expression level may explain the variation between CaDI and PCS more accurately than just the difference in cell number. 

      GFP fluorescence reflects the stability of the GFP protein and labels discrete nuclei with active klf2a expression. Hence, quantifying the GFP level would not give an accurate readout of klf2a expression per se, but rather of its activity. For this reason, we do not think this experiment would provide an accurate measurement of klf2a expression.

      (10) Do data points in Figure 4D correspond to different cells in the same chamber experiment? If so, they cannot be treated as independent replicates. Each data point should correspond to an independent replicate experiment. 

      We agree. We now report the number of cells analyzed in the figure legend.

      (11) Graph placement is confusing in Figs. 4I, M. An adjacent Fig. 4G shows Nifedipine treated embryos, while the graph next to (Fig. 4I) shows acta+ cell number from tnnt2a 4 dpf experiment. Similarly, the bottom Fig. 4K tnn2a 4 dpf MO experiment has an adjacent graph Fig. 4M, which shows nifedipine treatment quantification, which makes it very confusing. 

      We agree. We have rearranged Figure 4 so that Figure 4E shows representative images of control embryos at 3 dpf and 4 dpf, Figure 4F shows tnnt2a MO embryos at 3 dpf and 4 dpf, and Figure 4G shows nifedipine-treated embryos at 3 dpf and 4 dpf.

      References:

      Ando, K., Fukuhara, S., Izumi, N., Nakajima, H., Fukui, H., Kelsh, R. N., & Mochizuki, N. (2016). Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development, 143(8), 1328-1339. https://doi.org/10.1242/dev.132654

      Ando, K., Tong, L., Peng, D., Vazquez-Liebanas, E., Chiyoda, H., He, L., Liu, J., Kawakami, K., Mochizuki, N., Fukuhara, S., Grutzendler, J., & Betsholtz, C. (2022). KCNJ8/ABCC9-containing K-ATP channel modulates brain vascular smooth muscle development and neurovascular coupling. Dev Cell, 57(11), 1383-1399 e1387. https://doi.org/10.1016/j.devcel.2022.04.019

      Bahrami, N., & Childs, S. J. (2020). Development of vascular regulation in the zebrafish embryo. Development, 147(10). https://doi.org/10.1242/dev.183061

      Barak, T., Ristori, E., Ercan-Sencicek, A. G., Miyagishima, D. F., Nelson-Williams, C., Dong, W., Jin, S. C., Prendergast, A., Armero, W., Henegariu, O., Erson-Omay, E. Z., Harmanci, A. S., Guy, M., Gultekin, B., Kilic, D., Rai, D. K., Goc, N., Aguilera, S. M., Gulez, B., . . . Gunel, M. (2021). PPIL4 is essential for brain angiogenesis and implicated in intracranial aneurysms in humans. Nat Med, 27(12), 2165-2175. https://doi.org/10.1038/s41591-021-01572-7

      Isogai, S., Horiguchi, M., & Weinstein, B. M. (2001). The vascular anatomy of the developing zebrafish: an atlas of embryonic and early larval development. Dev Biol, 230(2), 278-301. https://doi.org/10.1006/dbio.2000.9995

      Kamei, M., Isogai, S., Pan, W., & Weinstein, B. M. (2010). Imaging blood vessels in the zebrafish. In Methods in cell biology (Vol. 100, pp. 27-54). Elsevier.

      Rasouli, S. J., El-Brolosy, M., Tsedeke, A. T., Bensimon-Brito, A., Ghanbari, P., Maischein, H. M., Kuenne, C., & Stainier, D. Y. (2018). The flow responsive transcription factor Klf2 is required for myocardial wall integrity by modulating Fgf signaling. Elife, 7. https://doi.org/10.7554/eLife.38889

      Steed, E., Faggianelli, N., Roth, S., Ramspacher, C., Concordet, J. P., & Vermot, J. (2016). klf2a couples mechanotransduction and zebrafish valve morphogenesis through fibronectin synthesis. Nat Commun, 7, 11646. https://doi.org/10.1038/ncomms11646

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Jones et al. report on a potential role for fam83fa in zebrafish hatching, radiation response and autophagy. The authors are commended for generating multiple KO lines and maternal-zygotic embryos for analysis. However, important controls are lacking and the data is circumstantial throughout with very little mechanistic insight into the precise roles, if any, of fam83f in these processes.

      We thank the reviewer for recognizing the strengths of our manuscript, and highlighting areas we might improve. Please see the specific comments below addressing the points raised. In respect of mechanistic insight, while we agree that our manuscript does not provide this, it was not intended to. Rather, we aim to communicate our descriptive findings on the role of Fam83fa in vivo, providing data for follow-up studies by other researchers into the mechanistic role of Fam83fa.

      1. Validation of the KO phenotypes (hatching, IR sensitivity) requires rescue with WT fam83fa WT mRNA, but not 1-500 or fam83fb mRNA.

      We thank the reviewer for raising the issue of rescue experiments. Such experiments are frequently used in knock-down experiments, where non-specificity may be a problem, but they are used more rarely in genetic knock-outs, where the gene defect is well defined. In the case of Fam83fa, a particular difficulty is that overexpression of fam83fa itself causes a p53-mediated DNA damage response (DDR) (Salama et al., 2019). Moreover, we have shown by both qRT-PCR and western blotting that injection of fam83fa mRNA into zebrafish embryos (the traditional technique by which rescue experiments are performed) induces a p53-mediated DDR. As a result, it would be very difficult to interpret the results of any rescue experiment, because one would have to be absolutely certain that levels of fam83fa re-expression recapitulate and do not exceed endogenous levels. As a test of specificity, we therefore used more than one fam83fa-/- mutant line, each carrying a different genomic mutation, and validated that the same phenotype was present in both. We are happy to provide the qRT-PCR and western blot data confirming the results of fam83fa mRNA injection, if required. We have added a section to the manuscript detailing this issue.

      2. While the hatching phenotype (Fig 3) is convincing, there is no data on HG development in the null embryos. Does the HG develop normally in the absence of fam83fb? If so, this would support the authors conclusions that the role of fam83fb is functional rather than developmental (indirect effect). In situs as in Fig.1 might be helpful here.

      We thank the reviewer for this helpful suggestion. We agree that we did not investigate whether the hatching gland develops normally in the MZ-fam83fa-/- mutant embryos. We observed no gross morphological differences that would have prompted us to investigate this, although we agree it is an interesting question for a future project. In terms of functional versus developmental effects, we are confident that MZ-fam83fa-/- mutant embryos develop at a normal temporal rate, as shown by the machine-learning-based classifier used to assess developmental trajectory (Figure S3 and Jones et al., 2022, 2024). This strongly suggests that the effect of fam83fa KO is functional, rather than an indirect consequence of, for example, developmental delay.

      While the IR sensitivity phenotype (Fig S4) is convincing, IR-induced cell death/apoptosis was not analyzed. There is a large literature describing straightforward assays for cell death/apoptosis detection in zebrafish with assays such as acridine orange or TUNEL labeling, or active casp3 whole-mount IF. Is IR-induced cell death enhanced in fam83fa KOs?

      We thank the reviewer for their positive comments and agree that investigating the nature of the cell death occurring after IR would be very interesting. We used both acridine orange and TUNEL labeling following injection of fam83fa mRNA (see point 1 above), and while the assays themselves were relatively straightforward, technical issues made quantification of fluorescence intensity difficult. We also suspect that a significant degree of necrosis occurs, which further complicates interpretation of the data from both approaches. We do, however, think this is an important line of inquiry, and hope that other researchers will explore the mechanism of IR-induced cell death in the MZ-fam83fa-/- mutants in the future.

      Similarly, there are multiple tools to assay autophagy in zebrafish (e.g., Moss et al., Histochem Cell Biol 2020, PMC7609422; Mathai et al., Cells 2017, PMC5617967). Is autophagy affected in the KOs, with or without IR? These experiments might directly implicate fam83fa in autophagy.

      We agree that there are exciting tools with which to assay autophagy in zebrafish, and although we considered some of these, including caudal fin regeneration, we deemed these experiments to be beyond the descriptive scope of this paper, given the time and resources available to us. We hope that other researchers will use our data as a basis for investigating the role of Fam83fa in autophagy further, using assays such as these suggested by the reviewer.

      Figure 4: Isn't there a slight reduction in p53 induction at 10 hours?

      Although the western blot in Figure 4A gives this impression, this is probably due to loading variability (see the anti-β-actin loading control band). Moreover, over three independent experiments (Figure 4B), this apparent difference is not statistically significant. Taken together with other evidence that the p53-mediated DNA damage response is not affected in MZ-fam83fa-/- mutants, we are confident there is no detectable change in the level of stabilized p53 in the MZ-fam83fa-/- mutants compared to WT.

      Given the widely documented, dominant role of p53 in zebrafish IR-sensitivity, the authors should test if the IR sensitivity of fam83fa KO animals is p53-dependent, ideally via a cross into p53 null, but at least via injection of p53 morpholinos.

      We agree that p53 is widely documented as playing an essential role in the IR induced DNA damage response in zebrafish. All our experiments suggest there is no difference between the levels of p53 (protein or mRNA) or any of the p53-induced downstream effectors (that we tested) in MZ-fam83fa-/- mutants compared to WT embryos. This was true whether or not the embryos were subjected to genotoxic stressors, including IR treatment. We therefore conclude that the increased sensitivity phenotype we observe as a result of loss of Fam83fa is not caused by a change in p53 activity, at least not as part of the DNA damage response.

      Do autophagy inhibitors phenocopy the hatching and IR-sensitivity defects of fam83fa embryos? Do the inhibitors exacerbate the mutant phenotypes or synergize with M or Z mutant phenotypes? (I may have missed this but do M and Z fam83fa null embryos have any phenotype? Or do the phenotypes only manifest in MZ embryos?)

      This is an excellent question, and indeed one we attempted to address. We tried to optimize several autophagy inhibitors, including bafilomycin A1, chloroquine and wortmannin, as well as the proteasomal inhibitor MG132. In addition, we tried to optimize the autophagy promoters Torin1 and rapamycin. Unfortunately, we regularly saw global effects in zebrafish embryos that were difficult to characterize and to control by dosage. At the same time, we were working to confirm the specific effects of these drugs on autophagy using p62 and LC3-I/LC3-II western blots, which were themselves difficult to optimize. We attempted to optimize these experiments for 6 months before the COVID lockdown, at which point they were abandoned. We agree that these are very pertinent questions; we are now unable to pursue them further due to the closure of the Smith lab, but we would be delighted for future researchers to continue these experiments. We hope the descriptive data provided in our paper will prompt other researchers in the autophagy field to explore the role of Fam83fa in autophagy further. Regarding the zygotic phenotype question, this was something we did not investigate: as there was no immediately apparent phenotype in the zygotic generation, we proceeded directly to the maternal-zygotic (MZ) generation for ease of screening larger numbers of embryos.

      Reviewer #1 (Significance (Required)):

      The role of Fam83f is not known. This study in zebrafish might be the first to clarify the function of this protein in vivo.

      We thank the reviewer for this positive insight, and we agree that our work is the first to do so in vivo.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Fam83f is one of the proteins about which little is known. The authors Jones et al. tried to shed light on Fam83f function by knocking out the gene in zebrafish. Here they found that fam83f is expressed in the hatching gland and that larvae without Fam83f hatch significantly earlier than wild-type animals. The authors furthermore investigated the response of fam83f knock-out animals to DNA damage and found increased sensitivity to ionizing radiation and MMS. In order to find out more about Fam83f function in the DNA damage response, the authors performed RNA-seq after inducing DNA damage, and here they saw upregulation of several autophagy/lysosome-associated proteins and downregulation of some phosphatidylinositol-3-phosphate binding proteins, among others. Finally, the authors found that Fam83f is targeted to the lysosome. The manuscript is overall well written and clear in its general statement.

      We thank the reviewer for their encouraging comments.

      In the manuscript, the authors describe the investigation of several aspects of Fam83f function, and particularly the role in hatching seems to be important for Fam83f, as the gene is strongly expressed in the hatching gland and its absence leads to clear and considerably earlier hatching. Unfortunately, all aspects of Fam83f function that are described in the manuscript are investigated very superficially, the conclusions are not supported by data and important controls are lacking. As such, the RNA-seq results are not confirmed by qRT-PCR, the role of the Fam83f LIR domain is not confirmed by co-IPs and it has not been investigated whether the presence of Fam83f in lysosomes is due to its degradation or whether it has a function in this cellular compartment.

      We thank the reviewer for their input and address each point raised below:

      • All aspects of Fam83f function are investigated superficially.

      We agree that we have not provided an in-depth analysis of the mechanistic role of Fam83fa. It was because there were so many roles that we decided to make this paper rather descriptive in nature, hoping that the observations will prove useful to other researchers who may wish to define the mechanistic roles of Fam83fa more deeply. Even without in-depth investigation, our findings are previously unreported and the phenotypes we report are clear. We have amended our manuscript to make it apparent that this paper is intended to be descriptive in nature, and we hope this addresses this issue.

      • Important controls are lacking - RNA-seq results are not confirmed by qRT-PCR

      We thank the reviewer for their comment. We did not include qRT-PCR data as a control for the RNA-seq data because 1) each RNA-seq experiment was repeated on three biological replicates across three independent experiments and 2) we conducted RNA-seq on two different MZ-fam83fa-/- mutant lines and only considered genes that were mis-regulated in both mutants. Taken together, we considered this to be sufficient validation for the manuscript. However, we also performed confirmatory qRT-PCR for several of the differentially expressed genes identified, including the three main PI(3)P binding genes. We have now included these data in the supplementary information as an additional control - see Figure S6G which is now also referred to in the main text, and additional primer sequences have been added to Table S1.

      • The role of the Fam83f LIR domain is not confirmed by co-IPs

      We agree with the reviewer that this is an important experiment, and we worked closely with Dr Brian Ludwig and Dr Karen Vousden (The Francis Crick Institute) to test this. We tried to express zebrafish Atg8 and Gabarap (the two main ATG8 proteins that bind to LIR domains) but were unable to express sufficient levels of protein to perform the co-IPs. The text in the manuscript has now been amended to reflect that this experiment is required to confirm the role of the putative LIR domain in Fam83fa.

      • It has not been investigated whether the presence of Fam83f in lysosomes is due to its degradation or whether it has a function in this cellular compartment

      Whilst we agree with the reviewer that this is an important question, we did not intend this paper to expand beyond a descriptive role of the observations we made following the loss of Fam83fa in vivo. These are important questions to follow up on to determine the mechanism of action of Fam83fa, and we hope that other researchers will pursue these avenues of investigation following the publication of our observations.

      Also, there is no leading concept in the manuscript. Starting from a role in hatching, the authors go to the DNA damage response and finally to the presence of Fam83f in lysosomes. How are these different aspects linked? Is the presence of Fam83f in lysosomes important for the suppression of hatching, and how does Fam83f delay this process? (One would have wished that the authors had not been so broad and had focused on a particular aspect, which could then have been investigated in depth.)

      We agree with the reviewer that the paper gives a broad overview of our observations and does not examine the underlying mechanisms in detail. However, we believe that descriptive papers such as this, where observations following genetic perturbation are reported, are equally important, providing as they do important foundational data for other researchers to take forward. We do postulate on the links between the hatching, DNA damage and lysosomal phenotypes we observe in the discussion section, and we have expanded on this following the reviewers' comments, to make our hypothesized link between these phenomena clearer.

      Specific comments:

      • All materials should be described in Materials and Methods, including the antibodies that have been used

      The antibodies used, together with concentrations and catalog numbers, are now listed in Materials and Methods.

      • Abbreviations should be explained

      The manuscript has been revised to ensure all abbreviations are explained. We thank the reviewer for bringing this oversight to our attention.

      • Figure 4A: Levels of p53 should also be shown for untreated fam83f-/- KO1 and KO2 animals

      The authors thank the reviewer for raising this point. Extracts from untreated MZ-fam83fa-/- KO1 and KO2 embryos were not included on this particular blot, as p53 was undetectable in all embryos (WT and both mutants) across all our experiments unless genotoxic stress was applied. No quantification could therefore be performed, as the expression level was essentially zero. However, we have now included an example p53 western blot in Supplemental Figure 5A, which shows untreated WT, MZ-fam83fa-/- KO1 and MZ-fam83fa-/- KO2 embryos probed for p53 (all undetectable) alongside treated embryos (detected).

      • Some references are missing (e.g. page 17, line 320/321: As this group of cells arises....)

      This citation and reference have now been added; thank you to the reviewer for highlighting this omission.

      • Line 369: The authors write about 4 KO lines but only two are shown in the figure.

      We thank the reviewer for this observation. In Figure 2B only KO1 and KO2 schematic diagrams are shown for simplicity (as these are the lines taken forward for further investigation). We have now amended the manuscript text to make this clear.

      • Lines 374/375: The NMD is not proven

      Absolutely - we have now revised the text to change this sentence accordingly and thank the reviewer for noting this.

      • Line 380: How can RNA levels of fam83fa be upregulated when the gene has been knocked out? Why are these genes only upregulated in KO1? How relevant is this?

      This was a typographical error, and we are very grateful to the reviewer for picking up on this. It should have read 'fam83fb'. As nonsense-mediated decay and associated transcriptional adaptation have been previously reported in zebrafish, this finding may be of considerable interest to the community. It is a side observation, and not necessarily directly related to the role of Fam83fa in vivo, but we felt it important to include. Indeed, as a result of this observation we have recently shared our MZ-fam83fa-/- lines with another group who are planning to investigate precisely this question - why are fam83fb and fam83g only upregulated in KO1?

      • Figure 3C is not mentioned in the text and lacks any labelling

      Figure 3C is now clearly referred to in the text and a label added to the figure.

      • Lines 434/435: All relevant data should be shown (can be done as a supplementary figure)

      We have now amended this to include an additional supplemental figure (Figure S5A).

      • Line 434: The reference to the figure seems to be incorrect (5A4A)

      Amended accordingly - thank you for pointing out this mistake.

      • Figure 4C and 4D: what is the difference?

      Thank you to the reviewer for noticing this omission. These data are from t1 (+2hrs) and t2 (+10hrs) and have now been labelled accordingly.

      • S5C and S5D: why are there 3 clusters?

      We thank the reviewer for raising this, as it has provided us with an opportunity to present our data more clearly. The 3 clusters arise from the combination of the first two principal components, which separate the samples by time and treatment. The clusters therefore represent i) untreated at t1, ii) treated at t1 and iii) treated at t2. However, having two plots with different color schemes made this confusing/misleading. We have now replaced the two PCA plots with a single plot that is colored and labelled according to the 3 aforementioned clusters.

      • Lines 495 to 505: What does it mean that the GO analysis shows upregulation and downregulation of endopeptidases, and why "in contrast"?

      We thank the reviewer for this comment, and we agree that this paragraph was misleading/confusing. This has now been rewritten in the main text, clarifying that endopeptidases were consistently upregulated at both timepoints.

      Reviewer #2 (Significance (Required)):

      The strength of the manuscript is certainly that it provides insight into Fam83f function, as not much is known about Fam83f.

      We thank the reviewer for the positive comment, and we agree that very little is known about this highly conserved protein.

      This study is probably most interesting for people in the zebrafish and related fields, as the authors convincingly show the expression of Fam83f in the hatching gland, and the earlier hatching in the absence of the protein is also very clear.

      Thank you for the positive feedback.

      The weakness of the study is clearly that it does not provide an in-depth analysis. As such, it shows that Fam83f is involved in hatching and can delay the process, but it remains elusive how this is achieved. (Likewise, the investigation into the DNA damage response also remains very superficial and does not establish whether Fam83f has a specific role in the DNA damage response, or whether the increased sensitivity is caused more unspecifically by the absence of a gene or is even connected to the earlier hatching.)

      Please refer to responses above (and changes made to the manuscript) clarifying that this study is intended to be descriptive, and provides important foundational data for further in-depth mechanistic studies by other researchers interested in the role of Fam83fa in vivo.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Zebrafish reveal new roles for Fam83f in hatching and the DNA damage-mediated autophagic response", Jones et al. provide an interesting exploration of the function of a poorly studied protein, Fam83f, in embryonic development. Using the zebrafish as a model organism, the study combines loss-of-function genetics, phenotypic analysis and RNA-sequencing to characterize and explore the result of Fam83f loss. Upon critical review of the manuscript and the results, we offer suggestions to improve the manuscript (see 'minor technical issues'). Additionally, we would like to highlight a weakness of the study in making the connection between Fam83f and the observed phenotype (increased sensitivity to DNA damage); see 'major issues'.

      Major issues:

      Most of our concern stems from relatively incomplete connection of the loss of fam83f to increased sensitivity to DNA-damage and lysosome function.

      Please refer to comments above and changes made to the manuscript to clarify this is a descriptive paper that is not intended to provide in-depth mechanistic insight into the role of Fam83fa.

      Is the increased sensitivity in fam83f KO embryos a direct effect of fam83f loss? A rescue experiment (by introduction of Fam83fa mRNA into their KO2 fish line) in the presence of ionizing radiation would help us understand the functional role of this protein in this process. Furthermore, can overexpression of any of the down-regulated genes involved in lysosome function restore the early hatching phenotype or the sensitivity to DNA damage?

      Fam83fa rescue experiments would be very difficult to interpret; please see comments above and the corresponding changes to our manuscript.

      In terms of over-expressing some of the downregulated genes identified in the RNA-seq and qRT-PCR to see if the phenotype can be rescued, we feel these are excellent suggestions and we hope other researchers in future will attempt such experiments.

      Minor technical issues:

      -Methods line 203, clarify how many embryos were used per sample for RNA-seq (this was only described as 15 embryos in the main body results text).

      Text has been amended to clarify this. We thank the reviewer for noticing this oversight.

      -Comment about the expansion of fam83f orthologs in mammals (8) as opposed to only 2 in zebrafish

      We apologize for any confusion: mammals do not have 8 fam83f orthologs. Mammals and zebrafish both have 8 FAM83 genes (FAM83A-FAM83H). Zebrafish, unlike mammals, underwent an additional genome duplication, so although mammals have only one FAM83F gene, zebrafish have two: Fam83fa and Fam83fb. We trust this clarifies the issue and believe it to be clear in our main text. However, we are happy to make any suggested amendments should the reviewer consider our wording confusing.

      -Supplementary figure 1C: please include representative images of secondary axis formation in fam83fa overexpressed Xenopus embryos.

      We have not included any images as these are already published in our related paper on FAM83F (Dunbar et al., 2020) which we refer to in the figure legend text. No additional images were captured specifically for this publication.

      -Provide more information about the mis-regulated genes in the RNA-seq analysis, how many are up or down regulated? Perhaps a better plot than a Venn diagram can be an MA-plot with the Venn diagram moved to a supplementary figure.

      The Venn diagrams in Figure 5A-C are to illustrate the number of differentially expressed genes that are shared between KO1 and KO2 (whether up or down regulated), and only those that are common to both lines are taken forward. Following the reviewer's comments, we have now displayed the behavior of the common genes across all replicates in one heatmap, with the data normalized to the WT untreated samples, and the normalized variance stabilized count indicates whether a gene is up or down regulated across each of the replicates and conditions. We believe this addresses the reviewer's comment as these data are now displayed in a more direct way and the genes that are consistently up or downregulated across all replicates (and indeed those that are not) can be clearly seen. We thank the reviewer for raising this and improving our data representation.

      -A better comparison of mis-regulated genes in the fam83f knockouts would be a comparison of KO2 and perhaps KO3, as the compensatory effects in KO1 can lead to additional indirect effect on the transcriptome. We understand the time and cost involved in this experiment and suggest that the differential gene expression analysis be performed individually on up or down regulated genes from KO2, or a comparison of such analysis will be provided with the differential gene expression analysis that was performed on shared mis-regulated genes between KO1 and KO2.

      The reviewer raises an excellent point. At the time of experimental design, we were concerned that omitting KO1 in favor of another line (e.g. KO3) would bias our results by excluding potentially important data. Similarly, as transcriptional adaptation occurs in a sequence-specific manner, and the phenotype was present in KO1 regardless, we did not want to exclude these data. With hindsight, we agree that it may have been prudent to exclude KO1 on this basis, that we might then have seen increased concordance of differentially expressed genes (DEGs) between KO2 and KO3, and that what the reviewer suggests may have been a better experimental design. Unfortunately, this experiment cannot be repeated now due to the closure of the Smith lab, but our documented findings remain valid and important regardless.

      -Can you confirm with the RNA-seq analysis that fam83g is upregulated in KO1 as opposed to KO2? (i.e. can the compensatory analysis you have observed with qRT-PCR be confirmed with the RNA-seq data?)

      This is an excellent question, and we thank the reviewer for raising this. fam83fb passed our threshold for significance to be deemed as differentially expressed (upregulated) in KO1 only, in accordance with our qRT-PCR data. fam83g did not pass the significance threshold, but perhaps this is not surprising as both fam83fb and fam83g are expressed at particularly low levels to start with and would probably require much greater sequencing depth to be detected.

      Reviewer #3 (Significance (Required)):

      There is fundamental value in clarifying the in vivo function of poorly characterized protein-coding genes. This study fills a gap in the literature, but the broader conceptual impact is limited. The authors do a thorough job of generating and characterizing CRISPR/Cas9-mediated knock-out zebrafish animals. It is further commended that the authors do a meticulous job in the quantitative description of the resulting phenotype. This is a thorough study, with the only major concern being the lack of rescue experiments that would be needed to substantiate the role of fam83f in sensitivity to DNA damage and lysosome function.

      We thank the reviewer for their comments and trust we have addressed the issues concerned with the changes described above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Perampalam et al. describe novel methods for genome-wide CRISPR screening to identify and validate genes essential for HGSOC spheroid viability. In this study, they report that Netrin signaling is essential for maintaining disseminated cancer spheroid survival, wherein overexpression of Netrin pathway genes increases tumor burden in a xenograft model of ovarian cancer. They also show that high netrin expression correlates with poor survival outcomes in ovarian cancer patients. The study provides insights into the biology of netrin signaling in DTC cluster survival and warrants development of therapies to block netrin signaling for treating serous ovarian cancer.

      Strengths:

      - The study identifies Netrin signaling to be important in disseminated cancer spheroid survival

      - A novel GO-CRISPR methodology was used to find key genes and pathways essential for disseminated cancer cell survival

      Thanks for the endorsement of our work and its importance to metastasis in ovarian cancer.

      Weaknesses:

      - The term dormancy is not fully validated and requires additional confirmation to claim the importance of Netrin signaling in "dormant" cancer survival.

      - Findings shown in the study largely relate to cancer dissemination and DTS survival rather than cancer dormancy.

      Much of the validation of dormancy and cell cycle arrest in HGSOC spheroids, as well as the culture model, have been published previously and hence was not repeated here.  I think this reviewer will appreciate the updated citations and explanations to better illustrate the state of knowledge.  We have also added new experiments that further emphasize the dormant state of spheroid cells in culture and xenografts, as well as patient derived spheroids used in this study.

      Reviewer #1 (Recommendations for Authors):

      (1) It is unclear what spheroid/adherent enrichment ratio is and how it ties into genes affecting cell viability. Why is an ER below 1 the criteria for selecting survival genes?

      Our screen uses the ‘guide only’ comparison in each culture condition to establish a gene score under that specific condition.  A low adherent score captures genes that are essential under standard culture conditions where cells are proliferating and this can include genes needed for proliferation or other basic functions in cell physiology.  A low spheroid score identifies the genes that are most depleted in suspension when cells are growth arrested and this is an indication of cell death in this condition.  Since gene knock outs are first established in adherent proliferating conditions, essential genes under these conditions will already start to become depleted from the population before suspension culture.  By selecting genes with a ratio of <1 we can identify those that are most relevant to dormant suspension culture conditions.  Ultimately, the lowest enrichment ratio scores represent genes whose loss of function is dispensable in the initial adherent condition, but critical for survival in suspension and this is what we aimed to identify. We’ve updated Figure 1B to illustrate this and we’ve updated the explanation of the enrichment ratio on page 6, lines 144 to 147 of the results.
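      The enrichment-ratio selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the screen's actual analysis pipeline: it assumes each gene has a positive, guide-only-normalized score in each culture condition, defines the ratio as the spheroid score divided by the adherent score, and uses hypothetical gene names (GENE_A, GENE_B, GENE_C) with made-up values purely to show how the < 1 cutoff separates suspension-specific dependencies from genes that are depleted in both conditions.

```python
from typing import Dict, List, Tuple

# Hypothetical gene scores: values are illustrative only, NOT screen data.
# Assumption: each score is a positive, guide-only-normalized value
# (1.0 = no change relative to the guide-only control in that condition).


def enrichment_ratio(spheroid_score: float, adherent_score: float) -> float:
    """Spheroid/adherent enrichment ratio (ER) for one gene."""
    if adherent_score <= 0:
        raise ValueError("adherent score must be positive")
    return spheroid_score / adherent_score


def candidate_survival_genes(
    scores: Dict[str, Tuple[float, float]], cutoff: float = 1.0
) -> List[str]:
    """Return genes with ER below the cutoff, ordered from lowest ER upward."""
    ratios = {gene: enrichment_ratio(s, a) for gene, (s, a) in scores.items()}
    hits = [gene for gene, er in ratios.items() if er < cutoff]
    return sorted(hits, key=lambda gene: ratios[gene])


# (spheroid_score, adherent_score) per hypothetical gene
toy_scores = {
    "GENE_A": (0.2, 0.9),  # depleted mainly in suspension -> ER well below 1
    "GENE_B": (0.3, 0.3),  # depleted equally in both conditions -> ER = 1
    "GENE_C": (1.1, 1.0),  # not depleted -> ER above 1
}

print(candidate_survival_genes(toy_scores))  # prints ['GENE_A']
```

      Under these assumptions, a gene that is already depleted during adherent growth scores similarly in both conditions and sits near a ratio of 1, so it is excluded, while a gene needed mainly in suspension falls below 1.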

      (2) The WB for phospho-p38 in figure 1A for OVCAR8 line does not show increased phosphorylation in the spheroid relative to the adherent. If anything, phospho-p38 appears to be reduced in the spheroid. Can the authors provide a better western blot?

      We’ve updated this blot with a longer exposure, see Figure 1A.  Phosphorylation levels of p38 are essentially unchanged in OVCAR8 cells in suspension culture, although the overall levels of p38 may be slightly reduced in dormant culture conditions.

      (3) How did the authors confirm dormancy apart from western blot for phospho-ERK vs phospho-p38? Authors should add EdU/BrdU staining and/or Ki67 staining to confirm dormancy.

      Previous publications that appear as citations 7, 10, and 33 in the reference list established the growth-arrested state of these cells in suspension culture. This included measuring other known markers of dormancy and quiescence such as p27, p130, reduced cyclin/cdk activity and ³H-thymidine incorporation. In addition, other associated characteristics of dormancy such as EMT and catabolic metabolism have been demonstrated in these culture conditions (see citation 11 and Rafehi et al. Endocr. Relat. Cancer 23;147-59). We’ve added these additional citations to our descriptions of dormant spheroid culture to better clarify the status of these cells in our experiments (see page 6, lines 126-28). To ensure that cells are growth arrested in the experiments shown in this paper, we have updated Figure 1A to include blots of p130 and Ki67 to further emphasize that spheroid cells are not proliferating, as the quiescence marker (p130) is high and the proliferative marker (Ki67) is lost in suspension culture.

      (4) Can the authors report spheroid volume over time in culture? How was viability measured?

      We’ve updated the methods (see page 27, line 574) to better highlight the description of cell survival that answers both of these questions. At the ends of experimental time points in both the screen and viability assays, we captured live cells by replating them on adherent plasticware. We fixed and stained them with crystal violet and photographed the plates to illustrate the sizes of spheroids (shown in Fig. 2 Supplement 1E, Fig. 6C, and 7D). We subsequently extracted the dye and quantified it spectrophotometrically to compare the biomass of viable cells between experiments, irrespective of the relatively random shapes of spheroids. In a previous paper (10), we found that reattachment and staining in this manner matched traditional viability assays such as CellTiter-Glo. Furthermore, biomass never increases in culture and diminishes gradually over time, consistent with the non-proliferative state of these experiments. Checks of this equivalence between viability and reattached biomass measurements, as well as a demonstration that biomass is lost over time, are shown in Fig. 2 Supplement 1E, which compares reattached crystal violet staining with CellTiter-Glo for DYRK1A knockout cells over time in culture. In addition, we include a comparison of crystal violet staining of reattached spheroids with trypan blue dye exclusion in Fig. 5G and H. In both cases, reattachment and more direct viability assays support the same conclusion: Netrin signaling supports viability in dormant culture.

      (5) Please show survival significance of Netrin signaling genes in recurrence/relapse free survival to claim importance in cancer dormancy.

      See Fig. 7 Supplement 1C, where we include the recurrence-free survival data. Netrin-1 and -3 high expressors also have a numerically shorter progression-free survival, but it is not statistically significant. Netrin-1 overexpression alone is also shown, and it shows shorter survival with a P-value of 0.0735. Elevated survival of dormant cells in a residual disease state is expected to increase the chance of relapse and shorten this interval. Thus, these data are consistent with our model, but lack statistical significance.

      There are many alternative ways to interpret what shorter progression free survival, or overall survival, may mean biologically. Since survival of dormant cells is but one of them, we also added new data to experimentally investigate the role of endogenous Netrin signaling in dormant residual disease in Fig. 6 and described on page 12, lines 266-87.  We used xenograft experiments to show OVCAR8 spheroids form and withdraw from the cell cycle equivalently to suspension culture following intraperitoneal injection.  Furthermore, loss of Netrin signaling due to receptor deletions compromises survival during this early window before disseminated lesions form.  This argues that Netrin signaling contributes to survival during this window of dormancy.  In addition, mice engrafted with mutant cells experience prolonged survival when Netrin signaling is blocked.  Together, these experiments further argue that Netrin signaling supports survival in the dormant, non-proliferative phase, and leads to reduced survival of mice.

      (6) The authors show IHC staining of patient ascites derived HGSOC spheroids. However, no marker for dormancy is shown in these spheroids. Adding Ki67 staining or phospho-ERK vs phospho-p38 would be necessary to confirm cancer dormancy.

      We have added new staining for Ki67 and p130 that compares these markers in HGSOC tumors where Ki67 is high and p130 is low with ascites derived spheroids where staining is the opposite. Importantly, expression of p130 is linked to cellular quiescence and is not found to accumulate in the nucleus of cells that are just transiting through G1.  This confirms that the ascites derived spheroids are dormant.  See Fig. 4A-E and described on page 9, lines 201-7.

      (7) Overall, the findings are interesting in the context of cancer dissemination. There is not enough evidence for cancer dormancy and the importance of Netrin signaling in the survival of cancer dormancy. Overexpression of Netrin increases phosphorylation of ERK, leading one to expect an increase in proliferation. This suggests that Netrin breaks cancer cells out of dormancy, into a proliferative state.

      We recognize that the discovery of Netrin activation of MEK-ERK in growth-arrested cells is counterintuitive to many cancer researchers. However, this axis exists in other paradigms of Netrin signaling that are not related to proliferation, such as axon outgrowth (see citation 26, Forcet et al. Nature 417; 443-7, as an example). We have added Fig. 5D and a description on page 11, lines 244-52, to better clarify that Netrins CAN’T induce cell proliferation through ERK. Recombinant Netrin-1 induces ERK phosphorylation only in suspension culture conditions, not in quiescent adherent conditions. The small magnitude of ERK phosphorylation induced by Netrin-1 in suspension, compared with that induced in adherent, quiescent cells by the same concentration of mitogenic EGF, further emphasizes that this is not a proliferative signal. Lastly, the new xenograft experiment in Fig. 6A-D (described on page 12, lines 266-81) demonstrates the growth-arrested context in which Netrin signaling in dormant spheroids supports viability.

      (8) If authors wish to claim cancer dormancy as the premise of their study, additional confirmatory experiments are required to support their claims. Alternatively, based on the current findings of the study, it would be best to change the premise of the article to Netrin signaling in cancer dissemination and survival of disseminated cancer spheroids rather than cancer dormancy.

      We hope that this reviewer will agree that we have now added sufficient explanation of background work on HGSOC spheroid dormancy from the literature, as well as new experiments that address their questions about dormancy in our experimental systems.

      Reviewer #2 (Public Review):

      Summary:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromise the survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      Overall it was thought that this study is of potentially broad interest in as much as it provides previously unappreciated insights into the potential molecular underpinnings of cancer cell dormancy, which has been associated with therapy resistance, disease dissemination, and relapse as well as poor prognosis. Notwithstanding the potential limitations of cellular models in mimicking cancer cell dormancy, it was thought that the authors provided sufficient support for their model that netrin signaling drives survival of non-proliferating ovarian cancer cells and their dissemination. Collectively, it was thought that these findings hold a promise to significantly contribute to the understanding of the molecular mechanisms of cancer cell dormancy and in the long term may provide a molecular basis to address this emerging major issue in the clinical practice.

      Thanks for the kind words about the importance of our work in the broader challenges of cancer treatment.

      Weaknesses:

      Several issues were observed regarding methodology and data interpretation. The major concerns were related to the reliability of modelling cancer cell dormancy. To this end, it was relatively hard to appreciate how the employed spheroid model allows one to distinguish between dormant and, e.g., quiescent or even senescent cells. This was in contrast to solid evidence that netrin signaling stimulates abdominal dissemination of ovarian cancer cells in the mouse xenograft and their survival in organoid culture. Moreover, the role of ERK in mediating the effects of netrin signaling in the context of the survival of non-proliferative ovarian cancer cells was found to be somewhat underdeveloped.

      Experiments previously published in citation 7 show that growth arrest in patient ascites-derived spheroids is fully reversible, which argues against non-proliferative spheroids being a form of senescence and places this work in the dormancy field. We have added extensive new support for our model systems, as well as data addressing the counterintuitive role of MEK-ERK signaling in survival rather than proliferation.

      Reviewer #1 Recommendations for Authors

      (1) A better characterization of the spheroid model may be warranted, including staining for the markers of quiescence and senescence (including combining these markers with staining for the components of the netrin pathway)

      See Figure 1A and page 6, lines 126-36, where we have added blots for Ki67 and p130 to better emphasize the arrested proliferative state of cells in our screening conditions. We have also added these same controls for patient ascites-derived spheroids in Figure 4, described on page 9, lines 203-7. One realization from this CRISPR screen, and others in our lab, is that it identifies functionally important aspects of cell physiology and not necessarily ones that are easily explored using commercially available antibodies. Netrin-1 and -3 staining of patient-derived spheroids in Fig. 4, as well as of cell line spheroids in Fig. 4 Supplement 1, further supports the relevance of this pathway in dormant cancer cells because Netrins are expressed in the right place at the right time. The Netrin-1 stimulation experiments in Fig. 5C were originally carried out to probe HGSOC cells for functional Netrin receptors, since we couldn’t reliably detect them by blotting or staining with available antibodies. This demonstrates that this pathway is active in the various HGSOC cell lines we’ve used and, specifically in OVCAR8 cells, that it is only active in suspension culture conditions.

      (2) In figure 1A it appears that total p38 levels are reduced in some cell lines in spheroid vs. adherent culture. The authors should comment on this.

      These blots have been updated to be clearer. Total p38 levels may be reduced in some cell lines; compared with the levels of phosphorylated (activated) p38, this suggests that the fraction of activated p38 is higher. OVCAR8 cells may be an exception, in which the overall activity level remains approximately the same.

      (3) The authors should perhaps provide a clearer rationale for choosing to focus on the netrin signaling vs. e.g., GPCR signaling, and consider more explicitly defining "primary" vs. "tertiary" categories in Reactome gene set analysis.

      We’ve updated Fig. 1E and the text on page 7, lines 161-5, to illustrate which gene categories identified in the screen belong to which tiers of Reactome categories. This better shows why we investigated the Axon guidance pathway that includes Netrin: it is a highly specific signaling pathway that scores similarly to the broader and less specific categories at the very top of the list. As an aside, the GPCR signaling and GPCR downstream signaling categories have proven to be fairly intractable. As best we can tell, the GPCR downstream signaling category is full of MAPK family members and likely represents some redundancy with the MAPK category further down.

      (4) In figure 3A-C, including factors whose expression did not appear to change between adherent and suspension conditions may be warranted as the internal control. Figure 3D-F may benefit from some sort of quantification.

      The mRNA expression levels are normalized to GAPDH as an internal control. We have updated this figure and re-plotted it as fold change relative to adherent culture cells with statistical comparisons to indicate which are significantly upregulated in suspension culture.

      The IHC experiments are now in Fig. 4D-F and show positive staining for Netrin-1 and -3. Netrin-3 is easiest to see, while Netrin-1 is trickier because the difference from the no-primary-antibody control is not intensity but the tint of the DAB stain. We had to counterstain the patient spheroids with Hematoxylin in order for the slide scanner to find the best focal plane and to make image registration between sections possible. This unfortunately makes the Netrin-1 staining rather subtle. For cell line spheroids in Fig. 4, Supplement 1 we didn’t need the slide scanner, and we show negative controls without counterstain that are much more convincing of Netrin-1 detection and reassure us that our staining detects the intended target. We’ve updated the labels in Fig. 4 and Fig. 4, Supplement 1 to make this more intuitive. Unfortunately, relying on the tint of the DAB stain leaves this as a qualitative experiment.

      - In figure 4C-E the authors show that Netrin-1 stimulation induces ERK phosphorylation whereby it is argued that this is a "low-level" stimulation of ERK signaling required for the survival of ovarian cells in the suspension. This is however hard to appreciate, and it was thought that having adherent cells in parallel would be helpful to gauge whether this indeed is a "low level" ERK activity. Moreover, the authors should likely include downstream substrates of ERK (e.g., RSKs) as well as p38 in these experiments. The control experiments for the effects of PD184352 on ERK phosphorylation also appear to be warranted. Finally, performing the experiments with PD184352 in the presence of Netrin-1 stimulation would also be advantageous.

      We have added a new Netrin-1 stimulation experiment in Fig. 5D (described on page 11, lines 244-52) that shows that Netrins can only activate very low levels of ERK phosphorylation in suspension, when proliferation is arrested. Netrin-1 stimulation of quiescent adherent cells, where stimulation of proliferation is possible, shows that Netrins are unable to activate ERK phosphorylation in this condition. In contrast, we also stimulate quiescent adherent OVCAR8 cells with an equal concentration of EGF (a known mitogen) to provide high-level ERK phosphorylation as a side-by-side comparison. We think that this offers clear evidence that Netrin signaling is inconsistent with inducing cell proliferation. We’ve also updated citations in the introduction to include citation 26, which describes a previously reported paradigm of Netrin-ERK signaling in axon outgrowth, a non-cancer, non-proliferative context, to remind readers that Netrins utilize MEK-ERK differently.

      We highlight Netrin-MEK-ERK signaling as key to survival for a number of reasons. First, Netrin signaling in this setting does not fit the dependence receptor paradigm, in which loss of Netrin receptors protects against cell death. Fig. 5B rules this out, as receptor loss never offers a survival advantage; instead, receptor deletions clearly compromise survival in suspension culture. Second, positive Netrin signaling is known to support survival by inactivating phosphorylation of DAPK1. We’ve added this experiment as Fig. 5 Supplement 1D and show that loss of Netrin receptors doesn’t reduce DAPK1 phosphorylation in a time course of suspension culture. Consequently, we conclude that this isn’t the survival signal either. Since MEK and ERK family members scored in our screen, we investigated their role in survival. We now show that two MEK inhibitors with different inhibitory mechanisms both induce cell death: in addition to the PD184352 inhibitor in our first submission, we’ve added Trametinib, shown in Fig. 5G. Since it is surprising that MEK inhibition can kill cells instead of just arresting proliferation, we’ve also added a second cell death assay based on trypan blue dye exclusion, now Fig. 5H. Lastly, we include Trametinib inhibition of ERK phosphorylation in these assays in Fig. 5I. While we leave open what takes place downstream of ERK, our model in Fig. 5J offers a very detailed look at the components upstream.

      - Does inhibition of ERK prevent the abdominal spread of ovarian cancer cells? The authors may feel that this is out of the scope of the study, which I would agree with, but then the claims regarding ERK being the major mediator of the effects of netrin signaling should be perhaps slightly toned down.

      We agree that loss-of-function xenograft experiments will strengthen our discovery of Netrin’s role in dormancy and metastasis. We have added a new Fig. 6 that uses xenografts with Netrin receptor-deficient OVCAR8 cells (UNC5 4KO). It demonstrates that two weeks following IP engraftment we can isolate spheroids from abdominal washes, and that cells have entered a state of reduced proliferation, as determined by lowered expression of Ki67 and other proliferation-associated genes. In the case of UNC5 4KO cells, there is significant attrition of these cells, as determined by recovering spheroids in adherent culture (Fig. 6C) and by Alu PCR to detect human cells in abdominal washes (Fig. 6D). Lastly, xenografts of UNC5 4KO cells cause much less aggressive disease and significantly extend survival of these mice (Fig. 6E,F). While this is not exactly the experiment the reviewer requested, it is a clear indication that Netrin signaling supports survival in a xenograft model of dormancy.

      - Notwithstanding that this could be deduced from figures 6D and F, it would be helpful if the number of mice used in each experimental group is clearly annotated in the corresponding figure legends. Moreover, indicating the precise statistical tests that were used in the figures would be helpful (e.g., specifying whether anova is one-way, two-way, or?)

      We have added labels to what is now Fig. 8B to indicate the number of animals used for each genotype of cells.  We have also updated figure legends to include more details of statistical tests used in each instance.

      Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The majority of the conclusions are well supported by strong experimental evidence. The only area where that is not fully the case is the role of Pak1 as a downstream effector of FoxG1-FoxO6 and its effects on macropinocytosis. To further strengthen this claim, the authors should demonstrate that ablation of Pak1 can rescue the functional consequences of forced FoxO6 expression and whether overexpression of Pak1 rescues quiescence exit in FoxO6 knockout.

      Thank you to the reviewer for these helpful suggestions. To investigate the effects of Pak1 ablation, and therefore more directly the link between FOXG1, FoxO6 and macropinocytosis, we tested the published Pak1 inhibitor IPA-3. Unfortunately, to distinguish the role of Pak1 in quiescence exit and macropinocytosis, we would need a dosage of IPA-3 that is efficacious but does not affect cell proliferation. It was not possible to optimise such a dosage: 10uM has been shown to be efficacious at inhibiting Pak1 (Verma et al, 2020; Wong et al, 2013), but even at 2.5uM we see significant cell death in our cells. This is potentially due to pleiotropic roles for Pak1.

      Also, it is not feasible to overexpress Pak1 in the FoxO6 KO cells with inducible FOXG1. To ensure we are investigating quiescence exit this would need to be in an inducible manner; however, re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels by excising the existing transgene.

      As shown in Figure S3, we do not observe clear vacuole formation in F6 (FOXG1-inducible) cells upon Dox addition. As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, with other pathways downstream of FOXG1 necessary to be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Indeed, active Pak1 has been found to modulate pinocytic cycling, enhancing both FITC-dextran uptake and efflux (Dharmawardhane et al, 2000). We therefore would not hypothesise that high Pak1 levels alone would be sufficient to drive quiescence exit.

      Alternatively, the macropinocytosis observed may be a metabolic stress response because of the hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017).

      We believe the observed macropinocytosis phenotype upon Foxo6 overexpression, and the changes in Pak1 expression upon Foxo6 loss or FOXG1 induction provide interesting insights into the function of this underexplored FoxO family member. However, currently we are unable to demonstrate a direct link between these processes and have therefore modified the text to reflect this (see lines 292-4, 330-3, 365-8).

      • The manuscript stresses the role of NSC quiescence exit in GBM and demonstrates that FoxG1 KO reduces FoxO6 levels in a murine GBM cell line, but an experiment combining BMP4-mediated quiescence with Dox-induced FoxG1 overexpression in these cells, or showing that cell cycle re-entry is abolished by reduced FoxO6 levels in FoxG1 KO cells, is lacking. Such data would significantly substantiate the relevance of the findings.

      Mouse GBM cells have elevated levels of FoxG1 and have been shown to be refractory to BMP4-mediated quiescence entry, maintaining colony formation following BMP treatment (Bulstrode et al, 2017). It is therefore challenging to specifically investigate cell cycle re-entry/quiescence exit using these mouse GBM cells, or indeed any GBM cell line, due to their inability to respond fully to BMP cues (Caren et al, 2015). It has also been shown by Bulstrode et al, 2017 that Foxg1 null mouse neural stem cells show an increased propensity to exit cycle in response to BMP treatment, and reduced colony formation on return to EGF/FGF-2 growth factors. FOXG1 null cell lines therefore show a reduced response to BMP cues, making it difficult to explore quiescence exit per se. To navigate this, we instead investigated Dox-induced FOXG1 overexpression in FoxO6 WT and KO mouse NS cells, which display similar quiescence characteristics upon BMP treatment (Figure 4).

      • In the introduction and discussion, FoxO6 is mentioned for its oncogenic roles in various cancers but no reference to GBM specifically is cited. It feels like a missed opportunity to not show evidence of this in the IENS cell line that has reduced levels of FoxO6; is there an effect in their proliferative capacity? What are the expression levels of Pak1 following FoxG1 KO in IENS cells?

      Thank you for the helpful suggestion. It is indeed true that the literature on FoxO6 in GBM is lacking, which explains the absence of citations on this. On investigating expression of the proliferation marker Ki67 in these cells, we found no significant difference in expression, now shown in Figure 1H. This is in keeping with previous findings from our lab (Bulstrode et al, 2017) which show that FOXG1 is dispensable for the maintenance of continued NSC or GSC proliferation in vitro. We investigated the expression levels of Pak1 following FOXG1 KO in IENS cells and found a decrease in both KO lines compared to parental cells (updated Figure 6F).

      As explained in our discussion, these data suggest that Foxg1/FoxO6/Pak1 are not functionally important in sustaining GSC/NSC proliferation, as shown by the lack of proliferation defects upon Foxg1 or FoxO6 deletion (Bulstrode et al, 2017), but impact regulatory transitions, as cells prepare to exit quiescence into the proliferative radial-glia like state.

      Minor comments

      - Fig1A shows 4 and 2-fold respectively for the two mouse NSC lines, not 17 and 4-fold increase as written on manuscript, please adjust accordingly.

      The qRT-PCR data are presented as log2(fold change), i.e. -ddCt, where this value equals zero for the calibrator sample, as indicated in the figure legends and axes. The data are presented in this way to enable accurate visualisation of up- and down-regulation of gene expression. Data are stated as ‘fold increase’ in the text for ease of reading, which we have clarified in the text and figure legends (e.g. lines 154 and 176).
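      To make the relationship between the plotted values and the fold increases quoted in the text explicit, the sketch below walks through the standard comparative Ct (2^-ddCt) calculation. It is a minimal illustration, not the authors' analysis pipeline: the function names and Ct values are hypothetical, and it assumes roughly 100% amplification efficiency (each cycle doubles the product) with a reference gene such as GAPDH for normalisation.

```python
import math


def delta_delta_ct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Return ddCt for a sample relative to the calibrator sample.

    Hypothetical helper: the gene of interest is first normalised to a
    reference gene (e.g. GAPDH) measured in the same sample.
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_calibrator = ct_target_cal - ct_ref_cal
    return d_ct_sample - d_ct_calibrator


def log2_fold_change(dd_ct):
    # The value plotted on the y axis: -ddCt, which is 0 for the calibrator.
    return -dd_ct


def fold_change(log2_fc):
    # The "fold increase" quoted in the text; assumes ~100% efficiency.
    return 2.0 ** log2_fc


# Hypothetical Ct values: after normalisation, the target crosses the
# threshold 4 cycles earlier in the sample than in the calibrator.
dd = delta_delta_ct(ct_target=20.0, ct_ref=15.0, ct_target_cal=24.0, ct_ref_cal=15.0)
print(log2_fold_change(dd), fold_change(log2_fold_change(dd)))  # 4.0 16.0

# Converting the other way: a 235-fold increase sits near 7.9 on a log2 axis.
print(round(math.log2(235), 2))  # 7.88
```

      On this scale a plotted value of 2 corresponds to a 4-fold increase and a value of 4 to a 16-fold increase, which is why the axis values and the fold increases stated in the text differ.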

      • In Fig2G, the manuscript reports a 235-fold upregulation, but the graph looks more like a 7 or 8-fold change, as shown in Fig1A for the F6 NSC line. I would recommend checking the fold changes reported throughout the paper.

      See our response to the previous comment. The qRT-PCR data are presented as log2(fold change), i.e. -ddCt, which equals zero for the calibrator, as indicated in the figure legends and axes; a log2 value of about 7.9 corresponds to a roughly 235-fold increase. Data are stated as ‘fold increase’ in the text for ease of reading, which we have clarified in the text and figure legends (e.g. lines 154 and 176).

      • The manuscript describes the increase of FOXG1 after BMP4-induced cell cycle exit as compared to non-BMP4 treated cells (p.8 first paragraph), but I am wondering if this expression is rather compared to dox negative and not vs BMP4 negative treatment.

      Data are presented relative to the non-BMP treated (EGF/FGF-2) control throughout the manuscript for consistency. This is to enable changes in expression between -Dox and +Dox to be visualised throughout the quiescence-exit time course relative to the initial starting population in EGF/FGF-2 growth media, prior to BMP treatment.

      • In Fig2G it is interesting that FoxO6 is upregulated in BMP4 treated cells throughout the experiment, with highest values at day 10 post treatment. At the same time, non-BMP4 treated cells keep decreasing their FoxO6 levels dramatically, but there is no mention of or reference to this effect.

      In Figure 2G, all cells have been treated with BMP4, prior to return to growth media (EGF/FGF) with or without Dox. It is true that in the +Dox condition with FOXG1 induction, FoxO6 levels continue to increase up to Day 10, perhaps reflective of the expansion of a highly proliferative radial glia-like population.

      • Fig2 would benefit from a western blot like Fig1D where FoxG1 and FoxO6-HA protein levels are also shown in dox-treated cells, comparing BMP4-treated vs non-treated.

      Due to the lack of specific FoxO6 antibodies and the absence of a FoxO6-HA tag in this cell line, it is not possible to perform protein analysis of FoxO6 levels in this figure as for Figure 1D.

      • The colonies in Fig3E should be quantified, as their ability to form neurospheres seems somewhat compromised upon FoxO6 KO. Fig3B and 3F could perhaps be consolidated into one panel in the interest of space and presentation.

      Good suggestion. We have now consolidated Fig 3B and 3F into one panel (now Figure 3F) as suggested by the reviewer. We performed additional replicates for Figure 3E to quantify the colony formation efficiency. This showed a small but non-significant decrease in colony forming ability in the KO cells (Figure 3E). Importantly, the FoxO6 null cells do form colonies, and our results show that FoxO6 is not essential for proliferation or colony formation of NSCs in EGF/FGF-2; this therefore does not account for the complete loss in colony formation we see in the FoxO6 KO cells upon FOXG1 induction.

      • Fig4A shows vs "parental" non-BMP on the y axis, but wouldn't this show fold change of dox+ parental vs parental? The authors should clarify this.

      All samples in Figure 4A are compared to parental cells in EGF/FGF-2, i.e. non-BMP treated, as the calibrator sample where log2(fold change) equals zero. We chose to set a single calibrator sample for all data (parental and FoxO6 KO cells included) to allow us to compare changes in FOXG1 transgene across the entire experiment.

      • Perhaps the authors can add a non-BMP4 treated count of % FOXG1 positive cells to Fig4C for reference.

      As shown in Figure 4A, both parental and FoxO6 KO cells show similar, i.e. negligible, FOXG1 transgene expression without Dox, compared to the parental non-BMP4 treated control, therefore negligible FOXG1-V5 positive cells are seen by ICC. We have edited Figure 4A to include a non-BMP treated and BMP-treated control to show the negligible FOXG1-V5 expression by qPCR as controls.

      • The sentence mentioning Fig5D for the first time (p.10 third paragraph) needs rephrasing for clarity and should also call out Fig5C for the mCherry expression live cell imaging data where appropriate. Fig5D does not appear to be live imaging as implied by the text. If vacuole formation is observed already as early as 10-11h after Dox induction, then it should be shown somewhere in Fig5. Vacuole formation is shown with a higher magnification image inset only in the 22h timepoint image. I think Fig5E should be more substantiated with some sort of quantification, e.g. % of vacuoles positive for EEA1 and/or LAMP1.

      We apologise for this. The first reference to Figure 5D on line 234 should refer to Figure 5C; this has now been corrected in the text. Vacuoles are visible in Figure 5C panel 10 h 30 min; however, to make this clearer we have also supplied an accompanying movie of the live imaging (Movie 1). The imaging in Fig 5E has not been quantified, as this imaging was performed with the purpose of confirming that the vacuole structures seen are not simply enlarged lysosomes, given their similarity in appearance to those published elsewhere (Ramosaj et al, 2021; Leeman et al, 2018). Instead, we have provided Western blotting data in Figure S5E to support the conclusion that there is no clear increase in EEA1 or LAMP1 (early endosomal or lysosomal) expression upon FoxO6-HA induction.

      - Could the authors comment on the lack of proliferative advantage of the FoxO6 overexpression? FigS3 shows EdU staining, but there is no proliferation assay in either Fig5 or S3. What would be the effect of FoxO6 overexpression on BMP4-mediated quiescence with or without FoxG1 over-expression?

      Induction of FoxO6-HA overexpression does not provide a proliferative advantage to the cells. Looking at individual cells, those with high FoxO6-HA levels seem to associate with EdU negativity. In Figure S3 we provide a quantitative EdU incorporation assay as a proliferation assay (quantification of the number of cells cycling, and therefore incorporating EdU, within a 24h pulse period). Quantification of the EdU staining in Figure S3G is provided in Figure S3H. We have now clarified this in the text on page 11, lines 263-4.

      Unfortunately, due to transgene overexpression using the PiggyBac transposon method, it is not feasible to overexpress FoxO6 and FOXG1 in the same cell line, as re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels and make results difficult to interpret. Given the association of vacuolated cells with EdU negativity, we predict that FoxO6 overexpression would not give an advantage for quiescence exit. Indeed, BMP-treated cells with FoxO6 overexpression show a decrease in EdU positivity, as shown in Figure S3H. As discussed in the text, we hypothesise that cells with FoxO6 overexpression are in a stalled state, potentially due to signalling hyperactivation. While this may not be physiological, it gives us clues as to the function and downstream targets of FoxO6, which remain uncharacterised.

      - Can the authors clarify if there is a proliferation change in F6 cells in Fig6F as in Fig2F? Fig6F shows Pak1 is already upregulated in quiescent NSCs; what are the expression levels of Pak1 in FoxO6 -/- ANS4 cells upon FoxG1-mediated quiescence exit as shown in Fig4? Is there a particular reason why the F6 cell line data is shown only up to day2 post Dox-induction rather than d4 or d10? For consistency with the rest of similar experimental data this timeline should be extended. Does Pak1 remain elevated, plateau or keep reducing further post day2?

      The data in (previous) Figure 6F come from the same assay and cell line as presented in Figure 2, but at an early timepoint (Day 2) during the quiescence exit assay. We have provided in the panel qRT-PCR analysis of Ki67 to show that cells begin to show increased proliferation at this timepoint. Because we hypothesise that Pak1 is required at an early transition point, we decided to analyse its expression at an earlier timepoint than in Figure 2. We have also repeated this at D10 (data below), showing Pak1 levels continue to increase with time, along with FoxO6 and the proliferative marker Ki67. Due to technical issues with variable FOXG1 transgene levels, we were unable to analyse Pak1 expression levels in FoxO6+/- ANS4 cells upon FOXG1-mediated quiescence exit.

      Reviewer #1 (Significance (Required)):

      The study provides a conceptual advance for exit from stem cell quiescence. There is strong evidence provided for murine neural stem cells, but the link to GBM cancer stem cells is less developed (but perhaps this is the subject of a separate manuscript).

      While FoxG1 is a known regulator of neurodevelopment and glioblastoma, the functions of FoxO6 have not been studied in the context of neural stem cells. In my view, this study should be of high interest to audiences in both neurodevelopment and cancer research.

      Expertise: glioblastoma, cancer stem cells, neurodevelopment

      We have edited the text and title to clarify that neural stem cells are used here as a model for GSCs with high levels of FOXG1 (e.g. lines 36 and 69).


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major comments:

      - The choice of NSCs as a main experimental model to understand the effects of FoxG1 and FoxO6 is not fully justified. The authors had previously shown that FoxG1 is expressed at very low levels in NSCs (Fig. 1A in Bulstrode et al. 2017). FoxO6 also seems to be barely expressed in NSCs (Fig. 1 of the current manuscript) and, in addition, its levels seem to go further down as cells exit quiescence (-Dox line in Fig. 2H). Therefore, these two genes do not seem to play an important role in the normal exit from quiescence of NSCs, with FoxO6 only affecting FoxG1 overexpression-induced exit from quiescence. If the aim is to mimic a GBM-like state by FoxG1 overexpression, this should be made much clearer in the text, including title and abstract. In that case, the authors should also show a direct comparison of the levels of FoxG1 in GBM and upon Dox-induced overexpression in NSCs.

      We agree with this criticism and with the suggested fix. It is indeed our aim to mimic a GBM-like state by inducing FOXG1 overexpression, and we should have made that more explicit. All experiments are performed in the context of high FOXG1 levels. Like Foxg1, FoxO6’s homeostatic roles may be subtle in adulthood, and mostly involved in neural plasticity (Yu et al, 2019). This is in keeping with our finding that basal FoxO6 levels are low in adult NSCs and not required for sustained proliferation but are important for cell state transitions. If the FoxO6 levels activated by elevated FOXG1 represent an acquired dependency of GBM, there may be a therapeutic window to target this pathway. However, given the poorly understood roles of FoxO6, further work is needed to determine its specific value as a therapeutic target. We have modified the title and the text to make this clearer. This is also stated in the first paragraph of the results section on page 7 (line 148).

      We have provided below a Western blot (Bulstrode, 2016) comparing FOXG1 levels in F6 cells induced with Dox (1000 ng/ml, the dosage used) with those in the GBM cell lines G7 and G144 and the normal NS cell line U5. This shows that the induced FOXG1 levels are significantly higher than those found in normal neural stem cells (mouse or human). This model has been used previously and was published in Bulstrode et al, 2017, upon which this manuscript expands.

      -While the authors state that they aim to study NSC quiescence, they use a protocol that is closer to modelling astrocytic differentiation. In fact, in their previous work, they use this very same protocol (removal of growth factors and addition of BMP) to study the role of FoxG1 and Sox2 on astrocyte de-differentiation (Bulstrode et al. 2017). While there is arguably no perfect in vitro model of NSC quiescence, the current standard in the field is treatment with both BMP and FGF for 48 to 72 hours (e.g.: Mira et al., 2010, Martynoga et al., 2013, Knobloch et al., 2017, Leeman et al., 2020). BMP alone is regarded as a pro-astrocytic differentiation cue, and 24 hours might not be enough for NSCs to fully commit to either differentiation or quiescence. Therefore, either the claims in the paper are changed to match the astrocytic differentiation model, or a standard quiescence protocol should be used throughout to confirm the findings also apply to the exit from quiescence of NSCs.

      We agree with the reviewer that there is indeed no perfect in vitro model of NSC quiescence and thank the reviewer for this useful discussion. Coincident with this project, this was an active area of research from our laboratory as explored by Marques-Torrejon et al, 2021 (Nature Comms). After 24 h BMP4 treatment, we found that adult mouse NS cells: exit cell cycle, are growth factor unresponsive, obtain an astrocytic morphology, upregulate astrocytic markers such as Gfap and Aqp4, and downregulate radial glia/NS cell markers such as Nestin and Olig2 (Figure 3).

      We therefore initially viewed them as terminally differentiated. However, the exact state of these cells is difficult to define due to the lack of definitive markers and transcriptional differences that can distinguish terminally differentiated GFAP-expressing astrocytes from quiescent type B SVZ NS cells (which also express GFAP) (Bulstrode et al, 2017; Doetsch et al, 1999; Codega et al, 2014). Findings from our laboratory later suggested some NS cell markers are maintained following BMP4 treatment and these cells can be forced back into cycle with combined Wnt/EGF signalling, or FGF/BMP signalling (Marques-Torrejon et al 2021). This suggests in vitro NS cells may lie along a continuous spectrum of states from dormant quiescent, activated quiescent (primed for cell cycle re-entry) to actively proliferating, similar to that observed in vivo in the mouse SVZ (Dulken et al, 2017). Indeed, after 24 h BMP4 treatment, we observe a minimal level of colony formation in no Dox controls following 10 days of exposure to the growth factors EGF/FGF-2 (Figure 2D-F).

      These non-cycling BMP4-induced astrocytic cells might therefore be better viewed as dormant quiescent NSCs, hence our reference to them as quiescent NSCs. The assay conditions used in this manuscript differ from those of Marques-Torrejon et al in terms of cell density and length of BMP4 treatment; it is therefore likely that our BMP-treated cells are at different stages along the continuum between dormancy and primed quiescent states. Importantly, regardless of the exact cell type induced by 24 h BMP4 treatment, we have considered the changes induced by FOXG1 overexpression in comparison to the effect of NS cell media alone.

      -The FoxO6-induced vacuole formation in NSCs is a very interesting finding. However, so far it was only observed upon FoxO6 overexpression. To claim vacuolization is required for quiescence exit, the authors should show whether this phenomenon is also observed upon normal exit from quiescence and FoxG1-induced reactivation of NSCs. From the authors' own data, Pak1 (which induces vacuolization) is unlikely to reactivate NSCs, as its expression is highest in BMP-treated cells (Figure 6F). The authors should show whether some vacuolization is present at this stage in NSCs and, if not, discuss the possible interplay between Pak1 and FoxO6 in vacuole formation and quiescence exit.

      As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, in which other pathways downstream of FOXG1 must be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Indeed, active Pak1 has been found to modulate pinocytic cycling, enhancing both FITC-dextran uptake and efflux (Dharmawardhane et al, 2000). Alternatively, the macropinocytosis observed may be a metabolic stress response resulting from hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017).

      We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs; this supports the idea that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation. While this may not be physiological, it gives us clues as to the function and downstream targets of FoxO6, which remain uncharacterised (such as a link of FoxO6 and FOXG1 with Pak1-related pathways). Demonstrating a requirement for vacuolisation in quiescence exit is outside the scope of this manuscript, and we are therefore careful not to claim this. We have modified the text to clarify this.

      As the reviewer noted, it is interesting that Pak1 is highest in BMP-treated cells; it seems that BMP signalling itself is triggering elevated Pak1 levels, likely as cells undergo extensive cell shape changes during the transition from proliferation to quiescence. However, in EGF/FGF-2, Pak1 levels decrease, and our data suggest that FOXG1/FoxO6 are required to increase or maintain Pak1, potentially to again enable the cell shape/metabolic changes required on quiescence exit. We have added to the text to expand upon this observation on page 14 (lines 330-333).

      -Finally, the data on the regulation of Pak1 expression by FoxO6 is insufficient to draw any strong conclusions. Downregulation of Pak1 in FoxO6 cells is not enough evidence to claim a direct regulation. The authors should show whether Pak1 levels are increased after FoxO6 overexpression and whether FoxG1 is downregulated in FoxO6 KO NSCs (indirectly affecting Pak1 expression).

      We have performed qRT-PCR analysis of Foxg1 expression in FoxO6 KO NSCs and see no consistent difference in expression, indicating that loss of FoxO6 is not indirectly affecting Pak1 expression via Foxg1 (see below, 1). We have also investigated Pak1 levels upon FoxO6 overexpression over a time course following Dox addition (see below, 2). Interestingly, when FoxO6 is overexpressed, Pak1 is not clearly upregulated at any time point. Because Pak1 is already expressed in the -Dox controls, owing to its roles in a variety of cellular functions, its levels may already be saturated. It is clear that Pak1 expression decreases upon FoxO6 loss in EGF/FGF (without coincident Foxg1 downregulation) and that, in F6 cells, higher FOXG1 correlates with higher Pak1 in EGF/FGF. Together with the induction of macropinocytosis upon FoxO6 overexpression, these data provide interesting insights into the potential pathways downstream of FoxO6 in controlling quiescence exit, directly or indirectly related to Pak1 signalling. We have modified the text to reflect this on page 14 (lines 330-333).

      Minor comments: Please state in the main text that NSCs are derived from the SVZ.

      This has been added to the text on page 7 (line 149) and is in the methods ‘Cell Culture’ section.

      Reviewer #2 (Significance (Required)):

      As I said before, I find this work tackles a very important question: how is the exit from quiescence controlled in NSCs? This manuscript will be of interest to researchers in the fields of adult stem cell biology and adult neurogenesis. While my expertise lies mostly in NSC biology, this work is of potentially great interest for the cancer field, particularly for brain cancer research. Elucidating the mechanisms GBM cells use to exit quiescence is crucial in order to avoid the relapse of this aggressive form of brain cancer. To increase the relevance of the work to the cancer community, some of the key findings should be reproduced with GBM cells. It would be particularly important to show whether Pak1-induced vacuolization and macropinocytosis can be observed in GBM cells.

      As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, in which other pathways downstream of FOXG1 must be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Alternatively, the macropinocytosis observed may be a metabolic stress response resulting from hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017). We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs; this supports the idea that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation. We therefore do not think macropinocytosis per se would be observed in quiescence exit of GBM cells; indeed, a form of macropinocytosis-induced cell death called methuosis has been observed in GBM cells with hyperactivated Ras signalling (Overmeyer et al, 2008). However, this phenotype still gives us clues as to the function of FoxO6 in quiescence exit in GSCs and the downstream signalling pathways it may regulate, such as Pak1-related signalling (discussed on lines 330-3 and 366-9).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The overall objective of the paper is to investigate the mechanisms by which co-option of the activity of developmental master lineage regulators by cancer cells allows them to gain fitness. To answer this question, they focus on FOXG1. This TF acts during the specification of the telencephalon. Its expression can be increased in Glioblastoma (GBM) and, more importantly for the paper, FOXG1 has previously been shown to promote exit from quiescence of glioblastoma stem cells (GSCs) and non-transformed neural stem cells (NSCs). In a previous screen, the authors identified FoxO6 as a potential direct target gene of FOXG1. In this paper, they showed that with the gain of expression of FOXG1 in NSCs and loss of FOXG1 in GSCs, FoxO6 is increased or decreased, respectively. Loss of FoxO6 in NSCs does not alter their cell cycle or cell shape and specification. Yet, loss of FoxO6 in NSCs blocks FOXG1-mediated exit from quiescence. To understand the mechanisms, they decided to overexpress FoxO6 in NSCs and demonstrated that the cells undergo macropinocytosis, a process by which cells can engulf large amounts of nutrients from the external medium. It remains to be determined whether this macropinocytosis occurs in cells overexpressing FOXG1 and in GSCs. The authors provide a first answer by showing that overexpression of FOXG1 induces not only FoxO6 but also the expression of PAK1, one of the key kinases that regulates the membrane engulfment of macropinocytosis in NSCs. In GSC lines, the decrease of FOXO6 decreases PAK1 levels.

      Major comments: The paper describes interesting and convincing results (the number of cell lines and repeated experiments seem sufficient), but it is difficult to reconcile them all in a single model, and this diminishes the impact of the study. Epistatic interactions between FoxG1, FoxO6, PAK1 and macropinocytosis are not always studied in the same cell models. Whether FOXG1-induced exit from quiescence of NSCs is dependent on a FOXG1-->FOXO6-->PAK1-->Macropinocytosis axis remains to be demonstrated. Also, whether such an axis operates in tumor cells remains to be fully assessed. In particular, if FoxO6 overexpression in NSCs can induce macropinocytosis, is this cellular process induced by FoxO6 downstream of FOXG1 activity during NSC quiescence exit? Is PAK1 a relay of FoxO6? Experiments looking at macropinocytosis and the involvement of PAK1 in the cell models of Figure 4 will definitely help to bridge the different results together.

      We thank the reviewer for this useful insight and discussion for future work.

      To directly investigate the effects of Pak1 ablation, and therefore more directly the link between FOXG1 and FoxO6 and macropinocytosis, we tested the published Pak1 inhibitor IPA-3. Unfortunately, to distinguish the role of Pak1 in quiescence exit and macropinocytosis, we would need a dosage of IPA-3 that is efficacious but does not affect cell proliferation. It was not possible to optimise such a dosage: 10 µM has been shown to be efficacious at inhibiting Pak1 (Verma et al, 2020; Wong et al, 2013), but even at 2.5 µM we see significant cell death in our cells. This is potentially due to the variety of cellular functions Pak1 is involved in. Conversely, it is not feasible to overexpress Pak1 in the FoxO6 KO cells with inducible FOXG1. To ensure we are investigating quiescence exit, this would need to be done in an inducible manner; however, re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels (through excision of the existing transgene) and therefore make results difficult to interpret.

      We hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, in which other pathways downstream of FOXG1 must be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes (as detailed in the discussion). Alternatively, the macropinocytosis observed may be a metabolic stress response resulting from hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017). We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs; this supports the idea that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation, and therefore not a physiological process in quiescence exit. We therefore do not think macropinocytosis per se would be observed in quiescence exit of GBM cells; indeed, a form of macropinocytosis-induced cell death called methuosis has been observed in GBM cells with hyperactivated Ras signalling (Overmeyer et al, 2008).

      However, we believe the observed macropinocytosis phenotype upon FoxO6 overexpression, and the changes in Pak1 expression upon FoxO6 loss or FOXG1 induction, provide interesting insights into the function of this underexplored FoxO family member in GSCs, and into the downstream signalling pathways it may control, such as Pak1-related signalling. We have modified the text to reflect and discuss the limitations of our current data (lines 330-3 and 366-9).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      Summary:

      The authors demonstrate that the immunosuppressive environment in pancreatic ductal adenocarcinoma (PDAC) can be mitigated by a combination of ionizing radiation (IR), CCR5 inhibition, and PD1 blockade. This combination therapy increases tissue-resident natural killer (trNK) cells that facilitate CD8 T cell activity, resulting in a reduction of E-cadherin positive tumor cells. They identify a specific "hypofunctional" NK cell population in both mouse and human PDAC that supports CD8 T cell involvement. A trNK signature is found to be associated with better survival outcomes in PDAC and other solid tumors.   

      Strengths: 

      Overall, I think this is an interesting study that combines testing of therapeutic concepts in mice with bioinformatics analysis of single-cell transcriptome data in primary tumors and exploration of clinical outcomes using signature genes in TCGA data. The key finding is that immunoregulatory properties of tumor-infiltrating/resident CD56-bright NK cells (assumed to be non-cytotoxic) are beneficial for outcome through cross-talk with DC and recruitment of CD8 T cells. The latter is specifically induced by irradiation combined with CCR5i and PD1 blockade. 

      "These results collectively support the notion that IR/CCR5i/αPD1 combination treatment alters immune infiltration by reducing Tregs and increasing NK and CD8 T cells, thereby resulting in greater local tumor control." I agree with this conclusion.  

      Weaknesses:  

      There are a few points to discuss and that the authors may want to address. 

      (1)   "Notably, CCR5i significantly reduced Treg infiltration but had no effect on the infiltration of other immune cells, indicating the active recruitment of CCR5+ Tregs in PDAC (Figure 2B)." 

      CCR5i treatment seems to inhibit infiltration of CD8 T cells and NK cells to a greater extent, in relative terms, compared to Treg, albeit it is not statistically significant. If this visual inspection of the graph does not reflect reality, additional experiments may be needed to verify the selective targeting of Tregs or confirm the fact that also CD8 T cells and NK cells are affected by single agent CCR5i. The reduced recruitment of Treg, NK cells, and CD8T cells was completely reversed when combined with irradiation. In the data shown in Figure 3E it seems as if CCR5i induced infiltration of Tregs along with other immune cells. However, this said, I agree with the conclusion of the authors that this combined treatment leads to an altered immune composition and ratio between Tregs and effector cells (CD8T cells and NK cells). Could this altered composition be displayed more clearly? 

      We would like to thank the reviewer for their comments and agree that there is a trend for reduced NK and T-cell infiltration during CCR5i standalone treatment (as seen in Figure 2B), although it does not reach significance. To reflect this more clearly, we have added n.s (non-significant) for the NK cells and CD8+ T-cells and adjusted the text to reflect a trend for decreased NK and CD8+ T-cell infiltration (See Lines 162-165). Moreover, to reflect the data accurately, we have taken the Treg data out of the original Figure 2B and present it separately as a percentage of CD45+CD3+ T-cells.

      (2) The definition of active and hypofunctional NK cells based on solely NKG2D expression alone seems like an oversimplification. I realize it is not trivial to test tumor-infiltrating NK cells from these tumors functionally but perhaps scRNAseq of the tumors would allow for characterization of cytotoxicity scores using KEGG or GO analysis or reversed gene set enrichment in responders/non-responders.  

      We agree that scRNA-seq of tumors would add to the overall characterization of the tumor-infiltrating NK cells and their characterization, however we are currently unfortunately not in the position to carry out this experiment. We did however immunophenotype the tumor infiltrating NK cell population in more depth by also looking at NKp46 and NKG2D surface expression. This newly added data demonstrates not only increased infiltration of “bona-fide” trNK cells (based on surface expression of CD103+CD49a+) under the triple treatment combination, but more importantly these trNK have reduced levels of CD69, NKp46, NKG2D and increased TIM-3 surface expression compared to conventional NK cells – suggesting that these trNKs could be more hypoactive compared to the conventional NK cells. These data have been added to the manuscript as Figure 4E, F; Figure supplement 4E-G and Lines 244-260 in the revised manuscript. To clarify this difference, we have replaced the word “hypofunctional” with “hypoactive” throughout the manuscript.

      (3) It seems as if the abstract refers to this phenotype incorrectly since the "hyporesponsive" subset is described as NKG2C-negative. 

      We apologize for the typographic confusion and have corrected our abstract and changed the subset to NKG2D-negative (as was intended).

      (4) "The NK_C1 cluster correlates best with the hypofunction NK phenotype observed in mice as similarly displayed reduced activation (reduced NKG7, NKp80, GZMA, and PRF1) with additional expression of tissue residency markers CD103, CD49a and, surprisingly, the adaptive activating receptor NKG2C (KLRC2) (Figure 5B, C)." 

      There is no doubt that NK_C1 represents tumor-infiltrating NK cells with a CD56bright gene signature with a strong tissue resident score. However, the transcriptional expression of KLRC2 on these is not surprising! It is well established that KLRC2 transcripts (but not protein) are highly expressed on conventional CD56bright NK cells. There are several published sources where the authors can find such data for confirmation. Thus, this is not to be confused with adaptive NK cells having an entirely different transcriptional signature and expressing high levels of NKG2C at the cell surface. I strongly recommend reinterpreting the results based on the fact that KLRC2 is expressed at high levels in conventional CD56bright NK cells. If not, it would be important to verify that these tissue-resident NK cells express NKG2C and not NKG2A at the cell surface. 

      We agree with the reviewer and have modified the text accordingly in the revised manuscript (Lines 279-283), including references to tissue-resident adaptive-like cells as described previously in literature. 

      (5) NCAM1 transcript alone is not sufficient to deconvolute CD56bright NK cells in TCGA data (Figure 7A). As a single marker, it likely reflects NK cell infiltration without providing further evidence on the contribution of the bright/dim components. Therefore, the use of the bright Tr NK signature described in Table 1 is very important (Figure 7B). Table 1 is not provided. Nor Supplementary Table 1. There is only one supplementary figure in the ppt attached.

      We agree that a high NCAM1/CD56 single gene signature could also represent NK cell infiltration. We have rephrased this in the text accordingly (Lines 354-357). We apologize for the missing tables and Supplementary figures. We have added these now to the manuscript as Supplementary table 1.

      Reviewer #2 (Public Review)  

      Summary: 

      This work elaborates on a combined therapeutic approach comprising ionizing radiation and CCR5i/αPD1 immunotherapy as a promising strategy in pancreatic cancer. Previous research has established that NK cell-derived CCL5 and XCL1 play a crucial role in recruiting cDC1 cells to the tumor microenvironment, contributing to tumor control. In this study, by using a murine pancreatic cancer model, the authors propose that the addition of radiation therapy to CCR5i and αPD1 immunotherapy could upregulate CD8+ T cells and a subgroup of NK cells within the tumor and result in better tumor control. They further analyzed human single-cell sequencing data from pancreatic cancer patients and identified one subgroup of NK cells (NK C1) with tissue-resident features. Subsequent cell-cell contact analysis reveals the NK-cDC1-CD8 cell axis in pancreatic cancer. By analyzing TCGA data, they found that high NK C1 signature levels were associated with better survival in pancreatic cancer patients. Thus, radiotherapy could benefit the outcome of patients bearing low NK C1 signatures. Importantly, the positive correlation between NK C1 score with survival extends beyond pancreatic cancer, showing potential applicability across various solid cancers.  

      Strengths: 

      This study could add new insight into the clinical practice by introducing such novel combined therapy and shed light on the underlying immune cell dynamics. These findings hold potential for more effective and targeted treatment in the future. Mouse experiments nicely confirmed that such combined therapy could significantly reduce tumor volume. The elegant use of single-cell sequencing analysis and human database examination enriches the narrative and strengthens the study's foundation. Additionally, the notion that NK C1 signature correlates with patient survival in various solid cancers is of high interest and relevance.  

      Weaknesses: 

      The role of CCR5i requires further clarification. While the authors demonstrated its capacity to reduce Treg in murine tumors, its impact on other cell populations, including NK cells and CD8+ T cells, was not observed. Nevertheless, the effect of CCR5i on tumor growth in Figure 2B should be shown. If the combination of radiotherapy and αPD1 already can achieve good outcomes as shown in Figure 3A, the necessity to include CCR5i is questioned. Overall, a more comprehensive elucidation of the roles of CCL5 and CCR5i in this context would be good.  

      We would like to thank the reviewer for their comments and agree that standalone CCR5i also shows a trend of reduced infiltrating NK cells and CD8+ T-cells, although this does not reach significance. We have mentioned this trend in the manuscript (see Lines 162-165) and added n.s to Figure 2B as well. Regarding the addition of CCR5i: although we observe volumetric control by radiotherapy and anti-PD1, we observe an increase in necrosis induction only in the triple combination compared to radiotherapy combined with anti-PD1, suggesting that CCR5i has an additive effect in our model only as part of a combination. We therefore believe that addition of CCR5i to radiotherapy and anti-PD1 has a beneficial effect. The growth curves for CCR5i alone were already presented in Figure 3A, and we have modified our manuscript to refer to this (see Lines 165-167).

      (1) In line with this, spatial plots in Figure 4 did not include the group with only radiotherapy and αPD1. This inclusion would facilitate a clearer comparison and better highlight the essential role of CCR5i. 

      We agree with the reviewer that inclusion of radiotherapy and αPD1 would facilitate a clear comparison of our data and our experiments did include single controls for radiotherapy and αPD1; however, unfortunately, the tissue slides were of bad quality and therefore not suitable for quantification. In line with this, we have added references to other studies that investigated the effect of immune checkpoint inhibitors in combination with radiotherapy (see Lines 169-172).

      (2) NK C1 cells should also be analyzed in the mouse model. The authors suggest that NKG2D-negative NK cells could be this cell population. Staining of inhibitory markers should be considered, for example, TIGIT and TIM3 as presented in Figure 5B. 

      As per the reviewer's suggestion, we have now included additional data on the surface expression of inhibitory markers and activating receptors on tumor-infiltrating NK cells in our model under the triple combination. These additional data demonstrate increased infiltration of trNK cells under the triple combination that seem to be more 'hypoactive' than conventional NK cells. These data have been added as Figure 4E in the revised manuscript.

      (3) While the cell-cell contact analysis generated from single-cell sequencing data is insightful, extending this analysis to the mouse model under therapy would be highly informative. NK and CD8 cells in the tumor increased upon the combined therapy. However, cDC1 was not characterized. Analysis regarding cDC1 would provide more information on the NK/cDC1/CD8 axis. 

      We agree that looking into cDC1 would be highly interesting in our treatment model, and its characterization is currently under investigation. The importance of the interaction between cDC1 and NK cells has been described before by various groups, and we have provided additional references for this in our manuscript (see Lines 449-455).

      (4) Human database analysis showed a positive correlation between NK C1 score and CCL5 in pancreatic cancer. Furthermore, radiotherapy could benefit the outcome of patients bearing low NK C1 scores. It would be interesting to test if radiotherapy could also benefit patients with low CCL5 levels in this cohort. 

      We would like to thank the reviewer for their suggestion and please see the figure below for the comparison. Patients with CCL5high are enriched for NK_C1 (Figure 7D) and CCL5high patients with NK_C1high have significantly increased overall and disease-free survival compared to NK_C1low (Figure 7E); where those with NK_C1low significantly benefit from radiotherapy (Figure 7B). Accordingly, patients with CCL5high have significantly decreased overall survival compared to CCL5low patients, again confirming CCL5 as a prognostic marker (Figure 1A, Figure R1). When we look at CCL5low patients however, there is no additional significant benefit for radiotherapy (see insert below) in the CCL5low group (not significant; only significant p-values are shown). These data collectively support the strong correlation between CCL5 levels and NK_C1 enrichment, and imply that radiotherapy alone is insufficient to drive NK_C1 cells in the absence of high CCL5 gradients to improve overall survival. However, given the increased overall survival of CCL5low compared to CCL5high it is likely that other factors are at play. Future studies will be required to further elucidate the role of CCL5 gradients on NK_C1 cells and the beneficial effect of radiotherapy.

      Author response image 1.

      Overall survival of CCL5high versus CCL5low patients stratified into groups with and without radiotherapy using TCGA-PAAD. Log-rank p-value indicates the significance level across all groups while individual significant comparisons are shown as indicated.

      Reviewer #3 (Public Review):

      Summary

      In the submitted manuscript by Go et al, the authors evaluated the tumor microenvironment in pancreatic ductal adenocarcinoma (PDAC) and made a number of interesting observations, including the following: 1) CCL5 expression within the tumor microenvironment negatively correlated with clinical outcomes in human patients with PDAC; 2) there were both positive and negative correlations between CCL5 expression and the expression of specific genes (e.g. those encoding CD56 and CD16, respectively) included among gene signature lists for Treg, MDSC, TAM, and NK cells; 3) CCR5 inhibition with the inhibitor, maraviroc, reduced Treg infiltration but not that of other immune cell types in an orthotopic murine model of PDAC; 4) CCR5 inhibition augmented anti-PD1 immunotherapy when combined with ionizing radiation (IR) therapy in the murine model; 5) the above therapy resulted in increased infiltration of CD8+ cytotoxic T cells as well as of a subset of NKG2D-negative, tissue-residency (tr) marker expressing NK cells (deemed Cluster 1 NK in their data sets) that inversely correlated with the number of E-cadherin+ cells (i.e. tumor cells) and showed predicted interactions with cDC1 dendritic cells (including XCL1/XCL2 expressed by the NK and XCR1 expressed by the cDC1); 6) the authors identified a number of putative signals stemming from the trNK (e.g. IL-16, TNFSF14, FASLG, CSF, MIF) as well as incoming from cDC1s to NK (e.g. BAG6-NKp30); 7) these trNK cells positively correlated with good outcomes and with CD8+ T cell infiltrations in human PDAC as well as in many other solid tumor types; and 8) importantly, the benefit of IR therapy was specific to the subset of PDAC patients (represented in the TCGA dataset) that were predicted to have low amounts of trNK cells. The authors used murine experimental models, multiplexed imaging analyses, and a number of publicly available sequencing data sets from human tumor samples to perform their investigations. 
Based on their findings, the authors proposed that combining IR with CCR5 inhibition and anti-PD1 immunotherapy is a promising strategy to treat solid cancers.  

      Strengths

      Overall, the collective analyses and conclusions appear to be novel and could be of high and rapid impact on the field, particularly in terms of directing clinical trials to incorporate IR with CCR5 inhibition and immunotherapy. The manuscript is well written; the figures are for the most part clear; and the Discussion is very thoughtful.   

      Weaknesses

      There were a number of minor typographical errors, missing references, or minor issues with the figures. In general, while many of the observations provided strong suggestive evidence of relationships, phenotypes, and functions, the authors often used language to indicate that such things were confirmed, validated, or proven. In fact, there was a paucity of such functional/confirmatory experiments. This does not necessarily detract from the overall significance, excitement for, and potential impact of the study; but the language could likely be adjusted to be more in keeping with the true nature of the findings. The main title and running title are a bit different; consider making them more similar.

      We apologize for the typographical errors, missing references and issues with the figures. We have revised our manuscript, with a major focus on adjusting our language to more carefully reflect our data, and hope to have addressed all the concerns of the reviewer. The slight discrepancy between the main title and the running title is intended to convey the contents of this manuscript comprehensively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Please make sure all files are made available. Also please check available datasets describing KLRC2 transcripts in CD56brights. This is not to be confused with an adaptive-like signature. 

      We have added the missing table to the supplementary figures and revised the manuscript text with regard to the KLRC2 transcript in our NK_C1 cluster and its implications for an adaptive-like signature in the context of tissue residency (see Lines 279-283; 465-474).

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments as mentioned in the 'weakness' section could help to further strengthen this study. Besides these points, I would recommend the following: 

      (1) The description in the figure should be more precise and clear. Especially in Figure 3A, it seems the addition of IR into CCR5i or CCR5i/aPD1 leads to a bigger tumor volume.  

      We have adjusted the figure descriptions to describe the figures more clearly. We apologise for the confusion in Figure 3A; this was a figure legend error and has been rectified in the revised Figures (i.e. closed symbols represent +IR conditions).

      (2) The definition of Tregs in figures should be described, e.g. it is not specified which population is shown in Figure S2c.  

      We have added a definition of Tregs (i.e. Live/CD45+CD3+CD4+FOXP3+) in our revised manuscript (see Lines 162-165). To avoid confusion, we have removed the subsequent gating of CCR5 and PD-1 of Tregs in our revised Supplementary Figures.

      (3) Please add a bar in all histology figures, for example, Figure 2A, S2A, S3E. It seems in Figure S3D, E, the green group is missing.  

      We have added the scale bar to all the indicated figures. As the reviewer correctly points out, the green group (i.e. IR+CCR5i) is missing. We felt that the excessive growth seen with CCR5i alone might give a false impression of the extent of infiltration; we therefore did not include this group in the original analysis and do not have these data for the Figure.

      (4) Please check through the manuscript, there are some grammar mistakes.  

      We apologise for the grammar mistakes in our original manuscript and have carefully revised the current manuscript to correct them.

      (5) Figure S7B, the left cell lacks a name.  

      We have annotated the left cell accordingly in our revised supplementary figure.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Abbreviations (e.g. PDAC) should be spelled out the first time introduced in the manuscript.

      We have adjusted this in our revised manuscript.

      (2) Referring to the tissue-resident NK cells as "hypofunctional" may not be useful...they seem to be functional, just not in the conventional sense. The authors may want to consider another term, such as non-cytotoxic (given the low expression of cytolytic granules, etc) or immunoregulatory (as they actually refer to them on line 310).

      We agree with the reviewer and have revised the manuscript to refer to them as “immunoregulatory” or “hypoactive” when appropriate. The latter is supported by the additional experiments as shown in Figure 4E.

      (3) Barry et al 2018 Nat Med demonstrated that NK cells in melanoma could support cDC1s and promote positive clinical outcomes in the setting of immunotherapy. It would likely be beneficial to also cite this paper (e.g. on line 425). 

      Thank you for the suggestion, which is in line with our hypothesis of crosstalk between NK_C1 and cDC1. We examined FLT3L expression in our NK_C1 cluster and did not find any enrichment for the FLT3L transcript (see Figure 5E). Nevertheless, we have added the reference to the discussion of our manuscript to further support the importance of crosstalk between cDC1 and NK cells (see Lines 449-455).

      (4) Figure 2B: by eye, it looks like the difference between CD8+ T cells in the two conditions would be significantly different; is this not the case? Same thing for the NK cells...what are the p-values? 

      We have added n.s. to our revised Figure 2B. The p-values for CD8+ T-cells and NK cells were 0.14 and 0.19 (two-tailed Student's t-test), respectively.

      (5) The murine data strongly suggest that the combination therapy promotes trNK cell infiltration into the tumors, in turn resulting in cDC1-mediated CD8+ T cell infiltration and/or activation. It could be highly valuable/useful to functionally determine (e.g. by depleting NK cells in this model) if NK cells are required for the effects seen. 

      We agree that depletion of NK cells could further solidify the findings, and it is part of ongoing investigations for future projects. However, it would be imperative to first characterise these NK cells in more depth, as conventional global ablation of NK cells is expected to strongly affect immunosurveillance as well. This characterisation is part of current ongoing work.

      (6) Figure 7B: how were "high" and "low" defined (for the NK signature)?

      An enrichment score of the NK_C1 gene signature (see Table supplement 1) was first calculated per patient sample in the TCGA RNA-seq dataset using the Gene Set Variation Analysis (GSVA) method. A cut-off value was then determined using the maximally selected rank statistics method (maxstat R package) to divide patients into “high” and “low” groups. 
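      The cut-off step can be illustrated with a minimal Python sketch of a maximally selected log-rank statistic. It assumes that per-patient NK_C1 enrichment scores have already been computed (the GSVA step is not reproduced), uses the standardized log-rank statistic as the rank statistic, and requires at least 10% of patients in each group; the function names and toy data are hypothetical, and the sketch omits the multiplicity-adjusted p-value that the maxstat package reports.

```python
import math


def logrank_z(times, events, in_group):
    """Standardized log-rank statistic (O - E) / sqrt(V) for group 1 (in_group True)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0


def max_stat_cutpoint(scores, times, events, minprop=0.1):
    """Return (cutpoint, |z|) maximizing the standardized log-rank statistic.

    Patients with score > cutpoint form the "high" group; cutpoints leaving
    fewer than minprop of patients in either group are skipped.
    """
    n = len(scores)
    best = (None, 0.0)
    for c in sorted(set(scores)):
        high = [s > c for s in scores]
        n_high = sum(high)
        if n_high < minprop * n or n - n_high < minprop * n:
            continue
        z = abs(logrank_z(times, events, high))
        if best[0] is None or z > best[1]:
            best = (c, z)
    return best


# Toy data: patients with higher scores die earlier (all events observed).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
times = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
events = [1] * 10
print(max_stat_cutpoint(scores, times, events))
```

      With these toy data the selected cut-off falls at 0.5, the split that separates the short-survival and long-survival patients perfectly.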

      (7) Lines 164-165 of the Results: it would be good to include a reference supporting the statement.

      We have rephrased the manuscript and added corresponding references (see Lines 170-173 in the revised manuscript).

      (8) There are many conclusions and very speculative language based only on sequencing results, and these have not been validated (e.g. in the Discussion, lines 447-453). As another example, it was concluded that a decrease in NKG2D+ NK cells implied a reduction in overall NK cell cytolytic activity and that NKG2D- NK cells were hypofunctional and did not kill well. This was not tested. Generally, it would be useful for the authors to use language that conveys that the data are primarily suggestive (rather than "confirmatory", line 447) of relationships, phenotypes, and functions at this point. 

      We thank the reviewer for raising these concerns and have carefully adapted the manuscript text to describe the findings more cautiously.

      (9) On lines 246-247 the authors refer to cluster 3 NK cells, which express CD16, as "immature". The rationale for this designation is not provided, and most human NK cell development models hold that CD16+ NK cells represent the most mature subset(s). 

      We apologize for the typographical error; later in the manuscript we refer to the NK_C3 cluster as cytotoxic NK cells, and we have corrected this in our revised manuscript (see Lines 273-275).

      (10) On line 351, the authors reference supplemental Figure 7C...but I don't see this figure in the accompanying powerpoint file. 

      This should have been Supplementary Figure 7B, and we have corrected it in the revised manuscript (see Lines 374-377).

      (11) On line 417, the authors reference NKp40; this is likely a typographical error. 

      This has been corrected in the revised manuscript to NKp46 (see Lines 439-442).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1 (Public Review):

      He et al. investigate the requirement and function of Blimp1 (encoded by Prdm1) in murine NK cells and ILC1. Employing a conditional knockout mouse model (Prdm1flox x Ncr1cre), the authors describe impaired abundance and maturation of Prdm1-deficient NK cells and ILC1 in different tissues. Blimp1-deficient NK cells have reduced expression of cytotoxic molecules (Gzmb, Prf1) and, in some instances, Ifng production, and Prdm1flox x Ncr1cre mice show impaired tumor control in experimental metastasis models. Using single-cell RNA sequencing analysis, the authors propose that Prdm1 regulates JunB expression and NK cell maturation. Based on in silico analyses, the authors suggest manifold intercellular communication between NK/ILC1 and macrophages. Without following up on any of these potentially interesting suggestions, the authors conclude their study reiterating that Prdm1 regulates IFNg-production of tumor-infiltrating NK cells and ILC1. Many of the reported functions of Blimp1 in NK cells have previously been identified using a mixed-chimera strategy comparing Prdm1 WT and KO NK cells (Kallies et al., Blood 2011). Here, the authors expand on these findings using a conditional model to delete Prdm1 in NK/ILC1 and single-cell sequencing and provide a more refined analysis of the functions of Blimp1 in these cells. Cell-chat analysis suggests close interactions of Blimp-dependent NK/ILC1 subsets with hepatic macrophages, but these suggestions are not followed up by experiments. Potentially interesting differences in the macrophage compartment of Ncr1-Cre x Prdm1-fl/fl mice are suggested by the scRNA-Seq data but are not validated e.g. by FACS. The study falls short in providing new mechanistic insights. Nevertheless, it is an interesting confirmation of "old" suggestions in a more refined setting, and the provided single-cell mRNA-Seq data represents a potentially valuable resource for the community. 
There are some control analyses that are required to support the conclusions of the authors, and I have a few suggestions that would help to improve the manuscript.

      We sincerely appreciate your careful review and insightful feedback on our manuscript. We have carefully considered your comments and present the results of new experiments conducted in response to your suggestions. Please find the detailed responses below.

      Major comments

      Comment 1: The authors do not control for the potential effects of Cre expression. Expression of Cre from within the Ncr1 locus (using the mouse model established by Narni-Mancinelli et al.) has significant effects on NK cells and especially ILC1s (reducing their frequency and absolute numbers and altering their functionality). The authors should characterize the Ncr1cre mice used here (developed by Shanghai Model Organism Center) in this regard and should use proper controls (Ncr1Cre+ Prdm1wt/wt as control for Ncr1Cre+ Prdm1fl/fl, instead of WT littermates) for all of their key data, e.g. those depicted in Fig 1FG, 2ADFH, 7D, S2,3,4.

      Response 1: This is a very insightful question that has posed a challenge for many researchers, including us, engaged in conditional knockout studies. The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, this design also allows a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that the influence of Cre on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression of the loxP-targeted gene, Prdm1 in this study (Figure 1E), is more pronounced. Therefore, within the limits of current techniques and research methods, we believe that the data provided in this study support the role of Prdm1 in NK cells.

      Comment 2: Several of the phenotypic findings on NK cells have been described before by Kallies et al. in 2011 (Ref 29), although using a different genetic Prdm1-ablation model (Prdm1-GFP/GFP knockin/knockout model). This study reported impaired NK cell maturation, reduced Gzmb expression, impaired in vivo cytotoxicity against subcutaneous RMA-S cells, impaired in vitro proliferation, comparable in vitro killing, increase in BM NK cell numbers. The authors should discuss/mention this more prominently in their manuscript, and highlight where they confirm or refine these previous findings, and where they actually provide new information.

      Response 2: We appreciate your valuable suggestions. The article you referred to, published in Blood, is indeed an excellent work. While we had cited this article, our discussion regarding its specific content was limited. Based on your advice, we have made revisions and included the following content in our discussion section (page 24; line 489-493):

      “In a study involving systemic knockout combined with competitive transplantation, it was found that Prdm1 promotes NK cell maturation and the expression of Gzmb. On the contrary, the same study also found that NK cells with Prdm1 deficiency exhibit heightened proliferation, increased survival, enhanced migratory abilities towards tumors, and greater cytotoxicity against subcutaneously implanted RMA-S tumors (31).”.

      Comment 3: What is the reason to refer to the enriched cluster in Blimp1-deficient NK cells as "Junbhi"? There is no follow-up for a function of Junb, and there are many other genes upregulated in these cells. Most critically, these cells seem to represent exactly the c-Kithi cells that Kallies et al. already showed and discussed in their paper. The authors should stain for Kit, and also refer to this. Also, Mackay et al. performed Blimp1-ChIP-Seq (in T cells); it might be interesting to check which of the identified DEGs Blimp1 can bind.

      Response 3: We appreciate the suggestion from the reviewer. We think that a gene supporting the development of lymphocytes does not necessarily regulate their function positively. For example, JunB is essential for T cell development but can also induce T cell exhaustion (Lynn et al., Nature. 2019). Therefore, while Prdm1 has been shown to promote NK cell development, it cannot be assumed that it always positively regulates NK cell function, especially in anti-cancer immune surveillance. In this respect, we sought a factor driving the impaired anti-tumor ability of Prdm1ΔNcr1 NK cells. Although many other genes are upregulated in this cluster (e.g. Kit), JunB attracted more of our interest because of its potential to regulate NK cell function in cancer, whereas c-Kit is more likely a marker of NK cell maturation, as has been well demonstrated by Kallies et al. and other studies. Our previous studies also showed that c-Kit expression was decreased in mature NK cells compared with immature NK cells (Wang et al., J Clin Invest, 2018). 

      We did not perform follow-up experiments on Junb because we could not identify suitable surface markers for investigating the function of the Junbhi cNK cluster. Using intracellular markers would essentially amount to another analysis of the gene expression pattern, which has already been described in our RNA-seq data. As described above, our study did not aim to further investigate the role of Prdm1 in NK cell maturation, as c-Kit expression was upregulated in Prdm1-knockout NK cells and correlated with NK cell maturation, which has been validated by Kallies et al. 

      We also have discussed the potential DEGs that could be bound and regulated by Prdm1 in our revised manuscript (page 27-28; line 561-571):

      “Prdm1 and Hobit directly bind and repress Tcf7 (18), which encodes TCF-1, a TF that binds and limits the activity of the Gzmb regulatory element (69). Gzmb has been shown to be directly bound and activated by Junb in NK cells, suggesting that Gzmb expression is regulated by multiple Prdm1/Hobit downstream signals (26). In human T cells, the JUNB binding motif was enriched in PRDM1 binding sites (70), indicating the essential role of the PRDM1-JUNB axis during NK cell and T cell development. In NK cells deficient in Prdm1 expression, we noted a decrease in Gzmb levels alongside an elevation in Junb expression. This indicates that Prdm1 not only facilitates the expression of Gzmb in NK cells but also suppresses Junb expression. Given that Junb is recognized as a positive regulator of Gzmb (71), this presents a complex interplay that seems contradictory. Therefore, it is imperative to develop a theoretical framework to comprehensively understand and interpret this paradoxical relationship.”.

      Comment 4: cNK cells are considered circulating cells that transiently pass through the liver. Previous studies have suggested almost identical gene expression patterns in hepatic and splenic NK cells. In functional tests, they often "perform" identically. I am therefore a bit surprised that the authors find a differential dependency of Blimp1 for the IFNg production of splenic (no role of Blimp1) versus hepatic (Blimp1 regulating IFNg production) NK cells (Fig S3). Do the authors have any explanation for this? The analyses are performed by 12+4h stimulations with IL12/18, which could involve the effects of altered bystander cells (as suggested by Figure 6). Therefore, these analyses should be provided upon standard 4h stimulations with IL12/18 and also with PMA/I under BFA. Note: liver and splenic cNK cells look quite different in the chosen histograms in Figures 7 A, B, C, yet there is massive variability in these analyses; is there any systematic/technical problem?

      Response 4: We appreciate the valuable suggestion from the reviewer. Studies have suggested that, at the gene expression or transcriptomic level, liver NK cells exhibit more similarity to splenic NK cells while displaying greater divergence from liver ILC1s. However, we do not think that splenic NK cells or peripheral blood NK cells (which are more abundant in circulation) are entirely indistinguishable from liver NK cells. Notably, there are substantial differences in their maturity levels, with liver NK cells being more mature. Since we examined protein levels, a 4-hour stimulation period may not fully capture these distinctions. Even when considering the potential impact of bystander cells, the experimental design specifically targets Prdm1 knockout within NK cells, ensuring that the study accurately elucidates the role of Prdm1 in NK cells. For each experiment, we have implemented control measures, and any variances observed in the figures may be attributed to individual variation among the animals. It is also possible that MFI values measured by flow cytometry vary more than percentages do.

      Comment 5: Figure 4 H/I - In contrast to NK cells in Fig 4E, F, the KO and WT ILC1s seem to co-cluster largely. Authors should validate differentially expressed genes. How strong is the effect of Blimp1 in ILC1s? Or is Blimp1 a critical TF driving effector differentiation in NK cells, while it has only subtle effects in ILC1 (these may be regulated by Hobit?)? This seems an interesting finding that should at least be discussed. For these types of small differences in ILC1, FACS confirmation analyses should be performed and findings be reevaluated using Cre-expressing controls (see above).

      Response 5: We appreciate the suggestion from the reviewer. As requested, we analyzed the DEGs in liver cNK cells and ILC1s from our scRNA-seq data (revised Supplemental Figure 8, A and B). There were only a few informative DEGs in ILC1s compared with cNK cells. It is likely that Prdm1 has a more essential effect on the cNK cell transcriptional program, whereas it plays a more important role in maintaining the homeostasis of the ILC1 population. We have discussed these points to better inform the readers (page 27; line 554-561): 

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s require Hobit, whereas cNK cells do not (6). Expression of Prdm1 is remarkably higher in cNK cells than in ILC1s (18). In our study, however, cNK cells and liver ILC1s were reduced simultaneously in Prdm1ΔNcr1 mice, and the reduction was even more significant in ILC1s. This indicates that although Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may act in parallel programs instructing the functional development and maturation of ILC1s.”. 

      We could not identify suitable surface markers to evaluate the changes in ILC1s, as most of the changes involve intracellular markers.

      Comment 6: The authors describe and discuss some of Figure 1 and 2 data as if Blimp1 would be involved in alternative NK versus ILC1 fates, but there is no evidence for this.

      Response 6: There is no evidence that Prdm1 alters the fate decision of the progenitor towards liver cNK cells or ILC1s. Although some studies have reported conversion between cNK cells and ILC1s in special contexts, it is widely accepted that liver cNK cells and ILC1s originate from different progenitors. While we observed changes in the proportions of liver cNK cells and ILC1s in Prdm1 KO mice, we still lack sufficient evidence to support the relative independence of NK and ILC1 development, as well as evidence to indicate that Prdm1 is exclusively responsible for NK and ILC1.

      Regarding the changes in NK and ILC1 proportions after Prdm1 KO, we believe that both NK and ILC1 cells require Prdm1 to maintain their populations, with ILC1 possibly requiring it to a greater extent. This is the reason for the altered balance between NK and ILC1 cells following Prdm1 KO. We wish to clarify this point to prevent any misconceptions among readers. To address this, we have added the following content to the discussion section (page 25; line 509-516):

      “Furthermore, although both liver NK cells and liver ILC1s require Prdm1 to maintain their quantity, liver ILC1s demonstrate a more pronounced dependency on Prdm1. However, it is currently widely believed that liver NK cells and liver ILC1s originate from different progenitors. It is worth noting that while we observed changes in the NK and ILC1 proportions after Prdm1 knockout, our data do not support the hypothesis that Prdm1 affects progenitor differentiation decisions, thereby influencing the fate selection of NK and ILC1. Further research may be needed to elucidate how Prdm1 regulates the balance between NK cells and ILC1s.”.

      Comment 7: There are several recent studies suggesting a role for Hobit, homologue of Blimp1, in NK cells and in ILC1, and in the control of liver metastases. The authors should discuss similar and unique functions of Hobit and Blimp1, also in the regulation of gene expression patterns, and should refer to these studies.

      Response 7: We would like to express our gratitude to the reviewer for your insightful comments, which bring forth a critical perspective. In accordance with the reviewer's suggestion, we have updated our discussion to include the diverse functions guided by Hobit and Prdm1 in regulating the development and function of cNK cells and ILC1s (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s require Hobit, whereas cNK cells do not (6). Expression of Prdm1 is remarkably higher in cNK cells than in ILC1s (18). In our study, however, cNK cells and liver ILC1s were reduced simultaneously in Prdm1ΔNcr1 mice, and the reduction was even more significant in ILC1s. This indicates that although Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may act in parallel programs instructing the functional development and maturation of ILC1s.”.

      As shown in Supplemental Figure 8, we analyzed two published scRNA-seq datasets generated from HobitKO mice and integrated the DEGs in cNK cells and ILC1s with our data. We observed overlaps between the DEGs of Prdm1ΔNcr1 and HobitKO cNK cells and ILC1s, such as Junb, Tcf7, Gzmb, and Prf1 (Supplemental Figure 8), indicating a similar regulatory network of Prdm1 and Hobit. These data are now described on page 19; lines 386-395:   

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) using two published scRNA-seq datasets (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profiles of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H): Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 were simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating similar regulatory networks of Prdm1 and Hobit.”.
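      The overlap criterion in this passage (genes changed in the same direction in both knockouts) can be sketched in Python. The sketch assumes each comparison is summarized as a mapping from gene to log2 fold change, restricted to genes already called differentially expressed; the function name, gene names and values are hypothetical placeholders, not results from either dataset.

```python
def concordant_degs(deg_a, deg_b):
    """Return (up, down): DEGs shared by both comparisons with the same direction.

    deg_a, deg_b: dict mapping gene name -> log2 fold change (knockout vs control),
    restricted to genes already called differentially expressed.
    """
    shared = deg_a.keys() & deg_b.keys()  # genes called in both comparisons
    up = sorted(g for g in shared if deg_a[g] > 0 and deg_b[g] > 0)
    down = sorted(g for g in shared if deg_a[g] < 0 and deg_b[g] < 0)
    return up, down


# Placeholder inputs for illustration only.
prdm1_ko = {"GeneA": 1.2, "GeneB": -0.8, "GeneC": 0.5, "GeneD": -1.1}
hobit_ko = {"GeneA": 0.9, "GeneB": -1.5, "GeneC": -0.4, "GeneE": 2.0}

up, down = concordant_degs(prdm1_ko, hobit_ko)
print(up, down)  # ['GeneA'] ['GeneB']
```

      GeneC is shared but changes in opposite directions, so it is excluded; GeneD and GeneE appear in only one comparison.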

      Comment 8: Figure 4: The authors should discuss (and cross-validate) their liver gene expression analyses in the context of published datasets of NK and ILC1, such as the ones by Lopez et al, Friedrich et al, Ducimetiere et al and Yomogida et al.

      Response 8: We thank the reviewer for raising this important point. To address this question, we have now analyzed the gene expression of liver cNK cells and ILC1s in two of the published datasets mentioned above, in the context of Hobit knockout. We compared gene expression across clusters and described the results in our revised manuscript (page 19; lines 386-395). 

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) using two published scRNA-seq datasets (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profiles of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H): Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 were simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating similar regulatory networks of Prdm1 and Hobit.”.

      Recommendations For The Authors:

      Comment 9: The use of a paired t-test analysis when comparing cells/groups from different mice is not correct. Instead, the authors should consider using e.g. an unpaired t-test and re-test the indicated significance (e.g. Figure 1F, Figure 2H).

      Response 9: We thank the reviewer for this comment. Because we used littermates that were compared side by side, we consider the paired t-test acceptable. We reanalyzed the significance of the results in Figure 1F and Figure 2H using an unpaired t-test. For Figure 1F, the statistical significance with the unpaired t-test was the same as with the paired t-test. However, in Figure 2H, the reduction in IFN-γ production did not reach statistical significance with the unpaired t-test (Supplemental Figure 12B). This may be attributable to variation between different litters, but the trend remains consistent with our conclusion. We believe that a paired t-test between littermates is also meaningful; we have therefore reported both statistical methods to ensure a thorough evaluation.
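      The difference between the two tests can be made concrete with a small Python sketch on hypothetical numbers (not data from this study). A paired t-test works on within-pair differences, so a consistent shift inside each littermate pair can reach a large t statistic even when litters differ widely; an unpaired (pooled-variance Student's) t-test treats all animals as independent, so between-litter spread enters the standard error. Pairing is only justified when the pairs are genuinely matched. The function names and values are illustrative assumptions.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic and degrees of freedom for a paired t-test on matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


def unpaired_t(a, b):
    """t statistic and degrees of freedom for Student's two-sample t-test (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


# Hypothetical values: each position is one littermate pair (control, knockout).
control = [30, 40, 50, 60]
knockout = [27, 36, 47, 56]

print(paired_t(control, knockout))    # t is about 12.1 with 3 degrees of freedom
print(unpaired_t(control, knockout))  # t is about 0.39 with 6 degrees of freedom
```

      Here every knockout value sits 3 to 4 units below its littermate, but the litters themselves span 30 units, which is why the two tests disagree so sharply.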

      Comment 10: In several instances, it is unclear whether data are pooled or representative (and if so, of how many analyses). This information needs to be provided for all analyses. 

      Response 10: We apologize for the lack of details and have now provided the sufficient information in our figure legends. 

      For example, we deleted the numbers in the original histograms to avoid confusion about whether the data are pooled or representative (e.g. original Figure 7 A-C; revised Figure 7 A-C). Furthermore, we added “representative” to the figure legends of all flow cytometric plots to better inform readers (e.g. original Figure 2, D and F; revised Figure 2, B and D).

      Comment 11: In the title and abstract the authors use "type 1 ILCs" for both NK cells and ILC1s, which makes it difficult to understand which phenotypes correspond to cNK cells versus ILC1s. Most of the analyses clearly separate these two different cell types. I would greatly appreciate more accurate wording in the abstract and a clear description of the cNK and ILC1 phenotypes.

      Response 11: We are very sorry for our inaccurate descriptions. Following Spits et al. (Spits et al., Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “cNK cells” for conventional NK cells, “ILC1s” for type 1 innate lymphoid cells, and “group 1 ILCs” as the collective name for cNK cells and ILC1s. 

      The definition of these cells is described in the introduction (page 4, line 52-53; line 58-62): 

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effector molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, on page 13, lines 259-261:

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s expressed high levels of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 12: In the abstract authors state "The present study unveiled a novel regulatory mechanism of Prdm1 in liver Type 1 ILCs, showing promising potential for developing innovative immune therapy strategies against liver cancer." - maybe authors should discuss how their findings could be used for therapeutic approaches?

      Response 12: We appreciate the reviewer’s comment. There has been no clear consensus on the role of Prdm1 in NK cells: some studies have suggested that Prdm1 can inhibit cytokine secretion by NK cells, and Kallies et al. (Blood, 2011) in particular found that Prdm1 might suppress NK cell anti-tumor activity. Hence, no immunotherapy has so far targeted Prdm1 in NK cells for cancer treatment. Our research demonstrates that Prdm1 enhances NK cell anti-tumor activity, providing theoretical support for NK cell therapies targeting Prdm1.

      We added the following content to the Discussion section (page 29; lines 605-609):

      “Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving the functional heterogeneity of liver cNK cells and ILC1s, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Comment 13: The authors should explain or interpret their data a bit more. For example: what is the consequence of the GSEA enrichment for negative regulation of Il6 production (Fig. 3D)? Do NK cells produce Il6 (Figure 3)? What is the impact of Il17 signaling in NK/ILC1 (Figure 5)? Do the authors suggest JunB-driven metabolic reprogramming (Suppl. Fig 6D-F)?

      Response 13: We appreciate the reviewer’s comments. The question of IL-6 production by NK cells was also raised by another reviewer. We re-examined the GSEA results and found no informative genes related to IL-6 production in NK cells. In line with the suggestion of the other reviewer (Response to Reviewer 2, Comment 14), it may be prudent to omit this figure.

      Enrichment of IL-17 signaling suggests plasticity of ILC1s, some of which may originate from ILC3s. We have added further discussion of this point (page 17; lines 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), were highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17-mediated immune responses and ILC3s (1, 46), it is plausible that the Il7rhi ILC1 cluster may be attributable, at least in part, to potential plasticity between the ILC1 and ILC3 subsets.”.

      The decreased mitochondrial function may be more relevant to NK cell exhaustion in tumors. Our data suggest that elevated JunB expression may predispose NK cells to exhaustion. At present, our hypothesis that high JunB expression promotes NK cell exhaustion rests on the observed correlation between JunB expression levels and exhaustion phenotypes (at the level of gene expression and IFN-γ secretion) and on the findings in reference 67 (Lynn et al., Nature, 2019), in which JunB was found to promote T cell exhaustion. However, we have not demonstrated a causal relationship between high JunB expression and exhaustion in NK cells. We propose that in NK cells, especially mature NK cells, excessive JunB expression may make them more sensitive to exhaustion inducers; further research is needed to confirm this. To clarify this point, we added the following content to the Discussion section (page 26; lines 537-543):

      “While our current data are not sufficient to definitively classify these cells as exhausted NK cells, they support the view that a subpopulation, referred to as the Junbhi cluster, displays an exhaustion-like phenotype.

      The significant increase in this cell population following Prdm1 knockout in NK cells may be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether excessive JunB expression in NK cells also contributes to their exhaustion, as in T cells (65), requires further investigation.”.

      Comment 14: Ref 25 and Ref 57 are the same publication?

      Response 14: We apologize for this careless mistake. We have checked all the references and corrected the formatting errors.

      Comment 15: Figure 1, E - The method description of RT-PCR is missing. I apologize if I have overlooked this information.

      Response 15: We have now added a description of the RT-PCR procedure to the Methods section of the revised manuscript (page 31; lines 638-644):

      “RNA was extracted from FACS-sorted NK cells or splenocytes using the RNASimple Total RNA Kit (TIANGEN Biotech, 4992858) and subsequently reverse transcribed into cDNA with SuperScript VILO Master Mix (Thermo Fisher Scientific, 11755050) according to the manufacturer’s instructions. qPCR was performed with SYBR Green Mix (Thermo Fisher Scientific, A25742) on a CFX Opus 96 Real-Time PCR System (Bio-Rad). Relative mRNA expression levels were calculated using the 2^(-ΔΔCt) method. Prdm1 primer sequences: 5’-CAGAAACACTACTTGGTACA-3’; 5’-GATTGCTTGTGCTGCTAA-3’.”
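      The 2^(-ΔΔCt) calculation referred to in the Methods text can be sketched as follows. This is a minimal illustration assuming roughly 100% amplification efficiency; the reference (housekeeping) gene is not named in the text, and all Ct values below are hypothetical rather than taken from the study.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Relative expression by the 2^(-ΔΔCt) method (assumes ~100% amplification efficiency)."""
    # ΔCt: normalize the target gene (e.g. Prdm1) to a reference gene within each sample
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ΔΔCt: compare the test sample with the calibrator (e.g. a Prdm1+/+ sample)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical mean Ct values: (target, reference) for the test sample, then for the calibrator
print(fold_change_ddct(26.0, 18.0, 24.0, 18.0))  # ΔΔCt = 2, so this prints 0.25
```

      A target that crosses threshold two cycles later in the test sample than in the calibrator (after reference-gene normalization) therefore corresponds to a fold change of 0.25.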

      Comment 16: Figure 1, F - The NKp46+CD3- gate for the liver seems to cut the population, not all cells are included.

      Response 16: We appreciate the reviewer’s comment and apologize for our carelessness. We expanded our dataset with additional samples, reanalyzed it with a more rigorous gating strategy, and updated the figures accordingly (revised Figure 1G; revised Supplemental Figure 2A). As several data points and conclusions changed, we have revised the corresponding text in the manuscript.

      The original text is:

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage of cNK cells (CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues except bone marrow and lymph nodes (Figure 1F; Supplemental Figure 2A). However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice. The absolute number of cNK cells in blood, lung, liver, and spleen also decreased in Prdm1ΔNcr1 mice (Figure 1F; Supplemental Figure 2A). Only a slight decrease in the number of cNK cells was observed in the lymph nodes of Prdm1ΔNcr1 mice, which did not reach statistical significance either (Supplemental Figure 2A). In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; lines 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes were decreased in all of these tissues, whereas an increased number of NK cells was observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 17: Figure 1, The y-axis labeling of lung CD3-NKp46+ cells (x10^3) is not correct.

      Response 17: We apologize for this careless error. We have checked the labels and made sure they are correct.

      Comment 18: Figure 1, The statistical significance of absolute numbers of NKp46+ cells in the bone marrow should be reviewed.

      Response 18: We expanded our dataset with additional samples and reanalyzed it with a more rigorous gating strategy. In the updated data, we observed a significant increase in the number of bone marrow NK cells. These changes are now described in the revised manuscript.

      The original text is: 

      “However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice”, “In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; lines 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes were decreased in all of these tissues, whereas an increased number of NK cells was observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 19: Figure 1, G - CD27 and CD11b are used to define maturation stages within NK cells. Here the authors are analyzing group 1 ILC instead (containing both NK cells and ILC1, especially in the liver). It would be better to pre-gate on Eomes+ or CD49b+ NK cells for this analysis.

      Response 19: We apologize for the lack of detail in this analysis. We have now pre-gated on CD49b+ NK cells for the maturation-stage analysis and added this statement to the revised manuscript and figure legend (page 8; lines 149-151):

      “The maturation of cNK cells (gated as CD45+CD3-NK1.1+NKp46+CD49b+) from blood, bone marrow, lung, liver, spleen, and lymph nodes was assessed based on the expression of CD11b and CD27.”.

      Comment 20: Supplementary Figure 1, A - The NKp46+CD3- gate seems to cut the population, not all cells are included. y-axis labeling of spleen CD3-NKp46+ cells (%) is not correct.

      Response 20: Thank you. We have corrected these errors, as shown in revised Supplemental Figure 2A.

      Comment 21: Figure 2, D-G - Did the authors analyse the ILC1/NK compartment of the tumor? What is the abundance and phenotype of these cells depending on Prdm1 expression? Proper Cre controls should be used (see above).

      Response 21: We appreciate the reviewer’s suggestions. As requested, we have now analyzed the cNK and ILC1 populations in the tumor setting. The changes in the proportions of cNK cells and ILC1s in Prdm1ΔNcr1 mice were similar to those under tumor-free conditions, whereas the numbers of both cNK cells and ILC1s were decreased in tumor-bearing livers (revised Figure 7D). These results have been added to the revised manuscript (page 23; lines 479-481):

      “The changes in the proportions of cNK cells and ILC1s in Prdm1ΔNcr1 mice were similar to those under tumor-free conditions, whereas the numbers of both cNK cells and ILC1s were significantly decreased in tumor-bearing livers (Figure 7D).”.

      Our reasons for not using Cre controls are explained in our response to Comment 1.

      Comment 22: Figure 2, H - Prdm1-deficient NK and ILC1 produce less Ifng in response to in vitro stimulation with Il-12 and/or Il-18, and bulk Seq analysis (Fig 3F) shows reduced Il12rb2 expression. Does the expression of cytokine receptors correlate with the maturation of NK cells? This could be analyzed from the single-cell RNA-seq dataset. The statistical significance of %Ifng after Il12/Il18 stimulation should be revisited (see above).

      Response 22: We thank the reviewer for the suggestions. To address this question, we examined the expression of the IL-12 and IL-18 receptors in the cNK and ILC1 clusters. Within cNK clusters, Il12rb2, Il18r1, and Il18rap were highly expressed in the Prf1hi and Cxcr3hi clusters (revised Supplemental Figure 6H), indicating that IL-18 receptor expression correlates with NK cell maturation. In ILC1s, these receptors were mostly expressed in the Il7rhi and Gzmbhi clusters (revised Supplemental Figure 7C). The significantly decreased Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may be associated with their impaired ability to produce IFN-γ. We have now added this analysis to the manuscript (page 18; lines 364-368):

      “Within cNK cells, Il12rb2, Il18r1, and Il18rap were highly expressed in the Prf1hi and Cxcr3hi cNK clusters (Supplemental Figure 6I), indicating that IL-18 receptor expression correlates with NK cell maturation. In ILC1s, these receptors were mostly expressed in the Il7rhi and Gzmbhi ILC1 clusters (Supplemental Figure 7D). The significantly decreased Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may be associated with their impaired ability to produce IFN-γ.”.

      The unpaired t-test results for IFN-γ production are now shown in revised Supplemental Figure 12B. When analyzed with an unpaired t-test, the difference in IFN-γ production in original Figure 2H was not significant. However, with the same unpaired t-test, significance was observed for tumor-bearing liver cNK cells and ILC1s under IL-12/IL-18 stimulation (original Figure 7E). These discrepancies may be attributable to differences among littermates. Despite this variation, the trend remains consistent with our overall conclusions. Because the comparisons were made between littermates, we believe that a paired t-test could also be meaningful; we have therefore kept both statistical methods to ensure a thorough evaluation.
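      To illustrate why pairing littermates can change the outcome of the test, the sketch below computes Student’s unpaired (pooled-variance) and paired t statistics on four hypothetical littermate pairs. The values are invented for illustration and are not data from this study; only the t statistics are computed, not p-values.

```python
import math
from statistics import mean, stdev


def unpaired_t(a: list[float], b: list[float]) -> float:
    """Student's two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a: list[float], b: list[float]) -> float:
    """Paired t statistic: mean within-pair difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]  # assumes a[i] and b[i] are littermates
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical %IFN-γ+ values: litters differ widely, but KO is always ~4-5 points below its WT littermate
wt = [40.0, 25.0, 55.0, 30.0]
ko = [35.0, 21.0, 50.0, 26.0]

print(f"unpaired t = {unpaired_t(wt, ko):.2f} (df = 6)")  # about 0.49
print(f"paired t   = {paired_t(wt, ko):.2f} (df = 3)")    # about 15.59
```

      In this toy example, between-litter spread dominates the unpaired comparison, whereas the paired statistic only sees the consistent within-litter difference.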

      Comment 23: Figure 3, A-E - For bulk sequencing analysis, splenic CD3-NK1.1+NKp46+ were isolated. This population also contains ILC1 in the spleen (e.g. Flommersfeld et al.), although much less abundant compared to NK cells, and compared to the liver compartment. However, have the authors tested the abundance of splenic ILC1 in Prdm1-deficient mice, which may impact the gene expression data? In line with this the detection of altered Cxcr6 expression in Figure F, which is usually expressed by ILC1 rather than NK cells, may indicate an alteration in ILC1 numbers. The authors should validate the altered expression of CXCR6, Itga1, and Cx3cr1 on NK cells by flow cytometry.

      Response 23: We now cite the work of Flommersfeld et al. in our manuscript and have expanded the Results section to include the following information (page 19; lines 377-385):

      “Previous research found that spleen NK cells could be divided into three distinct groups based on their expression levels of CD27, CD62L, CD49a, and CD49b (52). CD27+CD62L- NK cells have remarkably high expression of Batf3, which is barely expressed in CD27+CD62L+ and CD27-CD62L+ NK cells (52). Based on the sequencing data published by Flommersfeld et al. (GSE180978), a notable negative correlation was observed between the expression levels of Prdm1 and Batf3 (Supplemental Figure 8I). Moreover, our findings revealed a negative regulatory influence of Prdm1 on Batf3 in both spleen and liver NK cells. This discovery highlights a potential upstream mechanism that may influence the homeostasis of splenic NK cell subpopulations through Batf3.”.

      We validated the expression of CD49a (Itga1) and CX3CR1 in liver cNK cells and ILC1s, as described in the revised manuscript (page 9, lines 170-174; page 14, lines 231-233):

      “Increased CD49a expression was also observed in Prdm1ΔNcr1 liver ILC1s, whereas CD49a expression was decreased in NKp46+ cells in the liver, bone marrow, and lymph nodes (Supplemental Figure 2, F and G).”, “The percentage of CX3CR1+ cNK cells was significantly decreased in multiple tissues of Prdm1ΔNcr1 mice, while the proportion of CX3CR1+ ILC1s was increased in the liver (Figure 3F).”

      Comment 24: Figure 3, F - Tnfsf26: which gene is this? is this a typo? Is a function of this gene in NK cells reported? Altered Batf3 expression suggests an impact on ILC1-like NK cells (Flommersfeld et al).

      Response 24: We are very sorry for our mistakes. We have removed Tnfrsf26 from the heatmap.

      Comment 25: Figure 3, G-J refer to Kallies data?! 

      Response 25: Kallies et al. reported reduced GzmB expression in Blimp1gfp/gfp mice. Extending their study, we further analyzed GzmB and perforin expression at different maturation stages of NK cells. The reduced GzmB expression is not solely due to the less mature phenotype of Prdm1-deficient NK cells, highlighting the role of Prdm1 in regulating NK cell function. We have therefore added this content to the revised manuscript (page 12; lines 233-242):

      “Lower GZMB and PRF1 production was observed in Prdm1-deficient splenic cNK cells, liver cNK cells, and ILC1s (Figure 3, H-K; Supplemental Figure 4, A-I). Notably, the proportions of GZMB+ and PRF1+ cNK cells were decreased in almost all maturation stages of cNK cells (Figure 3, J and K). The relative mean fluorescence intensities (MFIs) of GZMB and PRF1 were consistently reduced across all developmental stages in Prdm1ΔNcr1 NK cells (Supplemental Figure 4, H and I). However, no statistically significant difference in PRF1 was found within the CD11b-CD27+ and CD11b+CD27+ subsets, likely due to the relatively lower perforin levels in these populations (Supplemental Figure 4I). These findings suggest that Prdm1 may directly influence cytotoxic molecule expression in NK cells, rather than affecting their anti-tumor ability solely through the maturation phenotype of Prdm1-deficient NK cells.”

      In the Discussion section, where Kallies et al. is now cited (page 24; lines 500-502):

      “Our results not only confirmed a decrease in cytotoxic molecules in Prdm1-deficient NK cells (31) but also showed that the reduction in Gzmb and perforin is not solely attributable to the diminished maturation of these cells.”

      Comment 26: Figure 3, G, I - How do the authors explain the high variability of GzmB and Prf1 in Prdm1+/+ cells? 2 samples have comparable values to Prdm1-deficient cells.

      Response 26: This may be due to inherent sample-to-sample differences in MFI. In the revised version, we have added data on percentages, which show much less variability (Figure 3, H and I). The MFIs of GZMB and PRF1 have been moved to Supplemental Figure 4, E and F.

      Comment 27: Did the authors test the mice for potential germline recombination of the floxed allele, which has been suggested as a potential problem of Ncr1cre?

      Response 27: We appreciate the insightful comments provided by the reviewer; this is a very good question. In Prdm1fl/fl mice, germline recombination typically results in systemic knockout of Prdm1, which can lead to embryonic lethality. Given that mice were successfully born in the current study, it is unlikely that germline recombination of Prdm1 occurred due to leaky Cre expression.

      To address this issue, we isolated splenocytes and assessed Prdm1 expression by qPCR. We observed no significant difference in Prdm1 expression between splenocytes from Prdm1+/+ and Prdm1ΔNcr1 mice (revised Figure 1F), which also indicates that germline recombination is unlikely in Prdm1ΔNcr1 mice.

      Comment 28: Histograms do not show MFI

      Response 28: We appreciate the comments provided by the reviewer. The MFI values have been removed from the histograms.

      Comment 29: Supplementary Figure 4, B - FACS plot labelling: Typo, Histograms do not show MFI.

      Response 29: We sincerely thank the reviewer for the careful reading. The typo in this figure has been corrected, and the MFI values have been removed.

      Comment 30: Figure 4, A - What are the cells in the red cluster in the middle of the UMAP, do they belong to B cells? Why do they cluster so separately? It is interesting, but also surprising that NK and ILC1 cluster map so far apart from each other (rather with CD8 or B cells? or NKT cells) - do the authors have any comments?

      Response 30: We sincerely apologize for mislabeling a group of cells in our previous analysis. Upon thorough re-evaluation, we have corrected the labels of several cell clusters that were previously misidentified. The revised heatmap (revised Supplemental Figure 5C) shows the marker genes for each cluster. Additionally, in our updated analysis (revised Figure 4A), we have included clusters for epithelial cells, CD4+ T cells, NKT cells, and Kupffer cells. Please note that the red cluster in the center of the original UMAP corresponds to CD4+ T cells.

      We checked the markers of the cNK cell and ILC1 clusters and confirmed that they are labeled correctly, as Ncr1 and Klrb1c (NK1.1) were highly expressed in these clusters compared with the others (revised Supplemental Figure 5E).

      Comment 31: Does Junb expression correlate with the maturation stages of NK cells?

      Response 31: Our previous research indicated that, during NK cell maturation, Junb expression decreased (negative correlation), whereas Prdm1 expression increased (Wang et al., J Clin Invest, 2018; Supplemental Figure 5c and Supplemental Figure 11).

      Comment 32: The authors may consider validating their scRNA-seq data (e.g. by FACS analysis for highlighted markers, eg. cKit, Tcf7, Gzma, Cxcr3).

      Response 32: We appreciate the suggestion from the reviewer. We validated several marker genes, including Gzmb, Prf1, and Cx3cr1, by FACS, as shown in revised Figure 3, F-K. Currently, FACS cannot resolve liver NK cells into as many distinct clusters as scRNA-seq analysis can. However, we expect that as technology progresses, we will be able to extend our validation of the scRNA-seq data.

      Comment 33: It is a bit unclear to me why authors refer to Cxcr3hi NK cells as tissue-resident. This is based on Cxcr3 and Ccr2 expression. To make this statement, a much more detailed analysis would be required. How are CD69, CD49a, or CXCR6 expression of these cells?

      Response 33: We appreciate the suggestion from the reviewer. Our primary reason for classifying this cluster of NK cells as tissue-resident derives from the differentially expressed genes (DEGs) and Gene Ontology (GO) analysis, which demonstrate significant chemokine receptor activity within this cluster.

      To support this statement, we examined the expression of the markers listed above; only Cd69 was expressed in the cNK clusters, with high expression in the Junbhi and Cxcr3hi cNK cells (revised Supplemental Figure 6D). We also used the top 30 DEGs between ILC1s and cNK cells to calculate a module score for all cNK clusters; the Cxcr3hi cNK cluster had the highest score among these clusters (revised Supplemental Figure 6D). This part has been updated in our manuscript (page 15; lines 298-308):

      “The tissue-residency marker Cd69 was also highly expressed in this cluster (Supplemental Figure 6D). The enrichment of chemokine receptors among the genes upregulated in the Cxcr3hi cluster implies that this cluster is more likely to be tissue-resident than other cNK cell clusters (Figure 4H). To further assess the tissue-resident properties of this cluster, we calculated a module score based on the top 30 DEGs between ILC1 and cNK clusters, including Cxcr6, Itga1, Cd160, and Cd226. The Cxcr3hi cNK cluster had the highest score among all cNK clusters (Supplemental Figure 6H), indicating its similarity to liver ILC1s. In the tumor microenvironment, NK cells have been reported to convert into ILC1s (25). If this conversion of cNK cells into ILC1s also occurs under normal physiological conditions, the Cxcr3hi cNK cell cluster might be the most susceptible to such transformation.”

      Comment 35: The authors suggest that Prdm1 regulates chemokine receptor expression. An alternative explanation could be that this is an indirect effect of altering the abundance of NK cell subsets.

      Response 35: We apologize for the missing details in these figures. The input cell number for each genotype has now been added to the following figure legends.

      Figure 4F, “Proportions of cNK cells among total cNK cells (left; 211 cells in Prdm1+/+, and 141 cells in Prdm1ΔNcr1) and within clusters (right).”; Figure 5C, “Proportions of ILC1s among total ILC1s in different genotypes (left; 114 cells in Prdm1+/+, and 63 cells in Prdm1ΔNcr1) and within each cluster (right).”; Figure 6C, “Proportions of MDMs and KCs among total macrophages in different genotypes (510 cells in Prdm1+/+, and 624 cells in Prdm1ΔNcr1).”

      To minimize the effects of discrepancies in input numbers between samples with different genotypes, we represented the relative proportions of each cluster within its specific genotype (e.g. Supplemental Figure 6B; Supplemental Figure 7B; Supplemental Figure 9B).
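      A minimal sketch of this within-genotype normalization is shown below. Only the genotype totals (211 and 141 cNK cells, from the Figure 4F legend) come from the text; the cluster names and per-cluster counts are hypothetical.

```python
def within_genotype_proportions(cluster_counts: dict[str, int]) -> dict[str, float]:
    """Express each cluster as a fraction of all cells of the same genotype."""
    total = sum(cluster_counts.values())
    return {cluster: count / total for cluster, count in cluster_counts.items()}


# Hypothetical per-cluster counts; the totals match the input numbers given for Figure 4F
wt_counts = {"cluster_A": 40, "cluster_B": 60, "cluster_C": 111}  # 211 Prdm1+/+ cells
ko_counts = {"cluster_A": 50, "cluster_B": 30, "cluster_C": 61}   # 141 Prdm1ΔNcr1 cells

wt_prop = within_genotype_proportions(wt_counts)
ko_prop = within_genotype_proportions(ko_counts)
for cluster in wt_counts:
    print(f"{cluster}: raw {wt_counts[cluster]} vs {ko_counts[cluster]}, "
          f"proportion {wt_prop[cluster]:.3f} vs {ko_prop[cluster]:.3f}")
```

      Comparing raw counts (40 vs 50 for cluster_A) would understate the shift, because the knockout sample contributed fewer cells overall; the within-genotype proportions (about 0.19 vs 0.35) are independent of input size.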

      Comment 36: Supplementary Figures 6 and 7, A - The formatting of gene annotations does not fit the heat maps (the gene names on the last rows are missing).

      Response 36: We apologize for these careless mistakes, which have now been corrected.

      Comment 37: Supplementary Figures 6 and 7, What is the consequence of compromised mitochondrial function? Increase apoptosis?

      Response 37: In our experiments, we did not find an effect of Prdm1 on NK cell apoptosis. In contrast, previous studies have found that Prdm1 might inhibit NK cell proliferation (C. Kucuk, et al., PNAS, 2011). We acknowledge that the precise definition of NK cell exhaustion is still debated. In our experiments, Prdm1 knockout did not change the expression of exhaustion-associated surface markers (TIGIT) on NK cells. However, we did note a significant reduction in the cytokine secretion capacity and tumor control efficacy of NK cells after Prdm1 knockout. We therefore propose that the consequence of compromised mitochondrial function might be increased exhaustion. As mentioned in the Discussion (lines 482-483), mitochondrial fragmentation has been shown to be closely associated with NK cell exhaustion in tumors (Zheng et al., Nature Immunology, 2019). Although the evidence is not sufficient to define Prdm1ΔNcr1 NK cells as exhausted, our data suggest that compromised mitochondrial function is associated, at least in part, with the exhausted phenotype of Prdm1ΔNcr1 NK cells in cancer.

      We have discussed these points in our revised manuscript (page 26; line 529-543): 

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion is not as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions: a reduced ability to kill tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival. This reduced functionality is also associated with decreased expression of cytotoxic molecules, lower IFN-γ production, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data are not sufficient to definitively classify these cells as exhausted NK cells, they support the view that a subpopulation, referred to as the Junbhi cluster, displays an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether excessive JunB expression in NK cells also contributes to their exhaustion, as in T cells (65), requires further investigation.”.

      Comment 38: Figure 5, Describing the scRNA Seq data, the authors are switching a lot between Figure 4 and Figure 5. Maybe a reorganization of the Figures (Figure 4: NK cell; Figure 5: ILC1) could help.

      Response 38: We appreciate the reviewer’s suggestion. We have now reorganized the Figure 4 and Figure 5.

      Comment 39: Figure 5, We suggest naming one of the ILC1 clusters "Gzmbhi" to keep it consistent with the FACS data.

      Response 39: We agree with this excellent suggestion and have renamed the “Gzmahi” ILC1 cluster as the “Gzmbhi” ILC1 cluster.

      Comment 40: Figure 5, C - How was the JunB score derived (which genes were used)?

      Response 40: The JunB score was calculated from the expression of the marker genes of the Junbhi cNK cluster (DEGs of the Junbhi cNK cluster compared with the other clusters, as shown in revised Supplemental Figure 6A), using the AddModuleScore function of the Seurat R package.
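      The idea behind a module score of this kind can be sketched as follows: each cell’s average expression of the module genes is compared with the average expression of control genes drawn from the same average-expression bins. This is a simplified Python illustration under our own assumptions (dictionary input, equal-size bins, module genes excluded from the control pool); it is not Seurat’s implementation, and the defaults n_bins=24 and n_ctrl=100 are chosen only to mirror what we understand to be AddModuleScore’s defaults.

```python
import random
from statistics import mean


def module_score(expr: dict[str, list[float]], module_genes: list[str],
                 n_bins: int = 24, n_ctrl: int = 100, seed: int = 1) -> list[float]:
    """Per-cell score: mean(module genes) - mean(expression-matched control genes).

    expr maps each gene to its expression values across cells (same cell order for all genes).
    """
    rng = random.Random(seed)
    n_cells = len(next(iter(expr.values())))
    # Rank genes by average expression and cut the ranking into n_bins bins of similar size
    avg = {gene: mean(values) for gene, values in expr.items()}
    ranked = sorted(sorted(expr), key=lambda gene: avg[gene])
    bin_of = {gene: (i * n_bins) // len(ranked) for i, gene in enumerate(ranked)}
    # For each module gene, draw control genes from the same expression bin
    controls: set[str] = set()
    for gene in module_genes:
        pool = [g for g in ranked if bin_of[g] == bin_of[gene] and g not in module_genes]
        controls.update(rng.sample(pool, min(n_ctrl, len(pool))))
    scores = []
    for cell in range(n_cells):
        module_mean = mean(expr[g][cell] for g in module_genes)
        control_mean = mean(expr[g][cell] for g in controls) if controls else 0.0
        scores.append(module_mean - control_mean)
    return scores


# Toy data (hypothetical): 48 background genes, two module genes elevated in cell 0 only
expr = {f"g{i}": [1.0, 1.0, 1.0] for i in range(48)}
expr["m1"] = [5.0, 1.0, 1.0]
expr["m2"] = [5.0, 1.0, 1.0]
print(module_score(expr, ["m1", "m2"], n_bins=5))  # [4.0, 0.0, 0.0]
```

      Matching control genes by expression bin means that a positive score reflects enrichment of the module relative to genes of similar overall abundance, rather than simply high overall expression in that cell.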

      Comment 41: Figure 5, G, I - The authors highlight Il17 signaling pathway, what is the impact of Il17 on NK/ILC1? Did the authors check for ILC3 (Rorc expression) within the ILC1 cluster?

      Response 41: The enrichment of the IL-17 signaling pathway in Il7rhi ILC1s suggests that this cluster includes ILC1s originating from the conversion of Rorγt+ ILC3s. Although Rorc expression was undetectable in all ILC1 clusters, we found that several ILC3 marker genes defined by ILC3 transcriptomes (Robinette et al., Nature Immunology, 2015), such as Rora, Tmem176a, and Tmem176b, were highly expressed in this cluster.

      We have added these contents in our revised manuscript (page 17; line 341-344): 

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), were highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17-mediated immune responses and ILC3s (1, 46), it is plausible that the Il7rhi ILC1 cluster may be attributable, at least in part, to potential plasticity between the ILC1 and ILC3 subsets.”.

      Comment 42: Figure 5, The authors detect more Ly49E+ cytotoxic ILC1 in Prdm1fl Ncr1cre mice.

      How does this observation fit to the reduced cytotoxicity of NK cells?

      Response 42: The proportion of Klrahi ILC1s was increased, whereas that of Gzmbhi ILC1s was decreased in Prdm1ΔNcr1 mice. Moreover, the total number of cells in all three ILC1 clusters was reduced in Prdm1ΔNcr1 mice.

      Comment 43: Line 350/351: Citation required.

      Response 43: We have added the relevant references (references 55 and 56).

      Comment 44: Figure 6, The Cell-chat analysis provides interesting suggestions, but none are experimentally addressed. It is also difficult to evaluate these analyses: are any of the Mac subsets altered in frequency or phenotype in either genotype? This could be analyzed from the single-cell data in Fig 4. At the very least, flow cytometric validation of predicted shifts in the Mac compartment should be confirmed.

      Response 44: We thank the reviewer for these valuable suggestions. As requested, we analyzed macrophages and validated some of the scRNA-seq data by flow cytometry. We have rewritten this part to include an analysis of the altered proportions of two macrophage populations (Kupffer cells and monocyte-derived macrophages) (page 20-21; lines 399-436):

      “The scRNA sequencing analysis identified two well-established subpopulations of liver macrophages: the resident Kupffer cells (KCs) and monocyte-derived macrophages (MDMs) (Figure 6, A-C; Supplemental Figure 9A). The total proportion of macrophages within the liver immune cell population was higher in Prdm1ΔNcr1 mice than in WT mice (Figure 6C). To confirm these findings, we used flow cytometry to define macrophages, including both KCs and MDMs, by gating on CD45+Ly6G-F4/80+CD11b+ cells (Figure 6D).

      Our analysis showed that, following the deletion of Prdm1 in Group 1 ILCs, there is a significant increase in both the proportion and number of macrophages in the liver (Figure 6D).

      According to their transcriptional profiles, liver macrophages were further clustered and labeled as “Ly6c2hi”, “Cxcl2hi”, and “Ear2hi” MDMs, and “Mrc1hi” and “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). An increased proportion of MDMs and KCs was observed among Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within the MDM clusters, Ly6c2hi MDMs consisted mainly of Prdm1+/+ cells, whereas Prdm1ΔNcr1 cells were concentrated in the Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that, following Prdm1 knockout in NKp46+ cells, the proportion of KCs within the macrophage population decreases, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is widely used to distinguish KCs from MDMs: cells expressing CX3CR1 are identified as MDMs, whereas those lacking CX3CR1 expression are categorized as KCs (56). Using CX3CR1 expression, we assessed the ratio of KCs to MDMs by flow cytometry. However, in contrast to the scRNA-seq findings, flow cytometry indicated that, after Prdm1 knockout in group 1 ILCs, the proportion of KCs within total liver macrophages increased slightly and the proportion of MDMs decreased (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry distinguishes KCs from MDMs using a single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, a subset within MDMs, labeled Mrc1hi, also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry relies solely on CX3CR1 to differentiate KCs from MDMs, its results may differ from the scRNA-seq outcomes.
Both KCs and MDMs were significantly increased in Prdm1ΔNcr1 mice, consistent with the scRNA-seq data (Supplemental Figure 9, B and F). Despite the decrease in the proportion of Ly6c2hi MDMs in Prdm1ΔNcr1 mice, Ly6c2 expression levels showed minimal variation between WT and Prdm1ΔNcr1 mice (Supplemental Figure 9D). Intriguingly, within certain subsets, notably the Ear2hi cluster, Ly6c2 expression levels in KO mice were higher than those in WT mice. Additionally, we used flow cytometry to examine Ly6C expression in macrophages. Consistent with the scRNA-seq findings, there were no notable differences in Ly6C expression levels between WT and KO mice (Figure 6E; Supplemental Figure 9G).”.

      The changes in the macrophage compartment indicate a potential influence of functional NK cells on macrophages. We have revised these parts in our results and discussion (line 590-601). Further analysis of macrophages would be worthwhile but goes beyond the scope of this manuscript; it will be a direction of our future work.

      Comment 45: Figure 6, C1qhi Mac comprise only a few cells/events, and interactions (or cells?) seem to be gone in the Prdm1-floxed mice. Is that true? Does it make sense to perform cell-chat analysis on so few cells?

      Response 45: We have now added KCs to the cell-chat analysis; this cluster belongs to the C1qhi KCs. We have revised the analysis in the corresponding parts of our manuscript (page 20-21; line 408-428):

      “According to the transcriptional profile, liver macrophages were further clustered and labeled as “Ly6c2hi”, “Cxcl2hi”, and “Ear2hi” MDMs, and “Mrc1hi” and “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). An increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within the MDM clusters, Ly6c2hi MDMs are mainly composed of Prdm1+/+ cells, while Prdm1ΔNcr1 cells are concentrated in the Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, the proportion of KCs within the macrophage population decreases, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is widely used to distinguish KCs from MDMs: cells expressing CX3CR1 are identified as MDMs, whereas those lacking CX3CR1 expression are categorized as KCs (56). Using flow cytometry and CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that after Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within total liver macrophages and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using a single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry relies solely on CX3CR1 to distinguish KCs from MDMs, its results could differ from the scRNA-seq outcomes.”.

      Comment 46: Figure 6, C - Here the interactions of both Mac+ILC1 and Mac+NK are shown together. It would be interesting to separate this analysis (also Suppl. Fig 9A-B) into comparisons of Mac+ILC1 vs Mac1+NK from WT or Prdm1fl Ncr1 mice.

      Response 46: As requested, we re-analyzed this part for each genotype, as shown in Supplemental Figure 10. These data are now described on page 22; line 445-447.

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H)”

      Comment 47: Supplementary Figure 9, A, B - Is this analysis using WT and Prdm1fl Ncr1cre dataset together? 

      Response 47: Yes, we used the WT and Prdm1ΔNcr1 data together. As requested above, we have now separated this analysis into WT and Prdm1ΔNcr1 mice. These data are now described on page 22; line 445-460:

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H). A reduction in ligand-receptor interactions, such as Mif-CD74, Cxcl16-Cxcr6, and Cxcl10-Cxcr3, was observed in Prdm1ΔNcr1 mice compared to Prdm1+/+ mice (Supplemental Figure 11). Compared to Prdm1+/+ mice, the information flow of the CXCL and MIF pathways significantly decreased in Prdm1ΔNcr1 mice (Figure 6, H and I; Supplemental Figure 10, B, D, F, and H). These pathways play a crucial role in facilitating macrophage migration. CXCL signaling was sent from Ly6c2hi and Cxcl2hi MDMs and C1qhi KCs, targeting all ILC1 clusters and the Cxcr3hi cNK cell cluster (Figure 6J). Of note, although the Cxcl2hi macrophage population primarily comprised cells from Prdm1ΔNcr1 mice, the interaction within the CXCL pathway between macrophages and group 1 ILCs was markedly weaker than in the Prdm1+/+ sample (Figure 6J). These changes could be linked to the decreased populations of ILC1s and the Cxcr3hi cNK cell cluster in Prdm1ΔNcr1 mice, implying that the homeostasis of Cxcl2hi macrophages requires sufficient signals from cNK cells and ILC1s. The impaired CXCL-CXCR interactions might subsequently lead to reduced recruitment and activation of group 1 ILCs and macrophages within the tumor microenvironment.”.

      Comment 48: Figure 7, A-C -What is the consequence/interpretation of reduced Mitotracker staining? Any metabolic assays performed? The definition of NK cell "exhaustion" is unclear, is reduced IFNg enough for that? Is the concept of NK cell exhaustion clearly established? Only shortly touched upon in the discussion, the rationale for suggesting an exhausted phenotype, should be explained.

      Response 48: MitoTracker was used to assess mitochondrial mass. The reduced staining suggests compromised mitochondrial function, which is associated with mitochondrial fragmentation.

      We agree that NK cell exhaustion is not as well-established a concept as T cell exhaustion. The purpose of assessing mitochondria in this study is to provide evidence for a relationship between Prdm1 and NK cell exhaustion. In the discussion section, we have added the following content (page 26; line 529-543):

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion is not as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions: a reduced ability to kill tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival. This reduced functionality is also associated with decreased expression of cytotoxic molecules, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data are not sufficient to definitively classify these cells as exhausted NK cells, they support that a subpopulation, referred to as the Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether excessive expression of JunB in NK cells also contributes to their exhaustion, as in T cells (65), requires further investigation.”.

      Comment 49: Figure 7, x-axis labelling (MFI) of histograms is not correct. Do bar graphs and FACS plots show the same data? Does the number in the FACS plots indicate the MFI? If so, the FACS plots do not show representative samples?

      Response 49: We appreciate the valuable comments provided by the reviewer. In the revised Figure 7, the MFI values have been removed. Bar graphs now display summary data from the FACS histograms.

      A representative sample close to the group's mean value was chosen for display in the histograms.

      Comment 50: Figure 7, D - How are these data different from Figure 2H? Why is it now called "exhaustion", but not in 2H? Is the detected IFNg only driven by ex vivo stimulation with Il12/Il18? As above, a "standard" 4h assay should also be provided to allow better interpretation of potential differences. In the introduction, the authors cite the Ducimetiere study (Ref 5) highlighting "the primary function of ILC1 in suppressing the seeding of metastatic tumor cells in liver tissue". Thus, it would be interesting to test Ifng production by liver ILC1 and NK cells ex vivo at early time points of tumor inoculation.

      Response 50: Tumors grow and proliferate within tissues, which is one of the major causes of lymphocyte exhaustion. This part of the current study aims to investigate whether Prdm1 helps NK cells or ILC1s resist the exhaustion induced by malignant tumors. Specifically, we sought to determine whether the absence of Prdm1 renders NK cells or ILC1s more susceptible to exhaustion within the tumor microenvironment. We therefore considered the capacity to secrete IFN-γ upon IL-12/IL-18 stimulation as one indicator of exhaustion. It is crucial to emphasize that this assessment serves as only one piece of evidence, not the sole determinant. Overnight stimulation is a conventional method for studying NK cells and has been widely used across different laboratories, including our own (e.g. Bream et al., Blood, 2003; Yu et al., Immunity, 2006; Wang et al., J Clin Invest, 2018). It is essential to clarify that our approach does not involve stimulation with tumor cells to evaluate the IFN-γ secretion capacity of NK cells or ILC1s.

      Reviewer 2 (Public Review):

      Summary:

      This study offers a significant advancement in understanding liver innate lymphoid cell (ILC) biology by elucidating the role of the transcription factor Prdm1. It shows that Prdm1 is crucial in maintaining the balance between conventional natural killer (cNK) cells and ILC1s in the liver, with knockout models revealing a vital role in cancer defense mechanisms. Despite not affecting direct cytotoxicity, Prdm1 deficiency leads to increased cancer metastasis and reduced secretion of key molecules like IFN-γ, pointing to its importance in immune regulation. The use of single-cell RNA sequencing further underscores Prdm1's role in cellular communication within the liver's immune milieu. This study is a robust contribution to the field, providing insights that could inform new immunotherapy approaches for liver cancer.

      Strengths:

      The study's strength lies in its comprehensive approach, combining the specificity of Prdm1 conditional deletion in Ncr1-cre mice with integrative omics analyses and cutting-edge cytometry to delineate Prdm1's role in liver Type 1 ILC biology and its functional implications in tumor immunity. This multifaceted strategy not only clarifies Prdm1's influence on ILC composition and maturation but also conveys potential therapeutic insights for liver cancer immunotherapy.

      We sincerely appreciate your interest in and critical assessment of our manuscript. We have carefully read your comments and suggestions, and we are truly grateful for your expert guidance. We have worked to address each of your concerns and comments, and we provide a point-by-point response below:

      Weakness

      Comment 1: A notable weakness of the study is the limited scope of in vivo disease models, primarily relying on the B16F10 melanoma model, which may not fully capture the complex behavior of Type 1 ILCs across diverse cancer types. Furthermore, the absence of direct human data, such as the effects of PRDM1 deletion in human NK cells or stem cells during their differentiation into NK and ILC1, leaves a gap in translating these findings to clinical settings.

      Response 1: We appreciate the reviewer for raising these important points, which we see as a unique opportunity for future work to transform our understanding of Prdm1 and its targets as opposed to a weakness of the present study. 

      In our revised manuscript, we have discussed these limitations of our study (page 29; line 602-609):

      “While our findings underscore the importance of Prdm1 in liver cNK cell and ILC1 tumor immune surveillance, they have not been validated in human NK cells, whereas previous studies have found that PRDM1 might inhibit the proliferation and function of human NK cells (33, 73). Furthermore, we have not provided an in-depth evaluation in multiple tumor models. Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving the functional heterogeneity of liver cNK cells and ILC1s, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Recommendations For The Authors:

      (Introduction) 

      Comment 2: Reference 1 appears slightly misplaced. You might find the nomenclature discussion in Spits et al., Nature Reviews Immunology, 2013, more appropriate.

      Response 2: We apologize for our inaccurate descriptions. Following Spits et al. (Spits et al., Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “conventional NK cells” are abbreviated as “cNK cells”, “type 1 innate lymphoid cells” as “ILC1s”, and “group 1 ILCs” is used as the collective name for cNK cells and ILC1s. 

      The definition of these cells is described in the introduction (page 4, line 52-53; line 58-62): 

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effector molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s are distinctively positive for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown in page 13; line 259-261: 

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s expressed high levels of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 3: It has come to my attention that Reference 9 has been retracted. I recommend removing this citation to maintain the integrity of your references (https://doi.org/10.1182/blood.2023022801).

      Response 3: We thank the reviewer for this comment and have now removed this citation.

      Comment 4: For a more comprehensive context around reference 15, consider citing Thierry Walzer's work (https://rupress.org/jem/article/211/3/563/41636/T-bet-and-Eomes-instruct-the-development-of-two), which aligns closely with your discussion.

      Response 4: We agree with the reviewer’s suggestion and have added this citation in our introduction (page 4; line 64-66):

      “The liver environment facilitates T-bet expression in the early stage of NK cell development, which results in Eomes repression. Repression of T-bet is required for Eomes+ NK cells (17).”.

      (Results) 

      Comment 5: The NK cell signature referenced in reference 32 has been questioned for its reliability, as discussed by Cursons et al., CRI 2019 (https://pubmed.ncbi.nlm.nih.gov/31088844/). Reanalysis of data in Figure 1 B/C and Supplementary Figure 1 with the refined NK cell signature from Cursons' work would be advantageous.

      Response 5: We thank the reviewer for this comment. As requested, we reanalyzed our data using the refined NK cell signature from Cursons et al. (revised Figure 1 A-C; revised Supplemental Figure 1). Of note, overall survival of liver cancer (LIHC) patients reached statistical significance only when high and low expression of the refined NK-PRDM1 signature were compared using a median cutoff (Figure 1, A-C). The overall survival analysis comparing the top and bottom quartiles of refined NK-PRDM1 signature expression has been moved to Supplemental Figure 1, G-I. 

      The original text is: “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (NCR1, NCR3, KLRB1, CD160, and PRF1) (32) and PRDM1 expression (Figure 1A). Patients with top and bottom quartiles of NK-PRDM1 signature expression were chosen for survival analysis (Figure 1B). Notably, patients with the NK-PRDM1hi signature had better overall survival compared to those with the NK-PRDM1lo signature (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in some solid tumors, including liver cancer. These findings prompted us to investigate the impact and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”

      We have rewritten this part in our revised manuscript (page 7; line 119-132): 

      “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (34) (NCR1, KLRB1, CD160, PRF1, etc.) and PRDM1 expression (Figure 1A). For survival analysis, patients were ranked from highest to lowest NK-PRDM1 signature expression (Figure 1B). Notably, patients with higher NK-PRDM1 expression (above the median) had better survival outcomes than those with lower NK-PRDM1 expression (below the median) (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). Patients within the highest quartile of NK-PRDM1 signature expression demonstrated enhanced overall survival, a result that achieved statistical significance in LUAD and SKCM patients (Supplemental Figure 1, G-I). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in solid tumors, including liver cancer, and prompted us to investigate the function and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”.

      Comment 6: The origin of the Ncr1-cre mice utilised should be clarified; is this the line developed by Eric Vivier? (https://www.pnas.org/doi/10.1073/pnas.1112064108).

      Response 6: We did not use the line developed by Eric Vivier; our Ncr1-cre mice were purchased from Shanghai Model Organisms Center, Inc. We describe this in the methods section (page 29-30; line 612-614): 

      “Prdm1fl/fl mice were purchased from The Jackson Laboratory. Ncr1-iCre and B2m-/- mice were purchased from Shanghai Model Organisms Center, Inc. Six- to twelve-week-old littermates were used for the experiment.”

      Comment 7: Considering the known reduction of Ncr1 expression in Ncr1-cre mice and its implications, it is recommended to repeat the B16F10 experiments with the correct control, Ncr1cre/+ Prdm1+/+.

      Response 7: This is an excellent question, and it was also raised by another reviewer (Reviewer 1, Comment 1). Our answer is reproduced below: 

      The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, this also allows a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1 E), is more pronounced. Therefore, with current techniques and research methods, we believe that the data provided in this study support the role of Prdm1 in NK cells.

      Comment 8: The proportion of ILC1 in wild-type mouse livers is notably higher than standard references. Could you confirm whether liver perfusion was performed before analysis? This procedure was not clearly detailed in the methods section.

      Response 8: We apologize that we did not provide enough detail regarding this point in our original methods. We performed liver perfusion before analysis. This has now been clarified in the methods section of the revised text (page 30-31; line 630-636): 

      “Mice were perfused with 1× PBS by portal vein puncture before harvesting tissues. Liver and lung were digested with 0.05% collagenase II for 30 minutes and filtered through 70 µm cell strainers, and mononuclear cells were isolated by density gradient centrifugation using 30% and 70% Percoll. Spleens were also removed and pressed through 70 µm filters to obtain splenocytes. Peripheral blood mononuclear cells were obtained from peripheral blood after lysis of red blood cells (Biolegend, 420301). Femurs were flushed and inguinal lymph nodes were mechanically disrupted to obtain cells from bone marrow and lymph nodes.”.

      The lymphocyte proportions in mice from different laboratories may exhibit slight variations, possibly due to genetic background disparities. To minimize the influence of genetic backgrounds, paired littermates were used in the current study, wherein one is Prdm1 WT and the other has the Prdm1 gene knocked out in NK cells.

      Comment 9: There appears to be inconsistency in reference formatting; for instance, Ref 39 does not match the formatting of other references. A thorough review of your citation format is suggested.

      Response 9: We apologize for the inadvertent errors; we have reviewed and corrected the citation format.

      Comment 10: The information in Figures 2B and C may be better suited to the supplementary section as it does not significantly contribute to the main text.

      Response 10: We agree with the reviewer’s suggestion and these are now moved to supplementary figures (Supplemental Figure 2).

      Comment 11: The citation of reference 40 could be strengthened by including Sathe et al., 2014, which directly pertains to your findings (https://www.nature.com/articles/ncomms5539).

      Response 11: We added the suggested reference.

      Comment 12: Can the findings presented in Figure 2D/F be replicated using alternative models? This would substantiate the versatility of your results.

      Response 12: The predominant in vivo tumor model for studying NK cells is based on B16F10 melanoma cells. These melanoma cells, with their low expression of MHC-I molecules, evade T cell-mediated immune surveillance, rendering them ideal targets for NK cells. Typically, this experimental melanoma metastasis assay involves tail vein injection, followed by detection of nodules in the lungs. To align with our investigation of liver-resident cNK cells and ILC1s, we used splenic injection (via the portal vein) and evaluated melanoma metastasis in the liver to reflect the anti-tumor capabilities of liver group 1 ILCs. We also explored subcutaneous tumor models, but we believe they may not effectively reveal Prdm1's role in cNK cells, particularly liver-resident NK cells and ILC1s. We also tested models using mouse liver tumor cells such as Hepa 1-6, but found them less stable than B16F10 and less amenable to quantification. Should more suitable models or cell lines emerge, we remain open to exploring them in future research.

      Comment 13: The absence of in vitro killing assessments against B16F10 and YAC-1 leaves a gap in the NK cell characterisation which would be valuable to address.

      Response 13: Isolating NK cells for ex vivo cytotoxicity assays typically requires stimulation with high concentrations of IL-2. Under such strong IL-2 stimulation, many intracellular differences that contribute to differences in cytotoxicity, such as changes in transcription factors, are often masked. In addition, current ex vivo NK cell cytotoxicity assays usually use NK cells isolated only from the spleen. Liver-resident NK cells, on the other hand, are limited in number and difficult to isolate, making it challenging to perform ex vivo cytotoxicity assays effectively. If more sensitive detection methods become available, we will incorporate ex vivo data into our future research.

      Comment 14: The suggestion that NK cells produce IL-6 is indeed a bold one, and without additional validation through intracellular cytokine detection or ELISA, it may be prudent to omit these claims.

      Response 14: We re-examined the GSEA results and found no informative genes related to IL-6 production. Therefore, we have removed this figure.

      Comment 15: The lack of fluorescence minus one (FMO) controls in Figure 3 and Supplementary Figure 4 is noted; including these would enhance the validity of your gating strategies.

      Response 15: As requested, we have added FMO controls to the aforementioned figures.

      Comment 16: There seems to be a minor mix-up in referring to Figure 4A in the scRNAseq results section, perhaps it was intended to refer to Figure 3A?

      Response 16: We have corrected this part (line 247). We have also double-checked and corrected inaccurate references to the figures. We apologize for the inadvertent errors.

      Comment 17: The rich datasets generated from bulk and scRNAseq are commendable. However, I urge you to make these datasets publicly accessible with a GEO accession number.

      Response 17: We appreciate the suggestion from the reviewer. We plan to upload our datasets with the final version of our manuscript, as required by eLife policy.

      Comment 18: Figure 4K is insightful, yet a similar analysis of the ILC1 cluster could provide a more rounded understanding.

      Response 18: We thank the reviewer for the comments. We now provide a similar analysis of ILC1s, as shown in revised Figure 5H. 

      Comment 19: The metabolic RNA signatures featured in Supplementary Figure 6 are intriguing and warrant further validation, perhaps through Seahorse analysis. Such validation could merit their inclusion in the main figures.

      Response 19: This is a very good suggestion. Currently, our data offer only limited indications in this context. We have chosen to validate some aspects of Prdm1's influence on cytotoxic molecules. Prdm1's impact on other aspects of NK cells, such as metabolic functions, may be explored in future research. Additionally, we hope that by publishing our findings, other laboratories can draw insights for their own studies and conduct relevant research based on these data.

      Comment 20: It is difficult to discern whether the cells depicted in Figure 7D are truly tumor-infiltrating ILC1 or NK cells that have adopted ILC1-like characteristics. Intravenous injection of CD45-PE could clarify this distinction, and if they are the latter, it may be more appropriate to refer to them as ILC1-like cells.

      Response 20: We completely agree with the reviewer's suggestion that "tumor-infiltrating lymphocytes" may not be accurate for the current experiment. Therefore, in the revised manuscript, we have changed it to "liver cNK or ILC1 from tumor-bearing livers".

    1. Author response:

      The following is the authors’ response to the previous reviews.

      All of the reviewers indicate that their major concerns have been adequately addressed, but they each have a few comments that the authors should consider before submitting a final version (without further review) for publication. For example, a statement about the sex of the mice used in the studies and whether any differences were noted if both sexes were used. The idea that the loss of glutamate transport might affect NA loading into vesicles is also worth considering. Finally, the authors might want to mention that the role of neuropeptide release from NA neurons needs further examination. 

      As noted in the previously submitted revision, all experiments included both males and females, and this was addressed in our re-submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed (the statement of no sex difference is in line 451-456). For the fate map and in situ experiments, although the group size is small, we did not see obvious differences between females and males in the expression patterns of the three glutamate transporters (line 347-350). All the anatomical and phenotypic data in this manuscript are presented as combined graphs (figure 1, figure 1 supplement 1, figure 2, figure 2 supplement 2, figure 4,5,6,7), and we have differentially labeled our data points by sex (female data are pink and male data are blue).

      The possibility that loss of Vglut2 might affect NA release has been added in the discussion (line 485-491) of the current revision. Dopamine Beta Hydroxylase (DBH) converts dopamine to noradrenaline in the vesicles, thus, glutamate may not directly affect noradrenaline loading into vesicles. However, since loss of Vglut2 reduced dopamine release in subsets of dopaminergic neurons, it remains possible that glutamate affects dopamine loading in NA neurons and in turn perturbs DA to NA conversion in the vesicle by DBH and subsequent noradrenaline release. Future work could examine this hypothesis using fast-scan cyclic voltammetry (FSCV) or microdialysis.

      The further examination of the role of neuropeptide release from NA neurons is mentioned in the discussion (line 491-494 and line 497-499 of the pre).

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of vesicular glutamate transporters from noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice. 

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice. 

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study does not document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. The authors effectively recognize this issue and appropriately discuss their findings in this context. 

      We thank the reviewer for the positive evaluation of our work.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporter (Vglut1, Vglut2, and Vglut3) expression and compared those maps to real-time expression profiles in central NA neurons obtained by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with a previously uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not influence breathing or metabolic control, either under normoxic/normocapnic conditions or after chemoreflex stimulation. They also showed, for the first time, a Vglut3-expressing NA population in the C2/A2 nuclei. In addition, using more refined techniques than previous studies, they demonstrated Vglut2 expression in anterior NA populations, such as LC neurons. 

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression analysis of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking out Vglut2 in NA neurons, using DBH-Cre; Vglut2 cKO mice, on breathing and metabolic control in unanesthetized mice. Finally, they injected an AAV carrying Cre-dependent tdTomato into the LC of Vglut2-Cre mice to verify Vglut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when exploring respiratory chemosensitivity. Furthermore, the sample sizes of the experiments are excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript whether the experiments were performed in males and females and whether the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring sex-specific differences. This is particularly relevant because the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC and more LC-NA neurons than males (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling study identified over 3,000 genes in the LC and revealed significant sexual dimorphisms, with more than 100 genes differentially expressed within LC-NA neurons at the transcript level. Furthermore, this study showed that these distinct gene expression patterns can elicit different behavioral responses between the sexes (Mulvey et al., 2018).

      Therefore, the authors should compare the fate maps and Vglut transporter expression between males and females, at least for LC-NA neurons. Even in the absence of sex differences, this information remains important. 

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis. 

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to the body temperature phenotype in these animals under different gas exposures? These data should be included in the tables. 

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation? 

      Continuing along the same line of inquiry, is there a possibility that Vglut2 cKO in NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was found in previous studies of Vglut2 cKO in DA neurons (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? If so, could the lack of effect on breathing be explained by the loss of noradrenaline rather than of glutamate? 

      We thank the reviewer for the positive evaluation and further suggestions. Please see our response in “Author Response” to the previous version of Reviewer #2 (Public review).

      Reviewer #4 (Public Review): 

      Summary:

      Although previous research suggested that noradrenergic glutamatergic signaling could influence respiratory control, the work by Chang and colleagues reveals that the vesicular glutamate transporter Vglut2 is dynamically and widely expressed throughout the central noradrenergic system but is not crucial for baseline breathing or for the hypercapnic and hypoxic ventilatory responses. The central point that will significantly change the field concerns how NA-glutamate transmission may influence breathing control and how NA neuron dysfunction contributes to respiratory disorders. 

      Strengths:

      There are several strengths such as the comprehensive analysis of Vglut1, Vglut2, and Vglut3 expression in the central noradrenergic system and the combined measurements of breathing parameters in conscious unrestrained mice. 

      Other considerations:

      These results strongly suggest that glutamate may not be necessary for modulating breathing under normal conditions or even when faced with high levels of carbon dioxide (hypercapnia) or low oxygen levels (hypoxia). This finding is unexpected, considering many studies have underscored glutamate's vital role in respiratory regulation, more so than catecholamines. This leads us to question the significance of catecholamines in controlling respiration. Moreover, if glutamate is not essential for this function, we need to explore its role in other physiological processes such as sympathetic nerve activity (SNA), thermoregulation, and sensory physiology. 

      We thank the reviewer for the positive evaluation and further suggestions. The potential role of noradrenergic-derived glutamate in other processes, which is beyond the scope of this study, should be addressed in the future.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      All of my concerns were effectively resolved, leading me to accept the paper. However, I suggest that the authors consider investing in a more reliable system for measuring body temperature, as accurate measurements of this parameter are crucial for whole body plethysmography. 

      Thank you for the suggestion. Real-time measurement of body temperature is a goal for future studies.

      Reviewer #4 (Recommendations For The Authors):

      Because I am reviewing a revised version, I believe the authors have addressed most, if not all, of the concerns already raised by three reviewers. In my understanding, the authors achieved their aims and the conclusions are fully supported by the results. The impact of this work on the respiratory field is significant and is likely to advance the field. The methods and data, which combine standard techniques with genetic tools, will be highly beneficial to the research community. 

      I still have one concern: if glutamate is not critical, then what is? Could we disable the noradrenergic (NA) system while preserving glutamate function to determine whether the NA system is indeed crucial for respiratory physiology? This approach might provide clearer insights into the mechanisms underlying respiratory control. 

      We agree that there remain several exciting questions about the respective roles of noradrenaline, glutamate, and other neuropeptides such as Neuropeptide Y (NPY) and galanin. We are currently devising strategies to address the respective and combinatorial roles of all these candidates in breathing control. Most simply, we can conditionally mutagenize each of them in the central noradrenergic system in an acute manner, using DBH-CreER mice, to determine whether any of them is critical to respiratory control, with the advantage of minimizing developmental compensatory events.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors evaluated a novel eIF2B activator, DNL343, in two mouse models representing different forms of the integrated stress response (ISR). They first assessed the pharmacokinetics of DNL343, demonstrating that it crosses the blood-brain barrier and has good bioavailability. In an acute ISR model induced by optic nerve crush (ONC) injury, DNL343 treatment reduced ISR-induced transcriptional changes and neuronal loss, demonstrating neuroprotective effects. Next, the authors generated an eIF2B loss-of-function mouse model by knocking in disease-causing Eif2b5 variants. The model presents a chronic ISR and mimics vanishing white matter disease (VWMD). DNL343 treatment from the pre-symptomatic stage improved body weight and motor function, corrected transcriptional changes, and reversed proteomic and metabolomic alterations in the brain and cerebrospinal fluid. DNL343 treatment initiated at an advanced disease stage also showed positive effects, restoring body weight gain, suppressing the ISR, reducing neurodegeneration biomarkers, and extending lifespan. These findings highlight DNL343 as an effective ISR inhibitor with potential applications in treating VWMD and other neurodegenerative disorders involving the ISR.

      Strengths:

      The study's findings regarding the novel compound DNL343 offer significant promise in addressing VWMD, a condition currently lacking disease-modifying treatment. DNL343 directly targets eIF2B, the disease-causing complex in VWMD, and demonstrates notable efficacy in reversing the integrated stress response (ISR) and mitigating neurodegeneration in a VWMD mouse model. These results raise hope for the potential application of DNL343 in VWMD treatment, a development eagerly anticipated by patients and the VWMD research community. Moreover, the study hints at the broader potential of DNL343 in treating other ISR-related neurodegenerative disorders, such as amyotrophic lateral sclerosis, a prospect that holds broader interest. Additionally, the study's identification of potential biomarkers for VWMD represents a notable strength, potentially leading to improved disease progression assessment pending further confirmation in future research.

      Weaknesses:

      There are a couple of notable concerns in this study. Firstly, while the in vivo evidence strongly supports the efficacy of DNL343 in mitigating ISR and neurodegeneration, there is a lack of direct biochemical evidence to confirm its activity in eIF2B activation. Secondly, the potential for cardiovascular toxicity, which has been reported for a related eIF2B activator in a canine model (as mentioned in the manuscript), has not been evaluated for DNL343 in this study. This data gap regarding toxicity could be crucial for informing the future development of DNL343 for potential human use. Further investigation into these areas would be valuable for a comprehensive understanding of the compound's mechanisms and safety profile.

      We thank the reviewer for the thoughtful feedback and an opportunity to provide further clarification. To address the first question regarding biochemical evidence of the mechanism of action of DNL343, we agree that additional data would help in interpreting the results presented in this manuscript. We now include a citation to Craig et al (Craig, R.A., 2nd, J. De Vicente, A.A. Estrada, J.A. Feng, K.W. Lexa, M.J. Canet, W.E. Dowdle, R.I. Erickson, B.N. Flores, P.C.G. Haddick, L.A. Kane, J.W. Lewcock, N.J. Moerke, S.B. Poda, Z. Sweeney, R.H. Takahashi, V. Tong, J. Wang, E. Yulyaningsih, H. Solanoy, K. Scearce-Levie, P.E. Sanchez, L. Tang, M. Xu, R. Zhang and M. Osipov (2024). "Discovery of DNL343: A Potent, Selective, and Brain-Penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases." J Med Chem.), which includes full details on the discovery and characterization of DNL343.

      On the question of cardiovascular toxicity observed with previous eIF2B-activating compounds, Craig et al also provides evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy and ALS participants, respectively; these trials are referenced on page 4, lines 102-103. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

      Reviewer #2 (Public Review):

      Summary:

      The authors developed DNL343, a CNS-penetrant small molecule integrated stress response (ISR) inhibitor, to treat neurodegenerative diseases caused by ISR.

      Strengths:

      DNL343 is an investigational CNS-penetrant small molecule integrated stress response (ISR) inhibitor designed to activate the eukaryotic initiation factor 2B (eIF2B) and suppress aberrant ISR activation. The therapeutic efficacy of DNL343 has been extensively characterized in two animal models. Importantly, plasma biomarkers of neuroinflammation and neurodegeneration can be reversed with DNL343 treatment. Remarkably, several of these biomarkers show differential levels in CSF and plasma from patients with vanishing white matter disease (VWMD) upon DNL343 treatment. Overall, this is a very exciting study to target ISR for therapeutic interventions.

      Weaknesses:

      My main questions center around the characterization of DNL343.

      (1) Is there any biochemical evidence showing that DNL343 activates eIF2B, such as binding assays or in vitro biochemical activity assays? A conference presentation was cited - "Osipov, M. (2022). Discovery of DNL343: a Potent Selective and Brain-penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases. Medicinal Chemistry Gordon Research Conference. New London, NH." However, no public information about this presentation is available.

      Information from this presentation and more details on the discovery and characterization of DNL343 can be found in Craig et al J Med Chem (2024) and this citation has been replaced.

      (2) How was the selectivity of DNL343 demonstrated? What are the off-targets of DNL343, in particular when DNL343 is administered at a high dose? Thermal proteome profiling or photoaffinity labeling experiments could be considered.

      Please see Craig et al J Med Chem (2024) for full details. In brief, no significant off-target effects were observed for DNL343 in a Cerep panel.

      (3) What are the total drug concentrations in the brain and plasma? What are the unbound ratios?

      Following a single oral dose of DNL343 in mice, unbound brain-to-unbound plasma exposure ratios (Kp,uu) of 0.8 to 1.1 were observed, indicating high CNS penetrance. This was further supported by a CSF-to-unbound plasma exposure ratio of 0.9 in the same mouse study. CNS penetrance was also confirmed in rats and NHP by CSF-to-unbound plasma ratios near unity, as reported in Craig et al J Med Chem (2024).
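      For readers less familiar with the ratio, the standard definition of Kp,uu is written out below. The symbols (C for total concentration, f_u for unbound fraction) are our own notation for illustration, not taken from Craig et al J Med Chem (2024); CSF concentration is commonly used as a surrogate for the unbound brain concentration in the numerator, which is why the CSF-to-unbound plasma ratio is reported alongside it.

```latex
K_{p,uu} = \frac{C_{u,\mathrm{brain}}}{C_{u,\mathrm{plasma}}}
         = \frac{f_{u,\mathrm{brain}} \, C_{\mathrm{brain}}}{f_{u,\mathrm{plasma}} \, C_{\mathrm{plasma}}}
```

      A value near 1, as observed here (0.8 to 1.1), is generally interpreted as net passive equilibration across the BBB without substantial active efflux or uptake.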

      (4) If DNL343 is given intravenously, what are the concentrations in the brain and plasma after 5 minutes and 1 hour or longer time points? In other words, does DNL343 cross BBB through passive diffusion or an active process?

      Unbound brain-to-unbound plasma exposure ratios following a single oral dose in the mouse were 0.8 to 1.1 and showed no time dependence. These measurements were made before, near, and after the plasma tmax of DNL343, indicating that unbound DNL343 crosses the BBB by passive diffusion and rapidly reaches equilibrium between the brain and systemic circulation. Details can be found in Craig et al J Med Chem (2024).

      (5) What is the complete PK profile of DNL343 for intravenous and oral dosing?

      DNL343 administered orally to mice as a suspension formulation showed plasma PK consistent with prolonged absorption with tmax ranging from 3 to 4 h, and a terminal elimination half-life (t1/2) of ~10 h. Details can be found in Craig et al J Med Chem (2024).
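      As a rough illustration of what a ~10 h terminal half-life implies, the worked arithmetic below assumes first-order (mono-exponential) elimination in the terminal phase. It is a back-of-the-envelope sketch under that assumption, not an additional result from the PK study.

```latex
k_{el} = \frac{\ln 2}{t_{1/2}} \approx \frac{0.693}{10\ \mathrm{h}} \approx 0.069\ \mathrm{h}^{-1},
\qquad
\frac{C(t + 24\ \mathrm{h})}{C(t)} = e^{-k_{el} \cdot 24\ \mathrm{h}} = 2^{-24/10} \approx 0.19
```

      Under this assumption, roughly one fifth of a terminal-phase plasma concentration would remain one day later.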

      (6) Are there any major drug metabolites that could be of concern?

      DNL343 metabolism is through Phase 1 biotransformation pathways. None of the in vivo circulating metabolites show potency towards eIF2B activation. Given that none of these metabolites are of concern, we believe this information is beyond the scope of the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      ISR contributes to the pathogenesis of multiple neurodegenerative diseases, such as ALS, FTD, VWMD, etc. Targeting ISR is a promising avenue for potential therapeutics. However, previously identified ways to target ISR present some challenges. PERK inhibitors suppress ISR by inhibiting eIF2alpha phosphorylation and cause pancreatic toxicity in mice. In order to bypass eIF2alpha, previous studies have identified ISR suppressors that target eIF2B, such as ISRIB and 2BAct. These molecules suppress neurodegeneration but do not cause detrimental effects in mouse models. However, ISRIB is water-insoluble, and 2BAct causes cardiovascular complications in dogs, preventing their use in clinics. Here, the authors showed that DNL343, a new ISR inhibitor targeting eIF2B, suppresses neurodegeneration in mouse models. Combined with their previous results of a clinical phase I trial showing the safety of DNL343, these findings suggest the promise of DNL343 as a potential drug for neurodegenerative diseases in which ISR contributes to pathogenesis.

      Strengths:

      The finding is important and has disease implications, and the conclusion is not surprising.

      Weaknesses:

      The experimental design and data are hard to comprehend for an audience with a basic research background. This reviewer suggests that the authors design experiments and interpret data in the same way as previous studies on ISRIB and 2BAct (e.g., Wong et al; eLife, 2019).

      We thank this reviewer for their feedback and for recognizing that DNL343 has promising potential as a treatment for neurodegenerative diseases. While our studies share some similarities with Wong et al., eLife (2019) and Abbink et al., ACTN (2019), our study design is intentionally distinct (e.g., inclusion of both prevention and treatment dosing paradigms, and determination of the dose-response impact of drug treatment across biomarkers), which necessitates tailored data visualization to communicate our findings effectively. However, we understand the importance of clarity for a broader audience, and to this end we have made a number of changes to the data figures, in particular the data from omics experiments in Figures 3 and 5. We have also provided additional supplemental tables to aid data interpretation. We hope these changes serve both readers familiar with previous work and those with a less specialized background.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Demyelination is a significant pathological feature in the VWMD mouse model. The authors should clarify whether they observed similar demyelination in their study and if DNL343 had any impact on reversing this demyelination. These findings are crucial for assessing the compound's effectiveness in mitigating neurodegeneration.

      Demyelination is indeed an important feature in the eIF2B LOF (VWMD) mouse model. Given that this phenotype, and the ability to rescue it histologically with this mechanism of action (Wong et al; eLife, 2019, cited in introduction), are already well characterized, and given the limited size and number of mouse tissues, we prioritized non-histological targeted and unbiased analyses aimed at identifying translatable biomarkers. Nonetheless, the totality of our data, in different mouse models and cell types, strongly supports DNL343 as a potent ISR inhibitor that is effective in attenuating neurodegeneration:

      · In the optic nerve crush model, DNL343 dose-dependently reduced retinal cell degeneration

      · In the VWMD mouse model, DNL343 attenuated the increase in a plasma biomarker of neurodegeneration, neurofilament-light, which corresponded to normalization in motor function.

      · Metabolomic and lipidomic analyses in the VWMD mouse model brain showed increases in oxysterols, such as 7-ketocholesterol, and in cholesterol esters; these lipids are associated with demyelination (Nugent et al, 2020). DNL343 treatment attenuated the levels of these oxysterols, indicating decreased demyelination.

      · When treatment was initiated at an advanced disease stage, reversal of plasma biomarkers of neurodegeneration (Nf-L) and neuroinflammation (GFAP) by DNL343 in this model was accompanied by extension of the lifespan that is otherwise shortened as the mutant animals succumb to disease.

      These data highlight the potential therapeutic benefits of DNL343 in the broader context of ISR-mediated neurodegeneration which can include but may not be limited to VWMD.

      (2) Figure 6 presents several biomarkers with significantly increased levels in VWMD mice and patient biofluids. However, these biomarkers are not reflected in the brain proteomics data presented in Figure 3. The discrepancy between these findings should be addressed and discussed in the manuscript to provide a more comprehensive understanding.

      The proteins measured in Figure 6 were not detected by TMT proteomics in the CSF. In the brain, only GFAP was detected, and its overall abundance in tissue was similar in both genotypes. Cytokines such as TIMP1 and MCP1 are usually present at low abundance and are therefore challenging to detect with the broad discovery proteomics method applied in this study. Antibody-based immunoassays are better suited than mass spectrometry-based proteomics to specifically measure low-abundance proteins, whereas mass spectrometry-based methods offer a wider dynamic range for detecting more abundant proteins. Differences in detection sensitivity between immunoassays and mass spectrometry assays have been noted previously (Petrera et al, J Proteome Res, 2021). We have added new text to address this point in the revised manuscript (page 7, lines 274-277).

      (3) Figure 7 discusses the effects of DNL343 treatment initiated at an advanced disease stage. Since the 4-week treatment did not rescue performance in the balance beam test (as shown in Figure 6A), it is important to clarify if a 20-week treatment had any impact on this parameter.

      This reviewer raised an important question that we were unfortunately unable to test. When balance beam training was administered after 8 (of 20) weeks of dosing, most animals of both wild-type and mutant genotypes struggled to remain on or maintain balance on the beam and were unable to progress across it, making the assay unsuccessful in this cohort. This impairment appeared to be driven by distinct factors in the two genotypes: age-associated obesity in wild-type animals and severe motor impairment in the eIF2B HOM mice, irrespective of treatment. While other less demanding and more sensitive assays could possibly reveal more nuanced differences, these observations and our earlier data (Figure 4G-I) suggest that DNL343 could prevent but not reverse functional deterioration. This is in line with our understanding of the DNL343 mechanism of action, which does not include neuronal regeneration, a therapeutic effect that is likely required for functional recovery. We have added this point to the manuscript (page 8, lines 319-326).

      Additionally, considering the significant increase in Gdf15 levels in the disease model, it would be valuable to know if DNL343 treatment affected Gdf15 levels. If these assays were conducted, reporting the data would greatly assist in evaluating the compound's efficacy when administered at an advanced disease stage.

      We were not able to measure GDF15 levels in the 20-week study because the limited plasma collected in life was dedicated to assessing biomarkers of neurodegeneration (Figure 7E-F). However, data from our 4-week treatment study, which was initiated at a similar age range to the 20-week treatment study (19-26 and 24-33 weeks of age, respectively), showed that DNL343 reduced GDF15 levels in the brain (mRNA and protein) and CSF (protein) (Supplemental Figure 5A-C), suggesting that DNL343 reduces ISR activation at an advanced disease stage in this model. We expect that the reduction observed after 4 weeks of treatment would persist for the duration of the extended treatment in the 20-week cohort.

      (4) A minor point. In Figures 5A, 5C, and 5E, it appears that the red-colored group should likely be labeled as "HOM 0 mg/kg" instead of "HOM 3 mg/kg".

      This has been amended, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The cellular function of DNL343 needs to be clarified. The authors claim that it activates eIF2B, but no cellular or molecular evidence is provided. Does it bind to eIF2B? Does it not affect eIF2alpha phosphorylation? Does it restore translation upon stress that causes eIF2alpha phosphorylation? Does it suppress stress granule assembly? The authors cited Sun, Tsai et al. 2023 and Osipov et al., 2022. However, these citations are conference abstracts with no published figures available for review.

      We agree that additional data outlining the biochemical evidence of the mechanism of action of DNL343 was needed. We now include a citation to Craig et al J Med Chem (2024) that includes the full details on the discovery and molecular characterization of DNL343.

      (2) It needs to be clarified how the authors selected the ISR marker genes. ISR genes are more than those selected. How about others? How did the authors measure the mRNA levels, bulk RNA-seq or RT-PCR? If the former, have the authors verified their results using RT-PCR? Have the authors measured the protein levels for nerve crush experiments (by both proteomic and individual protein analyses)? Also, no statistical analyses were found for the heat maps.

      The ISR marker genes were selected by a combination of experimental and literature data. Transcriptomic analysis of the eIF2B HOM brains was conducted using untargeted RNA-seq (Supplemental Figure 1B). Here, we found an enrichment of transcripts previously reported to be ISR-dependent, namely Atf4, Chac1, Ddit3, Eif4ebp1, Ppp1r15a (Larhammar et al., 2017), Atf3, Asns, Mthfd2, Psat1, Sesn2, Slc1a5, Slc7a5, Slc7a11, Trib3 (Wong et al., 2019, Abbink et al., 2019). These transcripts were assayed using targeted qPCR in the eIF2B HOM brains, spleen and PBMC (Supplemental Figure 1A, C, D) and in the retinas from the ONC experiments (Figure 2C). We have further clarified the analysis method for the gene expression data in the figure legends.

      We did not interrogate the proteome of the retina in the ONC model. Our study in this model was intended as a proof-of-concept evaluation of DNL343 effects in this acute ISR-dependent model of neurodegeneration. To this end, we performed gene expression (Figure 2C) and immunofluorescence analyses (Figure 2D-F). Each of these analyses was conducted using dedicated whole retinas; additional protein analyses would have required a separate cohort of animals.

      We believe that heatmaps provide the best visualization of the data, particularly the dose dependent effects of DNL343 on multiple genes, but we understand the value for also providing statistical analyses. To address this, we provide additional Supplemental tables to show the outcome of statistical analyses undertaken. Statistical data relating to Figure 2C can be found on new Supplemental Tables 1 & 2; those relating to Supplemental Figures 1A, C, and D on new Supplemental Tables 3, 5, 6, respectively; that from Figure 4D on new Supplemental Table 8, and that from Figure 7D on new Supplemental Table 11.

      (3) Both the authors and Wong et al. (eLife, 2019) performed transcriptomic analyses on HOM mice. How do the authors compare the two data sets? Are they the same?

      In this work, a transcriptomic approach was applied to confirm induction of the ISR in our in vivo model. While the data are not identical, all of the top annotated genes shown in Supplemental Figure 1B were also deemed significant by Wong and coworkers (Bayes factor > 10). More importantly, as explained in our response to question #2 from Reviewer 3, the ISR genes highlighted in Supplemental Figure 1B were also confirmed in two other studies (Larhammar et al., 2017, Abbink et al., 2019). These data support our interpretation that eIF2B HOM mice have elevated ISR relative to WT mice. We have added new text to line 164 on page 5 to clarify this point.

      (4) Can the authors interpret their omic data using volcano plots for HOM rescue experiments, as Wong et al. did in eLife 2019? Heat maps with statistical analyses are more straightforward to comprehend. Can the authors verify some of these data using RT-PCR, Western blot, etc.?

      We added additional pathway interpretation in Figures 3 and 5 to highlight key biological processes altered in the brain and the cellular compartment of origin of CSF proteins changed in eIF2B HOM mice at baseline and following treatment with DNL343. Our treatment design employed multiple dose levels; summarizing it with volcano plots would have required many figures, whereas a single heat map captures the same information more easily. However, to provide additional quantitative information, we have now added supplementary tables showing the full statistical analysis for all heat maps, for added clarity and transparency.

      We demonstrated 100% correlation between the select genes we examined by qPCR in Supplemental Figure 1A and those identified in brain by RNA-seq. In addition, the reliability of RNA-seq data has previously been examined in great detail (Everaert et al., Sci Rep 2017), which found ~85% concordance between RNA-seq and qPCR data; the discordant genes tended to have < 2 log2FC and low abundance. Given that the top core ISR genes identified in our study have > 2 log2FC and have been verified by other independent labs (Larhammar et al., 2017; Abbink et al., 2019; Wong et al., 2019), we do not think there is a rationale for technical confirmation of the RNA-seq data.
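
      One simple way to express the kind of RNA-seq/qPCR concordance discussed above is as agreement in the sign of the log2 fold changes. The sketch below is a minimal illustration under that assumption (the cited benchmarking study may define concordance differently); the helper names and the gene values in the usage note are hypothetical, not data from this study.

```python
import numpy as np


def direction_concordance(rnaseq_lfc, qpcr_lfc):
    """Fraction of genes whose log2 fold changes agree in sign between methods.

    Assumption: concordance is scored purely on direction of change.
    Returns (fraction, per-gene boolean agreement array).
    """
    a = np.asarray(rnaseq_lfc, dtype=float)
    b = np.asarray(qpcr_lfc, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same length")
    agree = np.sign(a) == np.sign(b)
    return agree.mean(), agree


def discordant_low_effect(rnaseq_lfc, qpcr_lfc, lfc_cutoff=2.0):
    """Indices of discordant genes whose RNA-seq |log2FC| is below the cutoff."""
    _, agree = direction_concordance(rnaseq_lfc, qpcr_lfc)
    small = np.abs(np.asarray(rnaseq_lfc, dtype=float)) < lfc_cutoff
    return np.flatnonzero(~agree & small)
```

      For example, with hypothetical RNA-seq log2FC values [3.1, 2.4, -0.5, 0.8] and qPCR values [2.8, 2.0, 0.3, 0.9], the concordance is 0.75, and the single discordant gene (index 2) falls below the 2 log2FC cutoff.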

      The risk of mis-annotating proteins in the TMT data was further mitigated by removing proteins with coverage < 20% and fewer than 8 unique peptides detected, and by setting the protein annotation FDR to < 1%.
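
      The quality filters above can be sketched as a simple predicate over protein-level summaries. This is a hypothetical illustration, not the authors' pipeline: the record fields and function names are invented, and because the text can be read either way, the sketch assumes that failing either the coverage or the unique-peptide threshold is enough to remove a protein.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical protein-level summary from a TMT search."""
    accession: str
    coverage_pct: float   # sequence coverage, in percent
    unique_peptides: int  # number of unique peptides detected
    protein_fdr: float    # protein-level FDR as a fraction (0.01 == 1%)


def passes_filters(hit, min_coverage=20.0, min_unique=8, max_fdr=0.01):
    # Assumption: a protein is kept only if it meets every threshold,
    # so failing the coverage OR the peptide criterion removes it.
    return (hit.coverage_pct >= min_coverage
            and hit.unique_peptides >= min_unique
            and hit.protein_fdr < max_fdr)


def filter_hits(hits, **thresholds):
    """Return only the hits that pass all quality filters."""
    return [h for h in hits if passes_filters(h, **thresholds)]
```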

      Additionally, TMT-labelling-based proteomics offers a wider dynamic range and greater sensitivity than western blotting. Validating TMT logFC data by western blotting, which is less quantitative and has a narrower dynamic range, may not be very informative. Furthermore, the similar trends in key ISR genes and proteins shown in Figures 4D and 5A (e.g., PSAT, SLC7A11, SLC7A5) provide additional support for the authenticity of the proteins identified in this work.

      Also, for Figures 4E and F, it is assumed that each line represents an individual animal, but why are their body weight gains so different for the wild type? Can the authors plot the mean and s.e.m.? Also, there are no data about neurodegeneration. The authors need to show microscopy images, count the numbers, and assess the morphology of nerve cells.

      The large spread in body weight gain in our wild-type mice reflects the normal variability of this endpoint, which can be influenced by sex and age. Indeed, both factors are present in our cohorts, as animals of both sexes were included and there was a 7-week age range (10-17 weeks of age at dosing start). Each line in Figures 4E-F does represent data sampled from an individual animal over time. We chose to represent the data this way for transparency and have provided additional visualization (new Supplemental Figure 3) showing both body weight gain and plasma NfL levels as mean ± SEM, as requested by this reviewer.
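
      The mean ± SEM summary requested here collapses the per-animal trajectories into one value per time point. A minimal sketch, assuming the data are arranged as an animals × time-points array with NaN for missed measurements; the function name and layout are illustrative, not the authors' code.

```python
import numpy as np


def mean_sem(values_by_animal):
    """Per-time-point mean and standard error of the mean.

    values_by_animal: 2-D array, rows = animals, columns = time points;
    NaN marks a missing measurement and is excluded from that column.
    """
    x = np.asarray(values_by_animal, dtype=float)
    n = np.sum(~np.isnan(x), axis=0)         # animals measured per time point
    mean = np.nanmean(x, axis=0)
    sd = np.nanstd(x, axis=0, ddof=1)        # sample standard deviation
    sem = sd / np.sqrt(n)
    return mean, sem
```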

      In this study we chose to use a clinically relevant biomarker of neurodegeneration, plasma neurofilament light chain (NfL) (Figure 4F). This allowed us to prioritize the tissue samples from these studies for comprehensive unbiased analyses that more completely characterize the phenotype of these eIF2B LoF mice. NfL is recognized as a sensitive measure of neuronal/axonal damage regardless of cause (Gaetani et al., 2018; Khalil et al., 2018). Elevated plasma (and CSF) NfL levels have been demonstrated across neurodegenerative conditions such as Alzheimer’s disease (Giacomucci et al., 2022), multiple sclerosis (Ferreira-Atuesta et al., 2021), and ALS (Huang et al., 2018).

      (5) How is the ISR connected to the metabolomic changes? Can the authors explain it?

      The ISR caused significant increases in the transcript and protein abundances of amino acid transporters and serine/glycine/1-carbon metabolism enzymes, highlighted in Figure 3A and C and lines 237-255 of the main text. Similar patterns were also observed in prior published studies (Larhammar et al., 2017; Abbink et al., 2019; Wong et al., 2019). Consistent with these changes, we observed increased levels of alanine (transported by SLC3A2, SLC7A11, SLC7A3) and decreased cystathionine levels (associated with increased expression of CTH). ATF4 is one of the main orchestrators of the ISR in response to stress (e.g., amino acid deprivation), and it is required for expression of amino acid transporters and of enzymes needed for synthesis of non-essential amino acids (PMID: 28494858). ATF4 increases cellular amino acid uptake and delivers the amino acids needed for synthesis of proteins and of the glutathione required for survival.

      We also observed prominent changes in cholesteryl esters (CE) in eIF2B HOM mice and their normalization with DNL343 treatment, shown in Figure 5C. We checked for changes in the expression of CEL, CES1, LCAT, LIPA, SOAT1, and NCEH1, proteins involved in CE metabolism, and failed to detect any changes in protein or RNA abundance. This suggests that rapid demyelination is a more likely trigger for CE accumulation, as reported in FTD-GRN (Marian OC et al., 2023, Acta Neuropathol Commun 11, 52) and in experimental demyelination models (Nugent AA et al., 2020, Neuron). We have added new text to the Discussion (page 9, lines 408-411) relating these results to each other.

      (6) It is hard to understand the biomarker part. The authors said "potential translational biomarkers are elevated..." Do the authors mean they are elevated so they can be potential biomarkers? If their levels are unchanged (e.g., TIMP-1), how can they be biomarkers? Also, this part needs a conclusion/summary. Also, what does "reversed biomarkers..." mean?

      We have modified the text for clarity and included a concluding sentence for this section of the Results (page 7, lines 297-299). In assessing whether a given protein could be a potential translational biomarker for human disease, we evaluated whether the following two conditions were met: (1) gene expression or protein levels of the biomarker in the brain or biofluids (CSF or plasma) of Eif2b5 R191H homozygous mice are increased or decreased relative to wild-type controls and are modulated or normalized by administration of DNL343; and (2) protein levels in biofluids from VWMD patients differ from those of healthy controls in the same direction as in the mouse model. GDF-15, GFAP, and NfL meet these criteria, but TIMP-1 and MCP-1 do not.
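
      The two criteria above amount to a direction-matching rule. The sketch below encodes that rule as a minimal illustration; it assumes each input is an effect estimate (e.g., a log2 fold change) already judged significant upstream, treats "modulated or normalized" as a shift opposite to the disease direction, and uses hypothetical function names and values rather than any data from this study.

```python
def _direction(effect):
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    if effect > 0:
        return 1
    if effect < 0:
        return -1
    return 0


def is_candidate_translational_biomarker(mouse_hom_vs_wt,
                                         mouse_treated_vs_hom,
                                         human_vwmd_vs_control):
    """Hypothetical encoding of the two criteria described in the text.

    (1) The marker is altered in HOM vs WT mice and treatment shifts it
        back toward WT (opposite direction to the disease effect).
    (2) The marker is altered in patients vs healthy controls in the
        same direction as in HOM mice.
    """
    disease_dir = _direction(mouse_hom_vs_wt)
    if disease_dir == 0:
        return False  # not altered in the mouse model
    treatment_reverses = _direction(mouse_treated_vs_hom) == -disease_dir
    human_concordant = _direction(human_vwmd_vs_control) == disease_dir
    return treatment_reverses and human_concordant
```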

      Minor concerns:

      (1) Please explain which multiple comparison tests the authors used.

      This information has been further clarified in the figure legends.

      (2) Administering the drug at an advanced stage led to a trend of NfL reduction but did not rescue function. Can the authors discuss what this means?

      Further elaboration and discussion of this finding have been added to the Results section (page 8, lines 319-325).

      (3) For statistical analyses on the bar graphs, it would be better if the authors labeled the comparison pairs on the graphs.

      We agree that labelling comparisons in bar graphs could aid the readership and have added this modification. Additionally, comparisons are indicated in the figure legend.

      (4) The authors need to state clearly that 2BAct's cardiovascular toxicity was observed in dogs, not mice. The current study does not exclude similar DNL343 toxicity. However, previous clinical trials suggest that DNL343 may be safe for humans.

      The suggestion to specify cardiovascular toxicity in dogs has been added (page 3, line 101), thank you. We now include a citation to Craig et al. J Med Chem (2024), which provides evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1b double-blind (NCT05006352) trials in healthy and ALS participants, respectively, and now reference these trials on page 4, lines 102-104. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

    1. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Last and colleagues describe Ais, an open-source software package for the semi-automated segmentation of cryo-electron tomography (cryo-ET) maps. Specifically, Ais provides a graphical user interface (GUI) for the manual segmentation and annotation of specific features of interest. These manual annotations are then used as input ground-truth data for training a convolutional neural network (CNN) model, which can then be used for automatic segmentation. Ais provides the option of several CNNs so that users can compare their performance on their structures of interest in order to determine the CNN that best suits their needs. Additionally, pre-trained models can be uploaded and shared to an online database.

      Algorithms are also provided to characterize "model interactions", which allow users to define heuristic rules on how the different segmentations interact. For instance, a membrane-adjacent protein can have rules requiring it to colocalize at a certain distance from a membrane segmentation. Such rules can help reduce false positives; in the case above, false positives predicted away from membranes are eliminated.
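
      To make the colocalization idea concrete, here is a minimal sketch of one way such a rule could work on voxel masks. It is not Ais's implementation: the function name is hypothetical, and it assumes the rule takes the form "keep a predicted object only if it comes within a maximum distance of the parent (e.g., membrane) segmentation", using SciPy's Euclidean distance transform and connected-component labelling.

```python
import numpy as np
from scipy import ndimage


def filter_by_distance_to_parent(child_mask, parent_mask, max_distance_vox):
    """Keep connected components of child_mask whose closest voxel lies
    within max_distance_vox of any parent voxel; discard the rest.

    Works for 2-D or 3-D boolean masks of identical shape.
    """
    child = np.asarray(child_mask, dtype=bool)
    parent = np.asarray(parent_mask, dtype=bool)
    if not parent.any():
        # No parent segmentation: nothing can satisfy the rule.
        return np.zeros_like(child)
    # Distance from every voxel to the nearest parent voxel (0 inside it).
    dist_to_parent = ndimage.distance_transform_edt(~parent)
    labels, n = ndimage.label(child)
    if n == 0:
        return child.copy()
    # Minimum distance to the parent within each labelled component.
    min_dist = np.asarray(
        ndimage.minimum(dist_to_parent, labels=labels,
                        index=np.arange(1, n + 1)))
    keep_ids = np.flatnonzero(min_dist <= max_distance_vox) + 1
    return np.isin(labels, keep_ids)
```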

      The authors then show how Ais can be used for particle picking and subsequent subtomogram averaging and for the segmentation of cellular tomograms for visual analysis. For subtomogram averaging, they used a previously published dataset and compared the averages of their automated picking with the published manual picking. Analysis of cellular tomogram segmentation was primarily visual.

      Strengths:

      CNN-based segmentation of cryo-ET data is a rapidly developing area of research, as it promises substantially faster results than manual segmentation as well as the possibility of higher accuracy. However, this field is still very much in development, and the overall performance of these approaches, even across different algorithms, still leaves much to be desired. In this context, I think Ais is an interesting package, as it aims to provide both new and experienced users with streamlined approaches for manual annotation, access to a number of CNNs, and methods to refine the outputs of CNN models against each other. I think this can be quite useful for users, particularly as these methods develop.

      Weaknesses:

      Whilst overall I am enthusiastic about this manuscript, I still have a number of comments:

      On page 5, paragraph 1, there is a discussion on human judgement of these results. I think a more detailed discussion is required here, as from looking at the figures, I don't know that I agree with the authors' statement that Pix2pix is better. I acknowledge that this is extremely subjective, which is the problem. I think that a manual segmentation should also be shown in a figure so that the reader has a better way to gauge the performance of the automated segmentation.

      On page 7, the authors mention terms such as "emit" and "absorb" but never properly define them, such that I feel like I'm guessing at their meaning. Precise definitions of these terms should be provided.

      For Figure 3, it's unclear if the parent models shown (particularly the carbon model) are binary or not. The figure looks to be grey values, which would imply that it's the visualization of some prediction score. If so, how is this thresholded? This can also be made clearer in the text.

      Figure 3D was produced in ChimeraX using the hide dust function. I think some discussion on the nature of this "dust" is in order, e.g. how much is there and how large does it need to be to be considered dust? Given that these segmentations can be used for particle picking, this seems like it may be a major contributor to false positives.
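
      For readers weighing the thresholding and "dust" questions above, the following sketch shows one common way a grey-valued prediction map could be binarized and small blobs removed. It is a hypothetical illustration, not what Ais or ChimeraX does: the threshold and minimum size are assumed parameters, and size is measured here as a voxel count, which may differ from the size metric ChimeraX uses when hiding dust on a rendered surface.

```python
import numpy as np
from scipy import ndimage


def binarize_and_remove_dust(score_volume, threshold=0.5, min_voxels=50):
    """Threshold a prediction-score volume and drop connected blobs
    smaller than min_voxels.

    Returns (cleaned boolean mask, number of blobs removed).
    """
    mask = np.asarray(score_volume) >= threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask, 0
    sizes = np.bincount(labels.ravel())  # sizes[0] counts background voxels
    keep = sizes >= min_voxels
    keep[0] = False                      # never keep the background label
    cleaned = keep[labels]
    n_removed = int(np.sum(~keep[1:]))
    return cleaned, n_removed
```

      Reporting `n_removed` alongside the threshold would also quantify how much "dust" a given setting discards.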

      Page 9 contains the following sentence: "After selecting these values, we then launched a batch particle picking process to determine lists of particle coordinates based on the segmented volumes." Given how important this is, I feel like this requires significant description, e.g. how are densities thresholded, how are centers determined, and what if there are overlapping segmentations?
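
      The questions raised here (how centres are determined and how overlapping segmentations are handled) can be illustrated with a simple connected-component picker. This is a hypothetical sketch, not Ais's batch picking procedure: it assumes an already binarized mask, takes each component's centre of mass as the particle coordinate, skips small components, and suppresses any centre closer than a minimum separation to an already accepted, larger one. Touching particles that merge into a single component yield a single pick under this scheme.

```python
import numpy as np
from scipy import ndimage


def pick_particles(mask, min_voxels=10, min_separation=5.0):
    """Centroid coordinates (voxel units) of connected components of a
    binary segmentation, largest components first, with small blobs and
    near-duplicate centres removed."""
    mask = np.asarray(mask, dtype=bool)
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.empty((0, mask.ndim))
    ids = np.arange(1, n + 1)
    sizes = np.bincount(labels.ravel())[1:]          # voxels per component
    centroids = np.array(
        ndimage.center_of_mass(mask.astype(float), labels, ids))
    accepted = []
    for i in np.argsort(sizes)[::-1]:                # largest first
        if sizes[i] < min_voxels:
            continue
        c = centroids[i]
        if all(np.linalg.norm(c - a) >= min_separation for a in accepted):
            accepted.append(c)
    return np.array(accepted).reshape(-1, mask.ndim)
```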

      The FSC shown in Figure S6 for the auto-picked maps is concerning. First, a horizontal line at FSC = 0 should be added. It seems that starting at a frequency of ~0.045, the FSC of the autopicked map increases above zero and stays there. Since this is not present in the FSC of the manually picked averages, this suggests the automatic approach is also finding some sort of consistent features. This needs to be discussed.
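
      For reference, the curve under discussion is the Fourier shell correlation, FSC(k) = Σ F1·conj(F2) / sqrt(Σ|F1|² · Σ|F2|²), with the sums taken over all Fourier coefficients in the shell at spatial frequency k. The NumPy sketch below computes it for two cubic half-maps; it is a minimal illustration (one-voxel-wide shells up to Nyquist, no masking), and its function name is hypothetical, not taken from any particular package.

```python
import numpy as np


def fourier_shell_correlation(vol1, vol2):
    """FSC between two equally sized cubic volumes.

    Returns (lower edge of each shell in cycles/voxel, FSC per shell),
    for one-voxel-wide shells up to Nyquist (0.5 cycles/voxel).
    """
    v1 = np.asarray(vol1, dtype=float)
    v2 = np.asarray(vol2, dtype=float)
    if v1.shape != v2.shape or len(set(v1.shape)) != 1:
        raise ValueError("volumes must be cubic and of the same shape")
    n = v1.shape[0]
    f1 = np.fft.fftn(v1)
    f2 = np.fft.fftn(v2)
    # Radial spatial frequency (cycles/voxel) of every Fourier coefficient.
    freqs = np.meshgrid(*[np.fft.fftfreq(n)] * v1.ndim, indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    n_shells = n // 2
    shell = np.floor(radius * n).astype(int)  # shell index, width 1/n
    valid = shell < n_shells                  # stop at Nyquist
    s = shell[valid]
    num = np.bincount(s, weights=np.real(f1 * np.conj(f2))[valid],
                      minlength=n_shells)
    p1 = np.bincount(s, weights=(np.abs(f1) ** 2)[valid], minlength=n_shells)
    p2 = np.bincount(s, weights=(np.abs(f2) ** 2)[valid], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    fsc = np.divide(num, denom, out=np.zeros(n_shells), where=denom > 0)
    return np.arange(n_shells) / n, fsc
```

      Plotted against the returned frequencies, a reference line at FSC = 0 makes it easy to see whether correlation persists at high frequency.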

      Page 11 contains the statement "the segmented volumes found no immediately apparent false positive predictions of these pores". This is quite subjective and I don't know that I agree with this assessment. Unless the authors decide to quantify this through subtomogram classification, I don't think this statement is appropriate.

      In the methods, the authors note that particle picking is explained in detail in the online documentation. Given that this is a key feature of this software, such an explanation should be in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the spatial and temporal patterns of occurrence and the interspecific associations within a terrestrial mammalian community along human disturbance gradients. They conclude that human activity leads to a higher incidence of positive associations.

      Strengths:

      The theoretical framework of the study is brilliantly introduced. Solid data and sound methodology. This study is based on an extensive series of camera trap data. Good review of the literature on this topic.

      Weaknesses:

      The authors use the terms associations and interactions interchangeably.

      This is not the case. In fact, we state specifically that "... interspecific associations should not be directly interpreted as a signal of biotic interactions between pairs of species…" However, co-occurrence can be an important predictor of likely interactions, such as competition and predation. We stand by our original text.

      It is not clear what the authors mean by "associations". A brief clarification would be helpful.

      Our specific definition of spatial association can be found in the Methods section. To clarify, the index of association is calculated from the covariance between the residuals (epsilon) of the two species, after accounting for all species-specific responses to known environmental covariates. These covariances are modelled so that they can vary with the level of human disturbance, measured as human presence and human modification. After normalization, the final index of association is a correlation value that varies between -1 (complete disassociation) and +1 (complete positive association).
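
      The normalization step described above is the standard conversion of a covariance matrix into correlations, r_ij = Σ_ij / sqrt(Σ_ii · Σ_jj). A minimal sketch, assuming the model supplies a species-by-species residual covariance matrix at a given level of human disturbance; the function name is illustrative.

```python
import numpy as np


def association_index(residual_cov):
    """Convert a species-by-species residual covariance matrix into
    pairwise correlations in [-1, 1]."""
    cov = np.asarray(residual_cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    np.fill_diagonal(corr, 1.0)  # guard against rounding on the diagonal
    return corr
```

      For instance, a residual covariance of 2 between two species with residual variances 4 and 9 gives an association index of 2 / (2 × 3) ≈ 0.33.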

      Also, the authors do not delve into the different types of association found in the study. A more ecological perspective explaining why certain species tend to exhibit negative associations and why others show the opposite pattern (and thus, can be used as indicator species) is missing.

      Suggesting the ecological underpinnings of the associations observed here would mainly be speculation at this point, but the associations demonstrated in this analysis do point to promising areas for more detailed research.

      Also, the authors do not distinguish between significant (true) non-random associations and random associations. In my opinion, associations are those in which two species co-occur more or less than expected by chance. This is not well addressed in the present version of the manuscript.

      Results were considered non-random if correlation coefficients (for spatial association) or overlap (for temporal association) fell outside the 95% confidence intervals. This is now stated clearly in the Methods section. In Figure 3—figure supplement 1-3 and Figure 4—figure supplement 1-3, results at the p < 0.01 level are also presented.
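
      For spatial associations, the criterion used elsewhere in this response is that the 95% interval of the correlation coefficient does not overlap zero. A minimal sketch of that classification, assuming posterior (or bootstrap) samples of one pairwise correlation are available; it covers only the spatial case, and the function name is hypothetical.

```python
import numpy as np


def classify_association(samples, level=0.95):
    """Label a pairwise spatial association from samples of its correlation.

    'positive' or 'negative' if the central interval excludes zero,
    otherwise 'random'. Returns (label, (lower, upper)).
    """
    x = np.asarray(samples, dtype=float)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(x, [alpha, 1.0 - alpha])
    if lo > 0:
        return "positive", (lo, hi)
    if hi < 0:
        return "negative", (lo, hi)
    return "random", (lo, hi)
```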

      The obtained results support the conclusions of the study.

      Anthropogenic pressures can shape species associations by increasing spatial and temporal co-occurrence, but above a certain threshold, the positive influence of human activity on species associations could be reversed. This study can stimulate further work in this direction.

      Reviewer #2 (Public Review):

      Summary:

      This study analyses camera trapping information on the occurrence of forest mammals along a gradient of human modification of the environment. The key hypotheses are that human disturbance squeezes wildlife into a smaller area or their activity into only part of the day, leading to increased co-occurrence under modification. The method used is joint species distribution modelling (JSDM).

      Strengths:

      The data source seems to be very nice, although since very little information is presented, this is hard to be sure of. Also, the JSDM approach is, in principle, a nice way of simultaneously analysing the data.

      Weaknesses:

      The manuscript suffers from a mismatch of hypotheses and methods at two different levels.

      (1) At the lower level, we first need to understand what the individual species do and "like" (their environmental niche). That information is not presented, and the methods suggest that the representation of each species in the JSDM is likely to be extremely poor.

      The response of each species to the environmental covariates provides a window into their environmental niche, encapsulated in the beta coefficients for each environmental covariate. This information is presented in Figure 2.

      (2) The hypothesis clearly asks for an analysis of the statistical interaction between human disturbance and co-occurrence. Yet, the model is not set up this way, and the authors thus do a lot of indirect exploration, rather than direct hypothesis testing.

      Our JSDM model is set up specifically to examine the effect of human disturbance on co-occurrence, after controlling for shared responses to environmental variables. It directly tests the first hypothesis: if increases in the indices of human disturbance had not tended to increase the spatial correlations between species detected by the model, we would have rejected our stated hypothesis that human modification of habitats results in increased positive spatial associations between species.

      Even when the focus is not the individual species, but rather their association, we need to formulate what the expectation is. The hypotheses point towards presenting the spatial and the temporal niche, and how it changes, species by species, under human disturbance. To this, one can then add the layer of interspecific associations.

      Examining each species one by one and how each one responds to human disturbance would miss the effects of any meaningful interactions between species.  The analysis presented provides a means to highlight associations that would have been overlooked.  Future research could go on to analyze the strongest associations in the community and the strongest effects of human disturbance so as to uncover the underlying interactions that give rise to them and the mechanisms of human impact.  We believe that this will prove to be a much more productive approach than trying to tackle this problem species by species and pair by pair.

      The change in activity and space use can be analysed much more simply, by looking at the activity times and spatial distribution directly. It remains unclear what the contribution of the JSDM is, unless it is able to represent this activity and spatial information, and put it in a testable interaction with human disturbance.

      The topic is actually rather complicated. If biotic interactions change along the disturbance gradient, then observed data are already the outcome of such changed interactions. We thus cannot use the data to infer them! But we can show, for each species, that the habitat preferences change along the disturbance gradient - or not, as the case may be.

      Then, in the next step, one would have to formulate specific hypotheses about which species are likely to change their associations more, and which less (based e.g. on predator-prey or competitive interactions). The data and analyses presented do not answer any of these issues.

      We suggest that the so-called “simpler” approach described above is anything but simple, and this is precisely what the Joint Species Distribution Model improves upon.  As pointed out in the Introduction, simply examining spatial overlap is not enough to detect a signal of meaningful biotic interaction, since overlap could be the result of similar responses to environmental variables.  With the JSDM approach, this would not be considered a positive association and would then not imply the possible existence of meaningful interaction.

      Another more substantial point is that, according to my understanding of the methods, the per-species models are very inappropriate: the predictors are only linear, and there are no statistical interactions (L374). There is no conceivable species in the world whose niche would be described by such an oversimplified model.

      While interaction terms can be included in the JSDM, this would considerably increase the complexity of the models.  In previous work, we have found no strong evidence for the importance of interaction terms and they do not improve the performance of the models.

      We have no idea of even the most basic characteristics of the per-species models: prevalences, coefficient estimates, D2 of the model, and analysis of the temporal and spatial autocorrelation of the residuals, although they form the basis for the association analysis!

      The coefficient estimates for response to environmental variables used in the JSDM are provided in Figure 2 and Figure 2—source data 1.

      Why are times of day and day of the year not included as predictors IN INTERACTION with niche predictors and human disturbance, since they represent the temporal dimension on which niches are hypothesised to change?

      Also, all correlations among species should be shown for the raw data and for the model residuals: how much does that actually change and can thus be explained by the niche models?
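
      The comparison requested here (species correlations in the raw data versus in the residuals after the environmental niche is accounted for) can be illustrated with a deliberately simplified model. The sketch below is hypothetical and uses ordinary least squares per species instead of the latent-variable JSDM used in the study; it only shows how shared environmental responses can produce raw correlations that disappear in the residuals.

```python
import numpy as np


def raw_and_residual_correlations(Y, X):
    """Y: sites x species response matrix; X: sites x covariates.

    Returns (raw species correlations, correlations of residuals after an
    ordinary least-squares fit of each species on the covariates).
    Simplification: a linear model stands in for the JSDM.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # add intercept
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    residuals = Y - design @ coef
    raw = np.corrcoef(Y, rowvar=False)
    resid = np.corrcoef(residuals, rowvar=False)
    return raw, resid
```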

      The discussion has little to add to the results. The complexity of the challenge (understanding a community-level response after accounting for species-level responses) is not met, and instead substantial room is given to general statements of how important this line of research is. I failed to see any advance in ecological understanding at the community level.

      We agree that the community-level response to human disturbance is a complex topic, and we believe it is also a very important one. This research and its support of the spatial compression hypothesis, while not providing definitive answers about detailed mechanisms, opens up new lines of inquiry that make it an important advance. For example, the strong effects of human disturbance on certain associations that were detected here could now be examined with the kind of detailed species-by-species and pair-by-pair analysis that this reviewer appears to demand.

      Reviewer #1 (Recommendations For The Authors):

      L27 indicates instead of "idicates".

      We thank the reviewer for catching that error.

      L64 I would refer to potential interactions or just associations. It is always hard to provide evidence for the existence of true interactions.

      We have revised to “potential interactions” to qualify this statement.

      L69 Suggestion: distort instead of upset.

      We thank the reviewer for catching that error.

      L70-71 Here, authors use the term associations. Please, be consistent with the terminology throughout the manuscript.

      We thank the reviewer for raising this important point. The term “co-occurrence” appears to be used inconsistently in the literature, so we have tried to use it only when referring to our own work. For us, co-occurrence means “spatial overlap” without qualification as to whether it is caused by interaction or simply by similar responses to environmental factors (see Blanchet et al. 2020, Argument 1). In our view, interactions refer to biotic effects like predation, competition, commensalism, etc., while associations are the statistical footprint of these processes. In keeping with this understanding, in Line 73 we changed "association" to the stronger word "interaction," but in Line 76 we kept the words "spatiotemporal association", which is presumed to be the result of those interactions. In Line 91, we have changed “interactions” to “associations,” as we do not believe interactions were demonstrated in that study.

      L76 "Species associations are not necessarily fixed as positive or negative..." This sentence is misleading. I would say that species associations can vary across time and space, for instance along an environmental gradient.

      We thank the reviewer for pointing out the potential for confusion. In Line 79, we have made the suggested change.

      L78 "Associations between free-ranging species are especially context-dependent" Loose sentence. Please, explain a bit further.

      We have changed the sentence to be more specific: “Interactions are known to be context-dependent; for example, gradients in stress are associated with variation in the outcomes of pairwise species interactions.”

      L83-85 This would be a good place to introduce the 'stress gradient' hypothesis, which has also been applied to faunal communities in a few studies. According to this hypothesis, the incidence of positive associations should increase as environmental conditions harden.

      In our review of the literature, we find that the stress gradient hypothesis is somewhat controversial and does not receive strong support in vertebrates.  We have added the phrase “…the controversial stress-gradient hypothesis predicts that positive associations should increase as environmental conditions become more severe…”

      L86-88 Well, overall, the number of studies examining spatiotemporal associations in vertebrates is relatively small. That is, bird associations have not received much more attention than those of mammals. I find this introductory/appealing paragraph a bit rough. I think the authors can do better and find a better justification for their work.

      We thank the reviewer for the comments.  We have rewritten the paragraph extensively to make it clearer and to provide a stronger justification for the study.

      L106 "[...] resulting in increased positive spatial associations between species" I'd say that habitat shrinking would increase the level of species clustering or co-occurrence, but in my opinion, not necessarily the incidence of positive associations. It is not clear to me if the authors use positive associations as a term analogous to co-occurrence.

      We thank the reviewer for raising this very important distinction. Habitat shrinking would increase levels of species co-occurrence, but this is not particularly interesting in itself. We wanted to test whether there were effects on species interactions, as revealed by associations. We find that the terms association and co-occurrence are used somewhat loosely in the literature, and so we have made a new effort to clarify and systematize their use in the manuscript. For example, there appears to be a difference in the way “co-occurrence” is used in Boron 2023 and in Blanchet 2020. We do not use the term "positive spatial association" as analogous to "spatial co-occurrence." Spatial co-occurrence, which for us has the meaning of spatial overlap, could simply be the result of similar reactions to environmental covariates, not reflecting any biotic interaction. Joint Species Distribution Models enable the partitioning of spatial overlap and segregation into that which can be explained by responses to known environmental factors, and that which cannot be explained and thus might be the result of biotic interactions. It is only the latter that we are calling spatial association, which can be positive or negative. These associations may be the statistical footprint of biotic interactions.

      Results:

      Difference between random and non-random association patterns. It is not clear to me if the reported associations are significant or not. The authors only report the sign of the association (either positive or negative) but do not clarify if these associations indicate that two species coexist more or less than expected by chance. In my opinion, that is the difference between true ecological associations (e.g., via facilitation or competition effects) and random co-existence patterns. This is paramount and should be addressed in a new version of the manuscript.

      This information is provided in Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3. This is referenced in the text as follows: “… correlation coefficients for 18 species pairs were positive and had a 95 % CI that did not overlap zero, and the number increased to 65 in moderate modifications but dropped to 29 at higher modifications” and so on. This criterion for significance (i.e., greater than expected by chance) is now stated at the end of the Materials and methods. In Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3, those correlations that were significant at p<0.01 are also shown.

      I am also missing a more ecological explanation for the observed findings. For instance, the top-ranked species in terms of negative associations is the red fox, whereas the muntjac seems to be the species whose presence can be used as an indicator for that of other species. What are the mechanisms underlying these patterns? Do red foxes compete for food with other species? Do the species that show positive associations (red goral, muntjac) have traits or a diet that are more different from those of other species? More discussion on these aspects (role of traits and the trophic niche) would be necessary to better understand the obtained results.

      The purpose of this paper was to test the compression hypothesis, and we have tried to keep that as the focus. However, the analysis does open up interesting lines of inquiry for future research to decipher the details of the interactions between species and the mechanisms by which human disturbance facilitates or disrupts these interactions. The reviewer raises some interesting possibilities, but at this point, any discussion along these lines would be largely speculation and could lengthen the paper without great benefit.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript should be accompanied by all data and code of analysis.

      All data and R scripts have been made available in Science Data Bank: https://doi.org/10.57760/sciencedb.11804.

      The sentence "not much is known" is weak: it suggests the authors did not bother to quantify what IS known, and simply waved any previous knowledge aside. Surely we have some ideas about who preys on whom, and which species have overlapping resource requirements (e.g., due to jaw width). For those, we would expect a particularly strong signal, if the association is indeed indicative of interactions.

      We believe that the reviewer is referring to the statement in Line 90-92 about the lack of understanding of the resilience of terrestrial mammal associations to human disturbance.  We have added a reference to one very recent publication that addresses the issue (Boron et al., 2023), but otherwise we stand by our statement. We have, however, added a qualifier to make it clear that we did indeed look for previous knowledge; "However, a review of the literature indicates that ...."

      Figures:

      Fig. 1. This reviewer considers that this is too trivial and should be deleted.

      This is a graphical statement of the hypotheses and may be helpful to some readers.

      Fig. 2. Using points with error bars hides any potential information.

      Done as suggested.

      That only 4 predictors are presented is unacceptably oversimplified.

      Only four predictors are included because, in previous work (Li et al. 2018, 2021 and 2022), we found that adding further predictors or interactions did little to improve the model's performance and could lead to over-fitting.

      Figs. 5 and 6 aggregate extremely strongly over species; it remains unclear which species contribute to the signal, and I guess most do not.

      The number of detection events presented in Table 1 should help to clarify the relative contribution of each species to the data presented in Figures 5 and 6.

      This reviewer considers that the introduction 'oversells' the paper.

      L55: can you give any such "unique ecological information"?

      L60: Lyons et al. (Kathleen is the first name) has been challenged by Telford et al. (2016 Nature) as methodologically flawed.

      The first name has been deleted.  The methodological flaw has to do with interpretation of the fossil record and choice of samples, not with the need to partition shared environmental preferences and interactions.

      L61 contradicts line 64: Blanchet et al. (2022, specifying some arguments from Dormann et al. 2018 GEB) correctly point out that logically one cannot infer the existence or strength of interactions from co-occurrence data. It is thus wrong to then claim (citing Boron et al.) that such data "convey key information about interactions". The latter statement is incorrect. A tree and a beetle can have extremely high association and nothing to do with each other. Association does not mean anything in itself. When two species are spatially and temporally non-overlapping, they can exhibit perfect "anti-association", yet, by the authors' own definition, cannot interact.

      We believe that the reviewer's concerns arise from a misunderstanding of how we use the term association. In our usage, an association is not the same as co-occurrence or overlap, which may simply be the result of shared responses to environmental variables. The co-occurring tree and beetle would not be found to have any association in our analysis, only shared environmental sensitivities. In contrast, associations can be the statistical footprint of interactions, and would be overlaid onto any overlap due to similar responses to the environment. In the case of negative associations, such as might result from competitive exclusion or avoidance of predators, the two species would share environmental responses but show lower-than-expected spatial overlap. Even though they might only rarely be found in the same vicinity, they would indeed be interacting when they were together.

      Joint Species Distribution Models "allow the partitioning of the observed correlation into that which can be explained by species responses to environmental factors... and that which remains unexplained after controlling for environmental effects and which may reflect biotic interactions." (Garcia Navas et al. 2021). It is the latter that we are calling “associations.”
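      To make this distinction concrete, the following is a minimal simulation sketch; it is not the model used in our analysis. It assumes two hypothetical species whose continuous activity scores respond linearly to a single simulated environmental covariate, and it stands in for a joint species distribution model by removing the environmental effect with ordinary least squares before correlating what remains. All function and parameter names (simulate, interaction, and so on) are illustrative only.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of an ordinary least-squares fit y = a + b*x.

    This is a simplified stand-in for 'controlling for environmental
    effects'; a real JSDM would model all species jointly.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def simulate(n=2000, interaction=0.0, seed=1):
    """Hypothetical data: both species respond positively to one covariate.

    'interaction' > 0 adds avoidance: species B is pushed down wherever
    species A is more abundant than its environment alone predicts.
    """
    rng = random.Random(seed)
    env = [rng.gauss(0, 1) for _ in range(n)]
    noise_a = [rng.gauss(0, 1) for _ in range(n)]
    noise_b = [rng.gauss(0, 1) for _ in range(n)]
    a = [1.5 * e + na for e, na in zip(env, noise_a)]
    b = [1.5 * e - interaction * na + nb for e, na, nb in zip(env, noise_a, noise_b)]
    return env, a, b


for label, k in [("shared niche only", 0.0), ("shared niche + avoidance", 0.8)]:
    env, a, b = simulate(interaction=k)
    raw = pearson(a, b)  # co-occurrence signal, environment included
    resid = pearson(residuals(a, env), residuals(b, env))  # the "association"
    print(f"{label}: raw r = {raw:.2f}, residual r = {resid:.2f}")
```

      With no interaction term, the raw correlation is strongly positive but the residual correlation is close to zero (shared environmental sensitivity only, as for the tree and the beetle). Adding the avoidance term still leaves a positive raw correlation, yet the residual correlation becomes clearly negative, which is the pattern we refer to as a negative association.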

      L63: Gilbert reference: Good to have a reference for this statement.

      This point is important, but the reviewer’s comments below have made it clear that it is even more important to point out that strong interactions should be expected to lead to significant associations.  We have added a statement to clarify this.

      L70-72: Incorrect, interactions play a role, not associations (which are merely statistical).

      In this, we agree, and we have revised the statement to refer to interactions, not associations. In our view, an interaction is a biological phenomenon, while an association is the resulting statistical signal that we can detect.

      L75: Associations tell us nothing, only interactions do. Since these can not be reliably inferred, this statement and this claim are wrong.

      We thank the reviewer for raising this point, but we respectfully disagree. Strong interactions should be expected to lead to significant associations that can be detected in the data. Associations, which can be measured reliably, are the evidence of potential interactions, and hence associations can tell us a great deal. We have added a note to this effect after the Gilbert reference above to clarify this point.

      However, we do accept that associations must be interpreted with caution. As Blanchet et al. 2020 explain, "…the co-occurrence signals (e.g. a significant positive or negative correlation value) estimated from these models could originate from any abiotic factors that impact species differently. Therefore, this correlation cannot be systematically interpreted as a signal of biotic interactions, as it could instead express potential non-measured environmental drivers (or combinations of them) that influence species distribution and co-distribution." Alternatively, an association could result from interactions with a third species.

      L87: Regarding your claim, how would you know you DO understand? For that, you need to formulate an expectation before looking at the data and then show you cannot show what you actually measure. (Jaynes called this the "mind-projection fallacy".)

      We are not sure if the reviewer is criticizing our paper or the entire field of community ecology. Perhaps it is the statement that "….resilience of interspecific spatiotemporal associations of terrestrial mammals to human activity remains poorly understood…." Since we are confident that the reviewer believes that mammals do interact, we guess that it is the term "association" that is questioned. We have revised this to "…the impacts of human activity on interspecific interactions of terrestrial mammals remain poorly understood…"

      In this particular case, we did formulate an expectation before looking at the data, in the form of the two formal hypotheses that are clearly stated in the Introduction and illustrated in Figure 1. If the hypotheses had not been supported, then we would have accepted that we do not understand. But as the data are consistent with the hypotheses, we submit that we do understand a bit more now.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Eaton et al. examine the regulation of transcription directionality using a powerful genomic approach (more about the methodology below). Their data challenge the notion that the polyadenylation signal-reading Cleavage and Polyadenylation (CPA) complex is responsible for controlling promoter directionality by terminating antisense transcription. Namely, depletion of the required CPA factor RBBP6 has little effect on antisense transcription measured by POINT. They find instead that initiation is intrinsically preferential in the sense direction and is additionally maintained by the activities of an alternative processing complex called Integrator, together with the kinase CDK9. In the presence of CDK9 activity, depletion of the Integrator endoribonuclease INTS11 leads to globally increased transcription in the antisense direction and minor effects in the sense direction. However, CDK9 inhibition reveals that sense transcription is also sensitive to INTS11 depletion. The authors suggest that CDK9 activity is stronger in the sense direction, preventing INTS11-mediated premature termination of sense transcripts.

      Strengths:

      The combination of acute depletion of the studied factors using degron approaches (important to limit possible secondary effects), together with novel and very sensitive nascent transcriptomics methods POINT and sPOINT is very powerful. The applied spike-in normalization means the analysis is more rigorous than most. Using this methodology allowed the authors to revisit the interesting question of how promoter/transcription directionality is determined.

      The data quality appears very good and the fact that both global analysis as well as numerous gene-specific examples are shown makes it convincing.

      The manuscript is well written and hence a pleasure to read.

      We appreciate this positive assessment.

      Weaknesses:

      I am slightly worried about the reproducibility of the data. It is unclear to me from the manuscript if and which experiments were performed in replicate (there is no table of genomic experiments and no GEO access, as mentioned in more detail in the recommendations to the authors below), and the methods could be more detailed.

      All sequencing data were deposited with GEO. Multiple biological replicates were performed for each sequencing experiment. Bigwig files are presented as a table in the GEO submissions. These data have now been made public.

      A separate discussion section would be useful, particularly since the data provided challenge some concepts in the field. How do the authors interpret U1 data from the Dreyfuss lab in light of their results? How about the known PAS-density directionality bias (more PAS present in antisense direction than in sense) - could the differential PAS density be still relevant to transcription directionality?

      As suggested, we have expanded our discussion to relate our findings to existing data. We think the results from the Dreyfuss lab are very important and highlight the role of U1 snRNA in enforcing transcriptional elongation. It does this in part by shielding PAS sequences. Recent work from our lab also shows that U1 snRNA opposes the Restrictor complex and PNUTS, which otherwise suppress transcription (Estell et al., Mol Cell 2023). Most recently, the Adelman lab has demonstrated that U1 snRNA generally enhances transcription elongation (Mimoso and Adelman, Mol Cell 2023). Our work neither challenges nor contradicts these studies.

      The role of U1 in opposing PAS-dependent termination inspired the idea that antisense transcriptional termination may utilise PASs.  This was because such regions are rich in AAUAAA and comparatively poor in U1 binding sites. However, our RBBP6 depletion and POINT-seq data suggest that PAS-dependent termination is uncommon in the antisense direction. As such, other mechanisms suppress antisense transcription and influence promoter directionality. In our paper, we propose a major role for the Integrator complex.

      We do not completely rule out antisense PAS activity and discuss the prior work that identified polyadenylated antisense transcripts. However, those transcripts were detected by oligo(dT)-primed RT-PCR or Northern blotting, which cannot determine the fraction of non-polyadenylated RNA that could result from PAS-independent termination (e.g. by Integrator). Doing that requires an analysis of total nascent transcription, as achieved by our POINT-seq. Based on these experiments, Integrator depletion has a greater impact on antisense transcription than RBBP6 depletion.

      I think the provided evidence that promoter directionality is, for the most part, due to preferential initiation in the sense direction should be stressed more. In my eyes this is the strongest effect, and it is somewhat brushed under the rug.

      We agree that this is an important finding and have incorporated it into the title and abstract. As the reviewer recommends, we now highlight it further in the new discussion.

      References 12-17 report an effect of Integrator at the 5' end of protein-coding genes, while the data in Figure 2 appear contradictory. Then, experiments in Figure 4 show a global effect of INTS11 depletion on promoter-proximal sense transcription. In my opinion, data from the 2.5h time-point of depletion should be shown alongside 1.5h in Figure 2 so that it is clear that the authors found an effect similar to the above references. I find the current presentation somewhat misleading.

      We are grateful for this suggestion and present new analyses demonstrating that our experiment in Figure 2 concurs with previous findings (Supplemental Figures 2A and B). Our original heatmap (Figure 2E) shows a very strong and general antisense effect of INTS11 loss. On the same scale, the effects in the sense direction are not as apparent, which is also the case using metaplots.  New supplemental figure 2A now shows sense transcription from this experiment in isolation and on a lower scale, demonstrating that a subset of genes shows promoter-proximal increases in transcription following INTS11 depletion.  This is smaller and less general than the antisense effect but consistent with previous findings.  Indeed, our new analysis in supplemental figure 2B shows that affected protein-coding genes are lowly expressed, in line with Hu et al., Mol Cell 2023. This explains why a sense effect is not as apparent by metaplot, for which highly expressed genes contribute the most signal.

      As a result of our analyses, we are confident that the apparently larger effect at the 2.5hr timepoint (Figure 4) that we initially reported is due to experimental variability and not greater effects of extended INTS11 depletion. Overlaying the 1.5h and 2.5h datasets (Supplemental Figure 4B) revealed a similar number of affected protein-coding genes with a strong (83%) overlap between the affected genes.  To support this, we performed qPCR on four affected protein-coding transcripts which revealed no significant difference in the level of INTS11 effect after 2.5h vs 1.5h (Supplemental Figure 4C).

      We now present data for merged replicates in Figures 2 and 4, which reveal very similar average profiles for -INTS11 vs +INTS11 at both timepoints. Overall, we believe that we have resolved this discrepancy by showing that it amounts to experimental variability and arises because the most acutely affected protein-coding genes are lowly expressed. As detailed above, we show this in multiple ways (and validate it by qPCR). We have revised the text accordingly and removed our original speculation that differences reflected the timeframe of INTS11 loss.

      Conclusion/assessment:

      This important work substantially advances our understanding of the mechanisms governing the directionality of human promoters. The evidence supporting the claims of the authors is compelling, with among others the use of advanced nascent transcriptomics including spike-in normalization controls and acute protein depletion using degron approaches.

      In my opinion, the authors' conclusions are in general well supported.

      Not only the manuscript but also the data generated will be useful to the wide community of researchers studying transcriptional regulation. Also, the POINT-derived novel sPOINT method described here is very valuable and can positively impact work in the field.

      We are grateful for the reviewers' positive assessment of our study.

      Reviewer #2 (Public Review):

      Summary:

      Eaton and colleagues use targeted protein degradation coupled with nascent transcription mapping to highlight a role for the Integrator component INTS11 in terminating antisense transcription. They find that upon inhibition of CDK9, INTS11 can terminate both antisense and sense transcription, leading to a model whereby INTS11 can terminate antisense transcription and the activity of CDK9 protects sense transcription from INTS11-mediated termination. They further develop a new method called sPOINT which selectively amplifies nascent 5' capped RNAs and find that transcription initiation is more efficient in the sense direction than in the antisense direction. This is an excellent paper that uses elegant experimental design and innovative technologies to uncover a novel regulatory step in the control of transcriptional directionality.

      Strengths:

      One of the major strengths of this work is that the authors endogenously tag two of their proteins of interest, RBBP6 and INTS11. These tags allow them to rapidly degrade these proteins, increasing the likelihood that any effects they see are primary effects of protein depletion rather than secondary effects. Another strength of this work is that the authors immunoprecipitate RNAPII and sequence extracted full-length RNA (POINT-seq), allowing them to map nascent transcription. A technical advance from this work is the development of sPOINT, which allows the selective amplification of 5' capped RNAs < 150 nucleotides, allowing the direction of transcription initiation to be resolved.

      We appreciate this positive assessment.

      Weaknesses:

      While the authors provide strong evidence that INTS11 and CDK9 play important roles in determining promoter directionality, their data suggest that when INTS11 is degraded and CDK9 is inhibited there remains a bias in favour of sense transcription (Figures 4B and C). This suggests that there are other unknown factors that promote sense transcription over antisense transcription, and future work could look to identify these.

      We agree that other (so far unknown) factors promote sense transcription over antisense, as demonstrated by our short POINT (sPOINT). We have provided an expanded discussion on this in the revision. In our opinion, demonstrating that sense transcription is driven by preferential initiation in that direction is a key finding, and we agree that the identification of the underlying mechanism constitutes an interesting avenue for future study.

      Reviewer #3 (Public Review):

      Summary:

      Using a protein degradation approach, Eaton et al. show that INTS11 can terminate both sense and antisense transcription, but higher activity of CDK9 in the sense direction protects sense transcription from INTS11-dependent termination. They developed sPOINT-seq, which detects nascent 5'-capped RNA. The technique allowed them to reveal more robust transcription initiation of sense RNA as compared to antisense RNA.

      Strengths:

      The strength of the paper is the acute degradation of proteins, eliminating the off-target effects. Further, the paper uses elegant approaches such as POINT and sPOINT-seq to measure nascent RNA and 5'-capped short RNA. Together, the combination of these three allowed the authors to make clean interpretations of data.

      We appreciate this positive assessment.

      Weaknesses:

      While the manuscript is well written, the details on the figure panels are not sufficient. The methods could be elaborated to aid understanding. Additional discussion on how the authors' findings contradict the existing model of antisense transcription termination should be added.

      We have added more detail to the figure panels, which we hope will help readers to navigate the paper more easily. Specifically, the assay employed for each experiment is indicated in each figure panel. As requested, we provide a new and separate discussion section in the revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on this important piece of work!

      Some specific suggestions.

      MAJOR

      -The data are not available (Accession "GSE243266" is currently private and is scheduled to be released on Sep 01, 2026.) This should be corrected and as a minimum, the raw sequencing files as well as the spike-in scaled bigwig files should be provided in GEO.

      We have made the data public. Raw and bigwig files are provided as part of the GEO upload.

      MINOR

      - It would be useful for readers if you could include catalog numbers of the reagents used in the study.

      We have included this information in our revision.

      - A table in experimental procedures summarizing the genomic experiments performed in this study as well as published ones reanalyzed here would be helpful.

      This is now provided as part of the resources table.

      - It would be easier for reviewers to evaluate the manuscript if the figure legends were included together with the figures on one page. This is now allowed by most journals.

      We have used this formatting in the revision.

      - Providing some captions for the results sections would be helpful.

      We have included subheadings as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Generally, I would suggest writing the experiment-type above panels where it is not immediately obvious what they are so a reader can appreciate the figures without referencing the legend. E.g. write POINT-seq on Figure 1B just to make it obvious to someone looking at the figures what methodology they are looking at. Likewise, you could write RNAPII ChIP-seq for Supplementary Figures 3D and 3E.

      We have carried out this recommendation.

      Can a y-axis be indicated on POINT-seq genome browser tracks? This could make them easier to interpret.

      Y-axis scales are provided as RPKM as stated in the figure legends.

      The authors could address/speculate in the text why there is less POINT-seq signal for the antisense transcript in the treatment condition in Figure 1B? Or could consider including a different example locus where this is not the case for clarity.

      Acute depletion of poly(A) factors (like RBBP6) results in strong read-through beyond the poly(A) signal of protein-coding genes, as Figure 1 shows. However, it also causes a reduction in transcription levels, which can be seen in the figure and is correctly noted by the reviewer in this comment. We see this with other poly(A) factor depletions (e.g. CPSF73 and CPSF30; Eaton et al., 2020 and Estell et al., 2021) and other labs have observed this too (e.g. for CPSF73-dTAG depletion (Cugusi et al., Mol Cell 2022)). Plausible reasons include a limited pool of free RNAPII due to impaired transcriptional termination, or limited nucleotide availability due to their incorporation within long read-through transcripts. For these reasons, we have retained the example in Figure 1B as a typical representation of the effect. Moreover, the heatmap in Figure 1D fairly represents the spectrum of effects following RBBP6 loss, highlighting the strong read-through beyond poly(A) signals and the marginal antisense effects.

      "The established effect of INTS11 at snRNAs was detected in our POINT-seq data and demonstrates the efficacy of this approach (Figure 2B)." The authors could explain this point more clearly in the text and describe the data - e.g. As expected, depletion of INTS11 leads to increased POINT-seq signal at the 3' end of snRNAs, consistent with defects in transcriptional termination. This is highlighted by the RNU5A-1 and RNU5B-1 loci (Figure 2B).

      We agree and have added more context to clarify this.

      I would suggest adjusting the scale of the heatmap in Figure 2E - I think it would be easier to interpret if the value of 0 was white - with >0 a gradient of orange and <0 a gradient of blue (as is done in Figure 1C). I think making this change would make the point as written in the text clearer i.e. "heatmap analysis demonstrates the dominant impact of INTS11 on antisense versus sense transcription at most promoters (Figure 2E)." I'm assuming most of the sense transcription would be white (more clearly unchanging) when the scale is adjusted.

      We agree and have done this. The reviewer is correct that most sense transcription is unchanged by INTS11 loss.  However, as we alluded to in the original submission, a subset of transcripts shows a promoter-proximal increase after INTS11 depletion. We have expanded the analyses of this effect (see responses to other comments) but stress that it is neither as general nor as large as the antisense effect.

      The authors make the point that there is mildly increased transcription over the 5' end of some genes upon INTS11 depletion and show a track (Supplementary Fig 2A). It is not immediately obvious from the presentation of the meta-analysis in Figure 2D how generalisable this statement is. Perhaps the size of the panel or the thickness of the lines in Figure 2D could be adjusted so that the peak of the control (in blue) could be seen. Perhaps an arrow indicating the peak could be added? I'm assuming the peak at the TSS is slightly lower in the control compared to INTS11 depletion based on the authors' statement.

      We have provided multiple new analyses of this data to highlight where there are promoter-proximal effects of INTS11 loss in the sense direction.  Please see our response to the public review of reviewer 1 and new supplemental figures 2A, 2B, 4A and 4B which highlight the sense transcription increased in the absence of INTS11.

      The authors label Figure 4 "Promoters lose their directionality when CDK9 is inhibited", but in INTS11-depleted cells treated with CDK9i they find that there is still a bias towards sense transcription. Suggested edit: "Some promoter directionality is lost when CDK9 is inhibited" or similar.

      We agree and have made this change.

      If the authors conclude that INTS11-mediated effects are the result of perturbation of the catalytic activities of Integrator, they should perform rescue experiments with the catalytically dead E203Q-INTS11 mutant.

      This is a very good suggestion and something we had intended to pursue. However, as described below (and shown in Supplemental Figure 4G), there were confounding issues with this experiment.

      The E203Q mutant of INTS11 is widely used in the literature to test for catalytic functions of INTS11.  However, we have found that this mutation impairs the ability of INTS11 to bind other Integrator modules in cells. Based on co-immunoprecipitation of flag-tagged WT and E203Q derivatives, INTS1 (backbone module), 10 (tail module), and 8 (phosphatase module) all show reduced binding to E203Q vs. WT. Because E203Q INTS11 is defective in forming Integrator complexes, rescue experiments might not fully distinguish the effects of INTS11 activity from those caused by defects in complex assembly. While this may at first seem unexpected, in the analogous 3’ end processing complex, catalytic mutants of CPSF73 (which is highly related to INTS11) negatively affect its interaction with other complex members (Kolev and Steitz, EMBO Reports 2005).

      We hypothesise that INTS11 activity is most likely involved in attenuating promoter-proximal transcription, but we cannot formally rule out other explanations and discuss this in our revision. Regardless of how INTS11 attenuates transcription, our main conclusion is on its requirement to terminate antisense transcription whether this involves its cleavage activity or not.

      The authors suggest that CDK9 modulates INTS11 activity/assembly and suggest this may be related to SPT5. Is there an effect of CDK9 inhibition on the snRNA's highlighted in Figure 2B?

      We believe that snRNAs are different from protein-coding genes with respect to CDK9 function. Shona Murphy's lab previously showed that, unlike protein-coding genes, snRNA transcription is insensitive to CDK9 inhibition, and that snRNA processing is impaired by CDK9 inhibition (Medlin et al., EMBO 2003 and EMBO 2005). We reproduce these findings by meta-analysis of 15 highly expressed and well-separated snRNAs and by qRT-PCR of unprocessed RNU1-1, RNU5A-1 and RNU7-1 snRNA following CDK9 inhibition. We observe snRNA read-through by POINT-seq following INTS11 loss whether CDK9 is inhibited or not (left panel, below). Note the higher TES-proximal signal in CDK9i conditions, which likely reflects the accumulation of unprocessed snRNA, as validated by qPCR for three example snRNAs (right panel, below).

      Author response image 1.

      For Figure 4, would similar results be observed using inhibitors targeting other transcriptional CDKs such as CDK7,12/13?

      In response to this suggestion, we analysed four selected protein-coding transcripts (the same four that we used to validate the CDK9i results) by qRT-PCR in a background of CDK7 inhibition using the THZ2 compound (new Supplemental Figure 4E). THZ2 suppresses transcription from these genes, as expected. Interestingly, expression is restored by co-depleting Integrator, recapitulating our findings with CDK9 inhibition. There are two possible explanations. First, as CDK7 is the CDK-activating kinase for CDK9, its inhibition will also inhibit CDK9, so THZ2 may simply hit this pathway upstream of where CDK9 inhibitors act. Second, CDK7 may independently shield transcription from INTS11. We allude to both interesting possibilities.

      What happens to the phosphorylation state of anti-sense engaged RNAPII when INTS11 is acutely depleted and/or CDK9 is inhibited? This could be measured by including Ser5 and Ser2 antibodies in the sPOINT-seq assay and complemented with Western Blot analysis.

      We have performed the western blot for Ser5 and Ser2 phosphorylation as suggested.  Both signals are mildly enhanced by INTS11 loss, which is consistent with generally increased transcription.  Ser2p is strongly reduced by CDK9 inhibition, which is consistent with the loss of nascent transcription in this condition.  Interestingly, both modifications are partly recovered when INTS11 is depleted in conjunction with CDK9 inhibition. This is consistent with the effects that we see on POINT-seq and shows that the recovered transcription is associated with some phosphorylation of RNAPII CTD.  This presumably reflects the action(s) of kinases that can act redundantly with CDK9.

      We have not performed POINT-seq with Ser5p and Ser2p antibodies under these various conditions.  Our rationale is that our existing data uses an antibody that captures all RNAPII (regardless of its phosphorylation status), which we feel most comprehensively assays transcription in either direction. Moreover, the lab of Fei Chen (Hu et al., Mol Cell 2023) recently published Ser5p and Ser2p ChIP-seq following INTS11 loss. By ChIP-seq, they observe a bigger increase in antisense RNAPII occupancy vs. sense providing independent and orthogonal support for our POINT-seq data.  Interestingly, this antisense increase is not paralleled by proportional increases in Ser5p or Ser2p signals.  This suggests that the unattenuated antisense transcription resulting from INTS11 loss does not have high Ser5p or Ser2p.  Since CDK7 and 9 are major Ser5 and 2 kinases, this supports our model that their activity is less prevalent for antisense transcription.  We now discuss these data in our revision.   

      The HIV reporter RNA experiments should be performed with the CDK9 inhibitor added to the experimental conditions. Presumably CDK9 inhibition would result in no upregulation of the reporter upon addition of TAT and/or dTAG. Perhaps the amount of TAT should be reduced to still have a dynamic window in which changes can be detected. It is possible that reporter activation is simply at a maximum. Can anti-sense transcription be measured from the reporter?

      We have performed the requested CDK9 inhibitor experiment to confirm that TAT-activated transcription from the HIV promoter is CDK9-dependent (new supplemental figure 4F).  Consistent with previous literature on HIV transcription, CDK9 inhibition attenuates TAT-activated transcription.  Importantly, and in line with our other experiments, depletion of INTS11 results in significant restoration of transcription from the HIV promoter when CDK9 is inhibited. Thus, TAT-activated transcription is CDK9-dependent and, as for endogenous genes, CDK9 prevents attenuation by INTS11.

      While TAT-activated transcription is high, we do not think that the plasmid is saturated. When considering this question, we revisited previous experiments using this system to study RNA processing (Dye et al., Mol Cell 1999, Cell 2001, Mol Cell 2006). In these cases, mutations in splice sites or polyadenylation sites have a strong effect on RNA processing and transcription around HIV reporter plasmids. Effects on transcription and RNA processing are, therefore, apparent in the appropriate context. In contrast, we find that the complete elimination of INTS11 has no impact on RNA output from the HIV reporter. Our original experiment assessing the impact of INTS11 loss in +TAT conditions used total RNA. One possibility is that this allows non-nascent RNA to accumulate, which might confound our interpretation of INTS11 effects on ongoing transcription. However, the new experiment described in the paragraph above was performed on chromatin-associated (nascent) RNA to rule this out. This again shows no impact of INTS11 loss on HIV promoter-derived transcription in the presence of TAT.

      To our knowledge, antisense transcription is not routinely assayed from plasmids. Plasmids generally employ very strong promoters (e.g. CMV, HIV) to drive sense transcription. Crucially, their circular nature means that RNAPII going around the plasmid could interfere with antisense transcription coming the other way, which does not happen in a linear genomic context. This is why we restricted our use of plasmids to examining the effects of stimulated CDK9 recruitment (via TAT) on transcription rather than on promoter directionality.

      The authors should clearly state how many replicates were performed for the genomics experiments. Ideally, a signal should be quantified and compared statistically rather than relying on average profiles only.

      We have stated the replicate numbers for sequencing experiments in the relevant figure legends. All sequencing experiments were performed in at least two biological replicates, but often three. In addition, we validated the key conclusions by qPCR or with orthogonal sequencing approaches.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide strong evidence in support of their claims.

      ChIP-seq of pol2S5 and S2 upon INTS11 and CDK9 inhibition will strengthen the observation that transcription in the sense direction is more efficient.

      We view the analysis of total RNAPII as the most unbiased way of establishing how much RNAPII is going one way or the other. Importantly, ChIP-seq was very recently performed for Ser2p and Ser5p RNAPII derivatives in the lab of Fei Chen (Hu et al., Mol Cell 2023). Their data show that loss of INTS11 increases the occupancy of total RNAPII in the antisense direction more than in the sense direction, which is consistent with our finding. Interestingly, the increased antisense RNAPII was not paralleled by an increase in Ser2p or Ser5p. This suggests that, following INTS11 loss, the unattenuated antisense transcription is not associated with full/normal Ser2p or Ser5p. These modifications are normally established by CDK7 and 9; therefore, this published ChIP-seq suggests that they are not fully active on antisense transcription when INTS11 is lost. This supports our overall model that CDK9 (and potentially CDK7, as suggested for a small number of genes in new Supplemental Figure 4E) is more active in the sense direction to prevent INTS11-dependent attenuation. We now discuss these data in our revision.

      In Supplementary Figure 2, the eRNA expression increases upon INTS11 degradation. I wonder if the effects of this will be appreciated on cognate promoters? Can the authors test some enhancer:promoter pairs?

      We noticed that some genes (e.g. MYC) that are regulated by enhancers show reduced transcription in the absence of INTS11. Whilst this could suggest a correlation, the transcription of other genes (e.g. ACTB and GAPDH) is also reduced by INTS11 loss although they are not regulated by enhancers.  A detailed and extensive analysis would be required to establish any link between INTS11-regulated enhancer transcription and the transcription of genes from their cognate promoters.  We agree that this would be interesting, but it seems beyond the scope of our short report on promoter directionality.

      Line 111, meta plot was done of 1316 genes. Details on this number should be provided. Overall, the details of methods and analysis need improvement. The layout of panels and labelling on graphs can be improved.

      We have now explained the 1316 gene set. In essence, these are the genes separated from an expressed neighbour by at least 10kb. This distance was selected because depletion of RBBP6 induces extensive read-through transcription beyond the polyadenylation site of protein-coding genes. To avoid including genes affected by transcriptional read-through from nearby transcription units, we selected those with a 10kb gap between them. This was the only selection criterion, so it is unlikely to introduce any unintended biases. Finally, we have added more information to the figure panels and their legends, which we hope will make our manuscript more accessible.
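      The 10kb isolation criterion described above can be expressed as a simple interval filter. The sketch below is illustrative only and is not the authors' pipeline: the `Gene` record, the half-open coordinate convention and the `isolated_genes` helper are hypothetical, and it assumes that an expressed neighbour on either side and on either strand disqualifies a gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one expressed gene (half-open coordinates)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gap_bp(a: Gene, b: Gene) -> int:
    """Distance in bp between two genes on the same chromosome; 0 if they overlap."""
    return max(0, b.start - a.end, a.start - b.end)


def isolated_genes(expressed: list[Gene], min_gap: int = 10_000) -> list[Gene]:
    """Keep genes whose nearest expressed neighbour (either side, either strand)
    is at least `min_gap` bp away. Quadratic in the number of genes, fine for a sketch."""
    kept = []
    for i, gene in enumerate(expressed):
        neighbours = (
            other for j, other in enumerate(expressed)
            if j != i and other.chrom == gene.chrom
        )
        if all(gap_bp(gene, other) >= min_gap for other in neighbours):
            kept.append(gene)
    return kept


if __name__ == "__main__":
    genes = [
        Gene("A", "chr1", 0, 1_000),
        Gene("B", "chr1", 5_000, 6_000),    # only 4 kb from A: A and B both excluded
        Gene("C", "chr1", 20_000, 21_000),  # 14 kb from B: kept
        Gene("D", "chr2", 0, 1_000),        # no neighbour on chr2: kept
    ]
    print([g.name for g in isolated_genes(genes)])  # ['C', 'D']
```

      Checking every gene against every other gene on its chromosome (rather than only its immediate sorted neighbours) avoids missing a long upstream gene whose end extends past a shorter gene in between.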

      Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank all reviewers for their detailed and constructive feedback, which substantially helped improve the manuscript. We apologise for the time taken for the revisions, which was partially due to the first author (successfully) writing and defending her PhD thesis in the same time frame. We would like to point out at the outset that, based on the reviewers' feedback, main figure 6 has been completely redone and its conclusions have changed substantially. We no longer suggest RNA chaperoning activity (it was found to be due to the high concentration of TEV protease, in a control suggested by the reviewers). Instead, our refined assay conditions with a lower TEV protease concentration identified ribonuclease activity of membrane-bound full-length 2C, which is consistent with a publication from 2022 (PMID: 35947700).


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Evidence, reproducibility, and clarity

      Summary:

      In this study by Shankar and colleagues, the authors aim to understand the structure and function of the enterovirus 2C protein, a putative viral helicase with AAA+ ATPase activity. Using poliovirus (as a model enterovirus) 2C, the authors propose that the protein contains two amphipathic helices (AH1 and AH2) at the N-terminus that are divided by a conserved glycine. Using purified MBP-tagged 2C and N-terminal 2C truncations, their data suggest that AH1 is primarily responsible for clustering at membranes, whilst AH2 is the main mediator of 2C oligomerisation and membrane binding. Furthermore, 2C was suggested to be able to recruit RNA to membranes, with a preference for dsRNA, and the authors' data imply that the helicase activity of 2C is ATP-independent. Instead, the ATPase activity appears to be required for 2C hexamer formation or chaperone activity. The manuscript is generally well written and presented, and the authors present very interesting data that raise several questions, some of which require additional experimentation to support the authors' conclusions. Specific comments are as follows.

      We thank the reviewer for the overall positive assessment, as well as the specific comments below.

      Major Comments:

      1. The authors use four main constructs throughout the paper: full-length 2C, 2C with deletion of AH1 (ΔAH1), 2C with both AH1 and AH2 deleted (ΔMBP) and 2C with an extended N-terminal deletion. From this, the authors draw conclusions on the function of both AH1 and AH2. One of the authors' main conclusions is that AH2 is the main mediator of 2C membrane association (e.g., in line 169). However, is it possible to conclude the relative importance of AH1 vs AH2 without testing a construct containing the deletion of AH2 only (ΔAH2)? This should be generated and used alongside this data to fully define the relative importance of AH1 and AH2 in these assays and to remove the possibility that the deletion of AH1 changes the structure and/or function of AH2, which could also result in the observed differences.

      This was a very good suggestion. We expressed and purified the ΔAH2 protein requested by the reviewer and characterized its oligomeric state as well as its membrane binding. It turns out, as suspected, that the ΔAH2 protein behaves very similarly to the ΔMBD protein (i.e. it does not form higher order oligomers and does not bind membranes). The changes in the manuscript due to this addition are many but can primarily be found in main figures 2-3 and their associated supplementary figures.

      Previous structural predictions of 2C do not appear to have two separate AHs at the N-terminus. Are the AH1 and AH2 structures predicted to be formed in the context of the entire 2C protein, 2BC precursors and polyprotein? Are there structural approaches that could provide experimental evidence for two separate AH at the N-terminus?

      This is a good point. Previous predictions were not that detailed, partially since they were done in the pre-alphafold era. Unfortunately, we cannot think of a tractable experimental method that could verify the split nature of the amphipathic helix in the only context that would matter: the protein bound to a membrane. A long-term goal would be in situ structures of full-length 2C on membranes using cryo-electron tomography, but our current sample and data sets are not sufficient for this. We added a mention of the long-term need for experimental structures of full-length 2C on lines 315-318 in the discussion.

      Why are the 2C dimers (lines 137-138) not apparent on the mass photometry data presented (figure 2)?

      Different constructs were measured by mass photometry and SEC-MALS. In addition, mass photometry requires a 100-1000x lower concentration, which would affect any dynamic equilibrium between oligomeric states even if the same construct were measured by both methods.

      It appeared that binding of ΔMBD-2C was better when POPS is in the membrane (line 174). What is the explanation for this and was this finding significant?

      Well spotted. It may mean that 2C has a second, lower affinity membrane-binding site which is charge-dependent somewhere outside the MBD. We now added a mention of this in the discussion, lines 321-323.

      From the authors' data on lipid drop clustering they conclude ΔAH1 is more effective for clustering; however, the ΔAH1 construct produces pentamers, not hexamers (from Figure 2). Is formation of hexamers related to or required for membrane clustering?

      ΔAH1 is LESS effective at clustering, not more. As for the mention of pentamers in the original submission: we now think this was an unfortunate choice of words. The mass photometry data for 2C(ΔAH1) could more parsimoniously be interpreted as a mix of hexamers and other (unknown to us) smaller oligomers such as trimers. We have removed all mentions of pentamers.

      The replicon data presented in Figure 7 should include a replication-defective control (e.g., polymerase mutant), in order to compare how defective in replication ΔAH1 and ΔMBP deletions are compared to a fully-defective construct. Likewise, deletion of ΔAH1 in this construct is likely to affect processing of the viral polyprotein where several previous studies with picornaviruses have demonstrated that the residues in the P2'-P4' positions can change cleavage efficiency (e.g., PMID: 2542331), or the structure of 2C, leading to the reduction of replication.

      Thanks for these good comments. We made the polymerase-dead (GDD-to-GAA) replicon and measured it side by side with the 2C replicons. Its luciferase activity is similar to that of the 2C deletion replicons, indicating that no replication takes place in the 2C deletion replicons. This is shown in the new figure 7. As for the possibility of processing defects, we mentioned this in the original discussion and now cite the reference suggested by the reviewer in this context (line 324).

      How does the authors' model of ATPase-independent helicase activity and an ATP-dependent RNA chaperone activity fit with the 2-step model for RNA binding and ATPase activity suggested by Yeager et al (PMID: 36399514)?

      Acting upon comments from other reviewers, we completely redid the "helicase assay" in the revised manuscript. It turns out that the ATP-independent unwinding activity in the original submission was an artefact of the assay conditions (specifically, of the TEV protease at the higher concentration we used in the old assay). In our improved assay we neither see helicase activity nor ATP-independent RNA chaperoning activity.

      Optional major comments that would increase the significance of the work:

      All of the optional comments below are exceptionally interesting. But given the long time needed for the several major changes to this manuscript (e.g. the ΔAH2 protein characterization and reoptimisation of the helicase assay) we believe it is more sensible to address them in future studies, for which the 2C reconstitution system can be used.

      The preference for dsRNA over ssRNA appears to be quite small (Figure 5d). In the context of a viral infection, where ssRNA is likely to outnumber dsRNA at different times during infection, is this preference physiologically relevant? In relation to this, what size stretch of dsRNA is required for the preference, and could this correspond to cis-acting RNA structural elements, dsRNA as it escapes the 3D polymerase, or the RF and RI forms (PMID: 9343205)? What is the proposed mechanism by which dsRNA outcompetes membrane tethering of 2C? OPTIONAL

      The authors' study has been conducted in the absence of other viral non-structural proteins. What is the physiological importance of the observations, such as membrane interaction/clustering or RNA binding, when presented in the context of the other replication machinery? OPTIONAL

      Do 2C monomers, dimers and hexamers have different functions in viral replication, perhaps at different stages of replication, and which of these forms are relevant during viral infection, or can they all be detected during infection? Can any suggested separate functional arrangements be separated by genetic complementation experiments? OPTIONAL

      Minor comments:

      1. The authors appear to switch between different names for the same constructs, which makes the text confusing to follow (for example, ΔMBD is the same as 2C(41-329); likewise, 2C(Δ115) is sometimes called 2C(116-329)). It would be much easier to follow if the naming of constructs was consistent throughout (unless I am misunderstanding some subtlety in the difference between such constructs).

      Thanks very much for spotting this. We have fixed it.

      The authors suggest a pentamer arrangement for the ΔAH1 construct; however, in the mass photometry data (figure 2D), a hexamer is indicated with the arrow. It would be helpful to change the label to indicate the size of the pentamer where this is being generated, not the hexamer.

      As mentioned above, we think the "pentamer" designation of the original manuscript was unfortunate. It is more parsimonious to interpret this as a mix of states: hexamers and undefined smaller oligomers.

      In most figures, data for full-length 2C, ΔAH1 and ΔMBP is shown. However data for ΔMBP is missing in Figure 4. Using ΔMBP may demonstrate even lower clustering, hinting that AH2 is also involved in this process.

      Thanks for this comment. In our view, it can be derived from figure 3 (which shows lack of binding to PC/PE membranes) that the ΔMBD construct would not cluster membranes under the conditions of the assay (clustering requires concomitant binding to two membranes). We now describe our rationale for this on lines 220-222. However, we did include the ΔMBD protein in the new negative staining TEM supplementary figure where it and ΔAH2 show no signs of clustering (figure S10).

      I think it would be better to normalise the data in the flotation experiments such that the percentage of 2C in the upper fraction is presented relative to the amount of lipid in the upper fraction (presented in Figure S4).

      The change suggested by the reviewer would make it impossible to show the important no-liposome control (leftmost bar in Fig. 3C) in the same plot as the other measurements. We believe that would unnecessarily complicate the figure. Thus, we opted to keep the measurements normalised by lipid fluorescence in the supplementary figure. Instead, we have added another reference to this supplementary figure in the legend to main figure 3.

      At several places (e.g., lines 232 and 272) the authors refer to "realistic systems". I think the term "physiologically relevant" might be more appropriate.

      Agreed and changed throughout.

      Line 237: I think "y" is a typo and should read "by".

      Thanks. This text was reworked due to the major changes to figure 6.

      Reviewer #1 (Significance (Required)):

      Significance

      I have limited expertise with structural biology but specialise my research on positive-sense RNA virus replication, structure and function. This research is of interest to a broad audience of researchers investigating many positive-sense RNA viruses, which extends beyond the viral family studied here. The work utilises novel techniques to begin to understand the specific roles of 2C in poliovirus replication. The authors' data add important incremental new insight into recent studies on viral helicase proteins as referenced in the study; however, a key limitation is understanding the importance/relevance of their observations during a viral infection.

      We thank the reviewer for this positive and nuanced appraisal of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present an alternative assay system to investigate picornavirus 2C, a protein that is tricky to analyze biochemically in its full-length form because of an amphipathic helix at the N-terminus. Poliovirus 2C is expressed with an N-terminal MBP tag, a 50kD protein that helps with solubility, as is commonly used for 2C investigations. A difference here is that liposomes are included to mimic membranes for 2C attachment. The key findings are that 2C induces clustering of liposomes, that double-stranded RNA binding by 2C impacts this clustering effect, and that a free N-terminus (after cleavage of MBP by TEV protease) is needed for RNA binding and an ATP-independent (i.e. non-helicase) RNA duplex separation activity.

      Major:

      In the flotation assays in figure 3 the authors use a system where MBP-2C is fluorophore-labeled with ATTO488 on exposed cysteines. Poliovirus and other enterovirus 2C has a very well characterized zinc finger domain that has cysteines coordinating a zinc ion. Mutation experiments previously showed that these cysteines are necessary for viral replication and 2C stability. Have the authors controlled for disruption of the zinc finger domain by the labelling of cysteines with ATTO488 and checked if the protein remains folded?

      We completely agree with the reviewer and apologise for the omission in the original submission. We have now included a Zn content measurement, which shows unchanged levels between labelled and unlabelled 2C protein (Figure S7). Also, in the revised manuscript we now explicitly describe our original reasoning for labelling on native cysteines: the presence of two cysteines which are not necessary for viral replication and which are more solvent-exposed (and thus more likely to be labelled) in the crystal structure of the soluble fragment of 2C (lines 176-181).

      In the analysis of the amphipathic helix, did the authors include membranes in their structural predictions or just the free helix? How does inclusion of membranes impact the predictions? In the predictions in Figure D, only 2 of 4 show a kink, and there doesn't seem to be a correlation between whether a kink is predicted and whether the hydrophobic side is aligned in Figure S1.

      Unfortunately, predicting a protein structure with the interacting membrane is beyond what is currently doable with protein prediction methods (one would have to combine protein structure predictions with molecular dynamics simulations including a membrane). Based on general principles of protein structure, it is likely that there is some flexibility around G17. Thus there may not be a single "kink angle" for any given virus, but we believe that the presence of the kink (and offset hydrophobic surfaces) for a number of viruses lends credibility and robustness to the observation. We added some descriptions of this thinking on lines 126-127.

      Based on previous structures of 2C from different viruses, the N-terminal amphipathic helix-containing region is predicted to localize on one face of the predicted hexameric structure, tethering 2C to the membrane. How does the authors' hypothesized model explain 2C-dependent clustering? Is there evidence that 2C hexamers can oligomerize further, into dodecamers for example, maintaining separate faces to enable N-terminal interaction with different membranes? What is the distance between the liposomes in figure 4 at the points of density attributed to 2C? How does this compare to the size of 2C determined in previous structural studies? Is it consistent with one hexamer or two hexamers sitting on top of one another?

      These are very interesting questions but we believe it is prudent to limit our speculation at this point. Eventually, we hope that larger data sets of cryo-electron tomography, coupled to subtomogram averaging, may provide a more definitive answer. What we managed to do with our current cryo-electron tomography data set is to estimate the volume of individual protein densities, and from the volume calculate an estimated molecular mass of the individual complexes seen in the tomograms. This correlates very well with 2C hexamers (new figure 4D).
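      Converting a segmented density volume into an approximate molecular mass, as described above, typically relies on an average protein density. The sketch below assumes the commonly used approximation of about 1.21 cubic ångströms per dalton (roughly 0.83 Da per Å³, i.e. a protein density near 1.37 g/cm³); the function names and the example monomer mass are hypothetical and are not taken from the authors' analysis, which may use a different density value or segmentation threshold.

```python
# Assumed average protein density: ~1.21 cubic angstroms per dalton
# (about 0.83 Da per cubic angstrom, roughly 1.37 g/cm^3).
ANGSTROM3_PER_DALTON = 1.21


def volume_to_mass_da(volume_a3: float, a3_per_da: float = ANGSTROM3_PER_DALTON) -> float:
    """Estimate molecular mass (Da) from a segmented density volume (cubic angstroms)."""
    if volume_a3 <= 0:
        raise ValueError("volume must be positive")
    return volume_a3 / a3_per_da


def estimate_copy_number(volume_a3: float, monomer_mass_da: float) -> int:
    """Round the estimated mass to the nearest whole number of monomers."""
    return round(volume_to_mass_da(volume_a3) / monomer_mass_da)


if __name__ == "__main__":
    monomer_da = 36_000  # hypothetical monomer mass, for illustration only
    volume = 6 * monomer_da * ANGSTROM3_PER_DALTON  # volume a hexamer would occupy
    print(round(volume_to_mass_da(volume)), estimate_copy_number(volume, monomer_da))  # 216000 6
```

      Because the measured volume depends strongly on the density threshold used for segmentation, such estimates are best read as distinguishing broad oligomeric classes (e.g. hexamer versus dimer) rather than exact copy numbers.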

      In the Discussion lines 278-285 the authors suggest that having MBP attached may reflect the polyprotein condition. Can they make a construct with MBP-2B2C to examine interaction with liposomes and assess 2C function?

      This is a highly relevant question, but the biochemistry of 2BC is even more challenging than 2C, and we are unfortunately nowhere near being able to work with purified 2BC at the moment.

      Discussion lines 293-296: the possibility of two different populations of 2C, binding RNA or membranes, cannot be excluded. There is much more 2C present late in infection than early in infection, and the model in figure 8 doesn't acknowledge or capture this.

      We have changed the model figure such that more 2C is seen later, and the clustering function is also seen late in infection. The original discussion text referred to (which is unchanged) talks about a "preferential role in RNA replication and particle assembly at later time points" specifically for this reason. We hope the new figure 8 is better at conveying this message.

      Discussion lines 313-317: the authors don't reference a study that seems pertinent here, in which a foot-and-mouth disease virus 2C mutant lacking the N-terminal amphipathic helix, which could bind but not hydrolyze ATP, hexamerized in the presence of RNA (PMID: 20507978).

      Thanks for the suggestion. However, after the extensive changes we made to figure 6 based on excellent reviewer comments (essentially, the RNA chaperoning activity turned out to be an artefact, and the improved assay shows no sign of RNA unwinding but instead shows 2C-mediated ribonuclease activity), these sentences of the original discussion lost most of their context and we opted to remove them.

      Some evidence of MBP-2C cleavage by TEV in the different assays used should be presented as this is a major focus of discussion and currently no gels show TEV cleavage is happening.

      Thanks for the suggestion - we agree. We now show these in the new supplementary figures S5 and S12.

      Reviewer #2 (Significance (Required)):

      The work presents an additional methodology to investigate a protein that has previously been difficult to study. The authors acknowledge that there is still a lot of 2C biology that remains to be discovered.

      Thanks, we agree.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript provides insights into the role of the N-terminus in membrane binding and its importance in the various functions of 2C.

      Major issues

      Line 103-119. Is this novel? I thought people had done a lot of bioinformatic analysis of PV 2C (especially Wimmer) who also did mutational work to analyse the importance of various amino acids in the N-terminal helix. I feel like the paper in general, and this section in particular, underplays the large body of work that has been done on the amphipathic helix by various groups.

      We apologise if our original manuscript didn't sufficiently acknowledge previous work in the field. In the first sentence of the mentioned paragraph (now lines 112-113), we did however cite several papers that have previously addressed the amphipathic nature of the N-terminus of 2C. We have now added two more references along the same line, and changed the wording in a way that we hope better conveys that the amphipathic nature per se has been studied before. We would be happy to add more specific references if the reviewer has any suggestions. However, the rest of our analysis IS indeed novel for the following reasons: (i) we show that the amphipathic region is not a simple, single amphipathic helix, but instead has a conserved glycine (helix breaker/destabiliser residue) and two distinct amphipathic stretches before and after this residue; (ii) we use AlphaFold2 (not available at the time of the earlier work) to provide the first reliable structural models of the membrane-binding domain. These models consistently, across several enterovirus 2C proteins, reveal that the hydrophobic surfaces of the first and second amphipathic regions, on either side of the conserved glycine 17, are offset from one another. This lends additional credibility to the distinct nature of these regions, which have not previously been identified as such and which we also show in the biochemical assays to be functionally distinct. We have now also added a clarification to the Discussion that the N-terminus of 2C had previously been identified as its membrane-binding domain, and we cite references for this. We hope that these changes will sufficiently acknowledge earlier work in the field while clearly pointing out the advance that our paper makes.

      Line 132. Did you validate your column with known MW standards? The peak for full length and deltaAH1 look fairly standard for 2C, in that you have a mixture of species. Not sure you can say it is a hexamer when it is such a broad peak. C doesn't really help you too much since the counts at 400 (pentamer) and 480 (hexamer) are almost the same with quite large error bars. Like most people that have worked with 2C I think the best you can say is that you are making some kind of oligomerized 2C that includes hexamer, pentamer, etc. Why no dimer for MBP-2C and MBP-2C(delta AH1) when compared to the other constructs?

      We did not calibrate the gel filtration column since the outcome would in any case be a cruder estimate of molecular mass than the mass photometry and SEC-MALS measurements. But we do agree with the reviewer on the broad mass photometry peaks. To address this experimentally, we compared the existing MBP-2C spectra to new recordings on apoferritin, a highly stable homomultimeric protein complex of a similar mass to an MBP-2C hexamer. The apoferritin mass estimate is overlaid with the full-length MBP-2C in the new figure 2D and the corresponding supplementary figure S3. This indeed shows that the MBP-2C peak is broader, i.e. consistent with a mix of species which are predominantly but not only hexamers. We describe and discuss this on lines 145-149. As for the mention of pentamers in the original submission: we now think this was an unfortunate choice of words. The mass photometry data for 2C(ΔAH1) could more parsimoniously be interpreted as a mix of hexamers and other (unknown to us) smaller oligomers such as trimers. We have removed all mentions of pentamers.

      Line 143. Does your data show that there are two amphipathic helices? Bioinformatics suggests it but your experiments just show the importance of the two areas in oligomerization, not that it is forming two helices.

      We agree that the choice of words was not ideal and have now changed it to "structure predictions indicate" (line 162).

      Figure S2. Your preps are still relatively dirty, which isn't ideal for biochemical assays. Especially lane 3, where you are looking at 50-60% purity. I don't want you to re-run experiments but I think you need to comment on the purity of the protein you are working with. Also I don't like that you removed the top and bottom of the SDS-PAGE. How much protein never entered the gel. Is there a big fat band at 20 kDa? You need to have the full gel here. Did you measure 260 nm of the preps as well to see if you had bound RNA to the 2C?

      Thanks for the comment, we agree that our original submission lacked detail in the description of the protein purification. This is now addressed with the new figure S2 which shows size exclusion chromatograms of the fluorophore-labelled proteins (same chromatograms as in figure 2) and the corresponding uncropped gels imaged both in the stain-free channel (showing all proteins) and in the fluorescence channel. The A260/A280 ratio measured for all proteins shows that they are free of nucleic acids at the point of imaging. The protein preps are not 100% homogeneous but we do believe that they are more than 50-60% pure.

      Lines 170. Wasn't this done in the recent "An Amphipathic Alpha-Helix Domain from Poliovirus 2C Protein Tubulate Lipid Vesicles"? I don't see it referenced. What is novel about the current work when compared to that paper? Any differences?

      Thanks for pointing this out. The referenced study worked with a synthesized, isolated peptide corresponding to AH2 (i.e. not with the full protein). An amphipathic peptide outside the context of its protein cannot be expected to recapitulate the properties of the entire protein, e.g. since it is not spatially constrained in how it interacts with membranes. As one example (relating to the title of that paper), we don't see full-length 2C protein tubulating membranes the way the isolated peptide does. As for the reviewer's question about novelty, the paper mentioned does not identify the split nature of the amphipathic region, does not consider the role of AH1, does not characterise the membrane-binding properties of full-length 2C with respect to liposome membrane composition and size, does not identify and characterise the membrane clustering properties of 2C, nor its interactions with nucleic acid when bound to a membrane. However, we do agree that we should have cited the paper in our manuscript. We now cite it in the discussion, lines 320-321.

      I'm surprised by the lack of electron microscopy (negative stain mostly) of both the oligomerized 2C and the various liposomes. I know the Carlson group is a microscopy group so why the lack of validation using electron microscopy of the various DLS experiments? I know you did cryo-ET for one of the constructs but I think negative stain electron microscopy of other constructs would be useful.

      Thanks for the suggestion. As suggested, we have now expanded the analysis with negative staining EM of several more constructs studied by DLS. It can be found in the new supplementary figure S10.

      Figure 4C. What evidence is there that this is 2C apart from you added it to the liposomes? It also comes back to the relative impurity of your protein prep. Could this be E.coli contamination?

      Thanks for this comment. We have now added a new supplementary figure (S5) showing SDS-PAGE gels of the reactions used for flotation and DLS assays, which are identical to the cryo-ET samples. In addition, we estimated the molecular mass of the individual, putative 2C densities in the cryo-electron tomograms by measuring their volume. This analysis, which can be found in the new figure 4D, shows that the estimated mass of individual protein densities is consistent with a hexamer of full-length 2C. In addition, we mention in the discussion the long-term need to determine high-resolution structures of membrane-bound 2C using cryo-ET and subtomogram averaging (lines 315-318).

      Figure 8. Is this model supported by the data in this paper? Your cryo-ET says that 2C is there but that isn't supported by any other data. How is the dsRNA protected from the innate immune system in this model? Is it just sitting out in the cytosol? How is the nascent ssRNA packaged into the capsid? Is there competition between the dsRNA and capsid for 2C binding (which your model suggests)? I know it sounds like I am being overly critical of the model but in my opinion there are still too many unanswered questions in the field to come up with a half-decent model.

      Thanks for this comment. We are the first to agree that our understanding of the roles of 2C is far from complete! We should have been clearer that the model figure represents some of the roles of 2C identified to date, and does not claim to be complete. However, we do feel that a model figure serves the purpose of putting our findings into context, and also of providing testable hypotheses for future research. As for the question, some of the roles of 2C shown in the model figure (in particular, particle assembly) are rather supported by earlier work of ourselves and others. We have now produced a new model figure and changed the figure legend to better reflect the incompleteness of the current understanding, and the origin of the different parts of the model figure. In addition, we extended the final paragraph of the discussion (which lists still-unknown aspects of 2C) with the reviewer's mention of dsRNA shielding from innate immunity (lines 374-375). The other aspects mentioned by the reviewer as not yet fully understood are already mentioned in that paragraph.

      Minor issues

      Lines 43-45: I feel like you underplay the success of the poliovirus vaccination program. There were approximately 30 cases of WPV1 in 2022, and WPV2 and WPV3 have been fully eradicated. Vaccine-derived polio is still an issue, but even that is relatively low compared to where the world was in the 1950s.

      We agree that the previous wording was not ideal. We have rephrased it and added another recent reference related to the type 2 vaccine switch (lines 47-49).

      Line 66. I agree there are 11 individual proteins but I feel like this leaves out the fact that some of the uncleaved precursors appear to have some functions, for example 2BC.

      Good point. We have now added a mention of 2BC and the fact that it has distinct functions to the introduction (lines 70-71). 2BC is also mentioned in the legend of the model figure (figure 8).

      Line 56: LD needs to be defined.

      Well spotted thanks. Since the abbreviation was not used anywhere else we opted to spell it out instead (line 59).

      Line 75. I think you have misrepresented Xia et al. here. They clearly say in their study that they show helicase and chaperone activity. I never managed to repeat that work but you should still report what they claim. One major thing is that they used insect-expressed protein, whereas most people (including myself and in the paper under review) use E. coli-expressed protein. Do post-translational modifications play an important role in function?

      You are right that the reference to their paper for this statement was incorrect. We have now made this part of the introduction more explicit (lines 82-83), and in the new discussion we also mention the possibility of e.g. post-translational modifications affecting 2C helicase activity, citing Xia et al. (lines 359-361).

      Line 103. Need to make it clear here it is poliovirus 2C.

      Thanks, we added it (line 112).

      Line 135. I assume you mean kDa instead of uM?

      It should actually be μM. It is the solution concentration at which the assay was performed. We added some words to clarify this (line 154).

      Figure 3. What do you mean by "Only 2C"? Is that MBP-2C? Maybe I am reading the data wrong but adding TEV does nothing? How do you know TEV is removing the MBP? It looks like MBP-2C binds to the liposomes just the same as cleaved MBP-2C. I see in line 165 you acknowledge this. Could an alternative conclusion for line 168 be that MBP isn't being cleaved off but that AH2 is too small to be exposed in that construct? Did you do that construct without MBP being cleaved? I think you need to confirm that MBP is being cleaved off.

      Thanks for spotting this mistake. It should indeed be MBP-2C (in the absence of liposomes). We corrected figure 3. Also, in response to this comment and similar ones, we have now added a new supplementary figure showing SDS-PAGE gels of the reactions loaded onto flotation assays and DLS (figure S5). It shows that MBP-2C is cleaved.

      Line 184. Is there a reason you use the 2019 paper as a reference instead of the far earlier Bienz et al papers? I'd suggest they are the seminal papers on 2C membrane association. Once again how is this work different from the recent "An Amphipathic Alpha-Helix Domain from Poliovirus 2C Protein Tubulate Lipid Vesicles" paper?

      See our response above regarding the paper mentioned here (which we have now cited). As for why we cite the 2019 paper here: our statement pertains specifically to the contact sites between lipid droplets and replication organelles, not to the membrane binding of 2C per se. We have now added a more general mention of membrane remodelling by non-structural proteins in the introduction, where we cite one of the Bienz papers (lines 75-77).

      Figure 5D. So only 1-3% of RNA is found in the upper fraction? Is that significant enough to say that dsRNA was recruited significantly more than ssRNA? How confident are you in your quantification of the starting amounts of RNA?

      We agree that the fraction is low; however, the fluorescence signal is very clearly above background. We are thus confident in the measurement. The low percentage at the end of the experiment likely has a simple physico-chemical explanation: in a dynamic equilibrium in a density gradient, whatever RNA dissociates during the run will migrate away from the 2C-vesicle fraction and not be able to rebind. We still tried to address this concern by a complementary experiment where we used fluorescence anisotropy to measure binding of RNA to 2C on vesicles. While the measurements showed the same tendency, the curves were not clean enough to be published, which we think is due to the complex system with 2C bound to vesicles and clusters of vesicles. Still, in view of the relatively low percentage of measured recruitment we opted to adjust the paper title and the title of figure 5 (including the subheading related to figure 5) to put less emphasis on the dsRNA recruitment.

      Line 223. Any idea why the MBP needs to be cleaved off? Clearly the MBD is accessible or it would not bind to the liposomes.

      Since we have no data directly supporting this we prefer not to speculate in the paper. But one guess would be that the NTD of 2C, as implicated by previous publications, has a dual role in membrane binding and RNA binding. It may be that it can bind membrane while conjugated to MBP, but needs MBP to be removed in order to simultaneously bind membrane and RNA.

      Line 237: missing "b" in "by"

      Thanks. This paragraph was rewritten in the light of the changes to figure 6.

      Figure 6. I don't fully understand the results here. Earlier you showed that the delta MBD didn't really bind SUV. So presumably it isn't really membrane bound. Why does it have similar activity to full-length MBP in your helicase assay if membrane is important? Did you do SUV and TEV protease only control?

      We are very grateful to this reviewer (and others) for pointing out the need for a TEV control. When performing the control, we found that the TEV protease, at the high concentrations initially used, surprisingly had an artefactual RNA chaperone-like effect on its own. We then titrated down the TEV protease concentration to the point where it no longer interfered. At this TEV protease concentration, although 2C was substantially cleaved (see the new supplementary figure S12), we could no longer detect an RNA chaperone activity. Thus, the contents of the new figure 6, and its conclusions, have been substantially changed. We now focus our attention on the remaining effect that 2C has on RNA: single-strand ribonuclease activity. These experiments were all conducted in the presence of RNase inhibitors, and the presence of Mg2+-dependent ribonuclease activity parallels a recent publication that found this for truncated 2C from hepatitis A and several enteroviruses.

      Line 257: "staring"?

      Thanks, corrected. A staring glycine would indeed be something strange.

      Line 336. Need to change the u to mu.

      Thanks, corrected.

      Any discussion on your observation in Figure 1D that EV71 and CVB3 don't appear to have AH1 and AH2 or do you think that the domains are conserved across the different viruses?

      Thanks for bringing this up. Based on this and a comment from another reviewer, we have now clarified our thinking around this. Since the glycine will introduce some flexibility between AH1 and AH2, we cannot say from the single alphafold predictions that this is THE kink angle. The presence of the kink in the predictions of several MBDs lends more credibility to the robustness of the observation, but most importantly the hydrophobic surfaces in AH1 and AH2 are non-aligned for ALL sequences we looked at. This is now described on lines 126-128.

      Table 1 (and possibly elsewhere): an apostrophe is not the prime symbol. 5' compared to 5′.

      Thanks, we corrected this throughout.

      Line 702 "and" should be "an".

      Thanks, corrected.

      I couldn't open one of the movies (140844_0_supp_2820374_a2g272.avi).

      Sorry to hear this, we will check the movie again.

      Reviewer #3 (Significance (Required)):

      Overall I liked the paper and think it is worth publishing. One of the issues in the 2C field is the difficulty in making pure 2C and carrying out in vitro assays that correlate with what is observed in the natural infection. I think this paper suffers from similar struggles with a 2C preparation that doesn't appear that pure. I think it also suffers from not having 2C from a wild-type infection. I don't think that it is feasible to get that kind of 2C, but by once again using a recombinant protein from E. coli we are left with another manuscript that provides conflicting evidence of the functions of 2C without a definitive answer. The experiments are well done, although some controls are missing, and the manuscript is laid out in a logical manner and is relatively easy to follow.

      We thank the reviewer for these comments. We believe that we have now provided better information regarding the purification of the recombinant 2C protein, and we do think that the controls present in the original manuscript and the revised manuscript alleviate the concerns about lack of specificity. Of course, isolating 2C vesicles from wildtype infection would be another interesting way of approaching its function, but such an approach would come with its own set of challenges related e.g. to the presence of confounding host factors.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This is an interesting manuscript that reports the development of an in vitro membrane assay for probing the biochemical functions of the enterovirus 2C protein. The technique is interesting because it can be applied to 2C proteins from other members of the picornavirus family, an important group of mammalian pathogens. It has the capacity to probe different functions (e.g. membrane clustering, ATPase activity, RNA-binding and manipulation activities).

      Overall, the manuscript is well written and gives a clear account of the work undertaken. It adds insight to previous studies of enteroviral (and picornaviral) 2C proteins, providing confirmation of some earlier work in a more physiological context and some new insights, particularly into the membrane and RNA binding aspects of 2C.

      That said, there are a number of places where some amendment of the claims made is required to provide a more precise statement of the findings of this work. These are listed below.

      We thank the reviewer for this positive feedback on our work, as well as for the specific comments below.

      Line 21 (Abstract) - The authors claim to have shown that a conserved glycine divides the N-terminal membrane-binding domain into 2 helices. I would suggest instead what they have produced are computational predictions that this is the case - some way short of an experimental demonstration. Sequence analysis predicts helical secondary structure in the N-terminus and indeed Alphafold2 also predicts a helical structure, but these predictions require experimental verification. The authors should therefore rewrite sections that claim to have shown the presence of 2 helices. In doing so, they should perhaps also comment on the fact that Alphafold2 does not predict 2 helices in this region for all enteroviruses (see Fig 1D). Moreover, the sequence analysis in Fig. S1 shows the presence of two Lys residues in the segment 17-38; it would be interesting for the reader to have these indicated in the figures showing the Alphafold2 prediction - do they in any way interrupt the hydrophobic face of the predicted helix?

      Thanks very much for this comment, which is in line with what other reviewers also wrote. We agree, and changed the abstract sentence. We have also rewritten the manuscript in several places to address the limits of structure predictions and the eventual need for an experimental structure of full-length membrane-bound 2C (lines 126-128 and 315-318).

      Line 82 (Introduction) - The authors write that the membrane binding domain (MBD) of poliovirus has been shown to mediate hexamerisation, citing Adams et al (2009) - reference 43. However, that is not what this paper shows. Rather it provides evidence of aggregation of an MBP-2C fusion protein into forms that ranged from tetramer to octamer, but no evidence that these aggregates assume functional forms (e.g. the presumed hexameric ring structure characteristic of the AAA+ ATPase family to which 2C belongs). As far as I am aware the first demonstration of hexameric ring formation by a picornaviral 2C protein was for the 2C of foot-and-mouth disease virus (see Sweeney et al, JBC, 2010). Although this is not an enterovirus, this finding was later confirmed for Echovirus 30 (ref 51). I should declare an interest here: the Sweeney paper is from my lab. I will leave it to the editor and the authors to determine how to write a more precise account of the early observations of hexamerisation in picornaviral and enteroviral 2C proteins.

      Thanks very much for this insightful comment. As a response to this and other similar comments, we are much more cautious about our wording in the revised manuscript (see also our response to the comment below). In the part of the introduction discussed here (now lines 89-91) we now use the original wording of the Adams paper ("oligomerization"). In the context of that new text we didn't feel that the Sweeney et al. paper was a suitable reference, but we now cite it in the later mention of 2C's oligomeric/hexameric state in the first part of the Results (lines 137-138).

      Line 132 - the authors used mass photometry to investigate oligomeric forms of their MBP-2C constructs and state that for the full length 2C protein "the high-mass peak closely corresponds to a hexamer". While it is true that the peak shown in Fig 2C aligns with the expected MW for an MBP-2C hexamer, the peak is very broad, indicative of the presence of other oligomeric states with lower and higher numbers of monomers. This should be commented on. Indeed, the finding seems to echo the early findings of Adams et al (ref 43) with poliovirus MBP-2C.

      Thanks for this comment, which was also made by another reviewer. We cite here what we replied to that reviewer:

      ...we do agree with the reviewer on the broad mass photometry peaks. To address this experimentally, we compared the existing MBP-2C spectra to new recordings on apoferritin, a highly stable homomultimeric protein complex of a similar mass to an MBP-2C hexamer. The apoferritin mass estimate is overlaid with the full-length MBP-2C in the new figure 2D and the corresponding supplementary figure S3. This indeed shows that the MBP-2C peak is broader, i.e. consistent with a mix of species which are predominantly but not only hexamers. We describe and discuss this on lines 145-149.

      Line 143 - for the reasons given above, this summary paragraph represents too strong a statement of what has been observed.

      We agree, and changed the paragraph. It now only refers to "oligomerization" (lines 162-164).

      Line 197 - I note that the authors did not test the membrane clustering capabilities of the 2C(41-329) construct. Although the 2C(deltaAH1) construct had already shown a significant loss of activity, the shorter construct could still have been a useful control. I don't think it is necessary for this experiment to be done, but if the authors have a rationale for not performing the experiment, perhaps they could include it in a revised manuscript.

      Thanks for the suggestion. The rationale is that a protein that doesn't bind a membrane in the first place will also not cluster them (an action that requires binding TWO membranes). We now describe our reasoning on lines 220-222. Nevertheless, we did test these constructs in the new supplementary figure showing negative staining TEM (figure S10).

      Line 223 - typo. I think you mean MBD.

      Thanks! Corrected (now line 257).

      Line 215 - the authors observed that the presence of ssDNA reduced membrane clustering and conclude that "nucleic acid binding partially outcompetes membrane tethering activity". Two things: (1) although I agree it is likely that this effect is due to binding of DNA to 2C, binding has not been demonstrated experimentally, so the authors should be more careful in how they describe their result; (2) there is no data presented to show that RNA binding reduces membrane tethering, so at best I think the conclusion has to be that the data are consistent with the notion that DNA binding reduces membrane tethering. It would of course be interesting to see the effects of RNA and I'm curious to know why the assay was not performed.

      Thanks for the comment. The honest answer is that previous publications (primarily Yeager et al, NAR 2022) convinced us that the outcome should be near-identical with DNA, so we chose DNA oligos because they are cheaper and easier to work with. But we agree with the reviewer that RNA is of course more relevant. We now present a comparison at 5 μM of ssDNA and ssRNA, which in fact shows a slightly stronger effect on membrane clustering by RNA (figure 5C). In the light of this additional experiment, we feel that some of the text changes suggested by the reviewer may no longer be necessary.

      Line 237 - typo: by, not y

      Thanks. In the light of the extensive changes to figure 6 this text was removed.

      Line 284 - the authors claim that 2C may only bind RNA after the N-terminus is liberated from 2B in infected cells, since cleavage of the MBP tag from their construct was needed for 2C to bind RNA in their in vitro assay. However, this does not automatically follow given the large structural differences between MBP and 2B and the fact that the authors have not tested the RNA binding capacity of a 2BC fusion protein. Their claim here is too strong and should be re-written.

      We agree, and have added a discussion along the lines suggested by the reviewer (lines 330-332).

      Line 293 - The authors speculate that RNA binding might cause a shift between the membrane clustering activities and the role of the protein in RNA replication. However, since they have not shown that RNA binding reduces membrane clustering, this is too speculative.

      In our revised manuscript we have studied the effect of RNA on membrane binding, thus we feel that this text is relevant in the context of the extended experiments.

      Line 299-317 - within this discussion is the assumption that in their assay system enterovirus 2C adopts the ring-like hexameric structure typical of AAA+ ATPases. While I agree this may well be the case, it has not been demonstrated in this study so the authors should make clear they are making this assumption. The same applies to the legend of Fig 8.

      This part of the discussion was extensively rewritten after our changes to figure 6. We now only refer to "hexamer" once in the corresponding part of the discussion, where we talk about structural models of hexamers produced by other groups who have crystallised fragments of 2C. There we believe we should refer to hexamers to accurately cite their work.

      We are not sure what the reviewer is referring to when it comes to the legend for figure 8: the original legend had no reference to the oligomeric state of 2C. We have substantially changed figure 8 and its legend and the new figure and legend make no references to hexamers/oligomers.

      Line 302 - the authors claim to have shown that 2C is 'selective' for dsRNA. I think at best they have shown a preference for binding dsRNA over ssRNA.

      We changed the wording (line 349). We have also changed the title of the paper where we removed "double-stranded".

      Line 313 - The sentence starting "A recent study..." needs a reference.

      The revised discussion no longer contains this sentence.

      Line 332 - the full sequence of the synthetic gene used in this study should be made available (e.g. as supplementary information or a deposited sequence with an accession number). This is a critical point before the paper can be published.

      We will of course submit the sequences as supplementary data. Thanks for the reminder.

      Line 362 - the authors should describe the likely points of attachment of fluorophores and comment on how this labelling might affect 2C function.

      Thanks for the comment. In response to this and a similar comment from another reviewer, we discuss the likely conjugation site of the fluorophore (lines 175-181), and also (due to the proximity to the Zn finger) provide a new measurement showing that equal amounts of Zn can be detected in the labelled and unlabelled protein (figure S7).

      Line 372 - Is a single protein standard (BSA) sufficient to calibrate the SEC-MALS system?

      Yes, it is the recommended procedure (note that SEC-MALS is only dependent on scattering, not elution volumes etc).

      Reviewer #4 (Significance (Required)):

      As stated above this is an interesting study that presents findings from a novel assay. It will be of interest to picornavirologists and the wider community interested in the mechanisms of AAA+ ATPases.

      We thank the reviewer for this positive appraisal of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for their careful reading of our manuscript and have taken all of their grammatical corrections into account.

      Reviewer #2 (Public Review):

      Weaknesses: 

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We tried to correct the non-scientific language and have included the suggested data on the Cryo-EM analyses including new Figures 11-17.  We did not collect data on the sample used for the seeds in the cross seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. We have now analyzed cryo-EM data for 6 more samples at pH 7.0 and found that several kinds of polymorphs (Types 1A, 1M, 2A, 2B and 5) are accessible at this pH, however the Type 3 polymorphs are not formed at pH 7.0 under the conditions that we used for aggregation.

      Reviewer #2 (Recommendations For The Authors):

      Remove unscientific language: "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them"

      We have rephrased this sentence.

      For same reason, remove "Obviously, " 

      Done

      What does this mean? “polymorph-unspecific” 

      Rephrased as non-polymorph-specific

      What does this mean? "shallow amyloid energy hypersurface"  

      By “shallow hypersurface” we mean that the minimum of the multi-dimensional function that describes the energy of the amyloid is shallow enough that subtle changes to the environment can favor another fold/energy minimum. We have left the sentence as is because, while it may not be perfect, it is concise and seems to get the point across.

      "The results also confirm the possibility of producing disease-relevant structure in vitro." -> This is incorrect as no disease-relevant structure was replicated in this work. Use another word like “suggest”.

      We have changed to “suggest” as suggested.

      Remove "historically" 

      Done

      Rephrase “It has long been understood that all amyloids contain a common structural scaffold” 

      Changed to “It has long been established that all amyloids contain a common structural scaffold..” 

      "Amyloid polymorphs whose differences lie in both their tertiary structure (the arrangement of the beta-strands) and the quaternary structure (protofilament-protofilament assembly) have been found to display distinct biological activities [8]" -> I don't think this is true, different biological activities of amyloids have never been linked to their distinct structures.

      We have added 5 new references (8-12) to support this sentence.

      Reference 10 is a comment on reference 9; it should be removed. Instead, as for alpha-synuclein, all papers describing the tau structures should be included.

      We have removed the reference, but feel that the addition of all Tau structure references is not merited in this manuscript since we are not comparing them.

      Rephrase: "is not always 100% faithful"

      Removed “100%”

      What is pseudo-C2 symmetry? Do the authors mean pseudo 2_1 symmetry (ie a 2-start helical symmetry)?

      Thanks for pointing this out. We did indeed mean pseudo-2₁ helical symmetry.

      Re-phrase: "alpha-Syn's chameleon-like behavior" 

      We have removed this phrase.

      "In the case of alpha-Syn, the secondary nucleation mechanism is based on the interaction of the positively charged N-terminal region of monomeric alpha-Syn and the disordered, negatively charged C-terminal region of the alpha-Syn amyloid fibrils [54]" -> I would say the mechanisms of secondary nucleation are not that well understood yet, so one may want to tune this down a bit. 

      We have changed this to “mechanism has been proposed to be”

      The paragraphs describing experiments by others are better suited for a Discussion rather than a Results section. Perhaps re-organize this part? 

      We have left the text intact as we are using a Results and Discussion format.

      A lot of information about Image processing seems to be missing: what steps were performed after initial model generation? 

      We have added more details in the methods section on the EM data processing and model analysis.

      Figure 1: Where is Type 4 on the pH scale?

      We have adjusted the Fig 1 legend to clarify that pH scale is only applicable to the structures presented in this manuscript. 

      Figure 2: This might be better incorporated as a subpanel of Figure 1.

      We agree that this figure is somewhat of a loner and we only added it in order to avoid confusion with the somewhat inconsistent naming scheme used for the Type 1B structure. However, we prefer to leave it as a separate figure so that it does not dilute the impact of figure 1.

      Figure 3: What is the extra density at the bottom of Type 3B from pH 5.8 samples 1 and 2 and pH 5.8 + 50 mM NaCl (but not pH 5.8 + 100 mM NaCl)? Could this be an indication of a local minimum, with the pH 5.8 + 100 mM NaCl structure being correct? Or is this a real difference between 0/50 mM NaCl and 100 mM NaCl?

      We did not see the extra density to which the reviewer is referring; however, the images used in this panel are based on the output of 3D classification, which is more prone to artifacts than a 3D refinement. With this in mind, we did not see any significant differences in the refined structures and therefore deposited only the better-quality map and model for each of the polymorph types.

      Figure 3: To what extent is Type 3B of pH 6.5 still a mixture of different types? The density looks poor. In general, in the absence of more details about the cryo-EM maps, it is hard to assess the quality of the structures presented.

      In order to improve the quality of the images in this panel, a more complete separation of the particles from each polymorph was achieved via the filament subset selection tool in RELION 5. In each case, an unbiased initial model could be created from the 2D classes via the relion_helix_inimodel2D program, further supporting the coexistence of 4 polymorphs in the pH 6.5 sample. The particles were individually refined to produce the respective maps that are now used in this figure.

      Many references are incorrect, containing "Preprint at (20xx)" statements.  

      This has been corrected.

      Reviewer #3 (Public Review):

      Weaknesses: 

      (1) The authors reveal that both the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations in experimental groups yield disparate abundance ratios between polymorphs 3B and 3C, indicating a degree of instability in fibril formation. This variability could pose challenges for replicability in subsequent research. In light of these observations, I propose the following recommendations:

      (a) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming and we will continue to investigate this in our future research. Regarding the variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7 where the largest variety of polymorphs has been observed. However, even identical replicates (samples created from the same protein solution and simply aggregated simultaneously in separate tubes) can lead to different outcomes (see datasets 15 and 16 in the revised Table 1), suggesting that stochastic processes can determine the outcome of an individual aggregation experiment. While our data still indicate that Types 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band by Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N- and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments we now include a batch number for each EM dataset. While no new conclusions can be drawn from the inclusion of this additional data, we feel that it is important to acknowledge the possible role of batch-to-batch variability.

      (b) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.  

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. Since the pH 7.4 conditions produce the most common α-Syn polymorph (Type 1A) and were used twice in this manuscript (once for an unseeded and once for a cross-seeded fibrilization), we decided to focus on the intermediate condition where the most variability had been seen (pH 7.0). The revised Table 1 now has 6 new datasets (11-16) representing 6 independent aggregations at pH 7.0 starting from two different protein purification batches. The result is that we now produce the Type 2A/B polymorphs in three samples, and in two of these samples we once again observed the Type 1M polymorph. The other samples produced Type 1A or non-twisted fibrils.

      (c) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.  

      The correlation of toxicity with structure would in principle be interesting. However, the Type 1 and Type 3 polymorphs formed at pH 5.8 and 7.4 are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease-relevant structures. Still, a single polymorph rarely appears at pH 7.0 (the Type 5 represented only 10-20% of the fibrils in the sample, and the Type 1M sample also contained unidentified double-filament fibrils). We plan to pursue this line of research and hope to include it in a future publication.

      (2) The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodologically robust approach to address this would be a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, is very important for our understanding of the aggregation pathways of alpha-synuclein and is currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds. 

      We thank the reviewer for reminding us to cite these studies as a clear example of polymorph selection by cross-seeding. Unfortunately, it is not 100% clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris pH 7.5 (although at 37 °C the pH could have dropped to about 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that pH 7.5 Tris was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results indicate that at pH 7.0-7.5 under low-seed conditions (0.5%), the Type 4 polymorph can propagate in a seed-specific manner.

      (3) In the Results section "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patient samples under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB and LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in vivo or in-cell models. Given the low seed concentration reported in these studies, it is imperative for the authors to provide a more detailed explanation as to why possibly similar conformations could lead to divergent pathologies, including differences in cell-type preference and seeding capability.  

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns, and that only the former is able to model key aspects of Lewy Body disease, support the seed-specific nature of some types of alpha-synuclein aggregation. We have added this to the discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      (4) In the Methods section "Image processing", the authors describe the helical reconstruction procedure without much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should enrich this part with more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As also suggested by reviewer #2, we have now added more comprehensive information on the 3D reconstruction and refinement process.

      (5) The abbreviations of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", amino acids are denoted using three-letter abbreviations. Conversely, in the section "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, a standardized format for amino acid abbreviations should be adopted throughout the manuscript.

      That makes perfect sense and has been corrected.

      Reviewing Editor:

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work. 

      We agree as stated above and will continue to work on this important point.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript provides a detailed analysis of RNA and protein dynamics during transmission of the rodent malaria model P. yoelii from the mouse host to an in vitro ookinete culture setting (mimicking the mosquito midgut environment). This group and others have shown experimentally that a substantial number of mRNAs are stored in the female Plasmodium gametocyte, ready to be translated following initiation of ookinete development. The process is akin to maternal deposition of mRNA in oocytes of metazoans. With this manuscript the authors provide a significant contribution to the field of translational control in Plasmodium parasites as they explore translational activation during the early hours of zygote-to-ookinete development. The paper presents RNA-seq and mass-spec analyses of female gametocytes and, for the first time, of 6-hour zygotes (i.e., fertilized female gametes); the zygote datasets are much improved and more comprehensive than the only other such study, performed in 2008 in P. gallinaceum. Using comparative analyses of transcriptome and proteome data (including published datasets), the authors arrive at a list of 198 transcripts that are translationally repressed in the gametocyte and translated within 6 hours of fertilization in the zygote. Many of these mRNAs are known to be involved in zygote-to-ookinete transformation. Finally, BioID is used to explore changes in mRNP protein composition between the female gametocyte and the zygote.

      The paper is generally well written. The authors present a lot of data (also in comparison with published data). Perhaps the main message could sometimes be simplified or streamlined in section titles ("Quantitative Proteomics by DIA-MS" is not very informative; the outcome of the analysis would be more telling).

      Response: We have revised section headers to clarify the content.

      A considerable proportion of the DIA mass-spec proteomics results section is very technical. The paper describes a biological phenomenon rather than a technical mass-spec advance. Can these technical details be moved to the methods section?

      Response: As this is one of the first published instances of applying DIA-MS to Plasmodium, we want to keep this information in the main text to help our community adopt these approaches. While these details are highly technical, they are also some of the major advances of this project.

      On the other hand, a bit more detail could be provided in the main text. For example, the age of the zygotes is never mentioned. This is important, please add this. The main manuscript text has 16 mentions of the word "many". As the authors are in possession of the data, please provide, if missing, (in parenthesis) the absolute numbers, maybe in an "x out y" format. Please clearly state the number of biological and/or technical replicates used for transcriptome and proteome analyses in the main text, figures and/or figure legends. How many protein coding genes are encoded in the P. yoelii genome?

      Response: Several of these requested details are noted in the materials and methods. We have added this information to the main manuscript now as well. We have also revised the manuscript to replace some instances of “many” with specific numbers unless it adversely impacted the flow of the sentence to do so.

      The authors claim that only zygotes (fertilized females) have surface-exposed Pys25 (a surface protein they use to affinity-purify zygotes) but not gametocytes. I could not find the experimental data for this in the paper. The cited reference #22 also does not appear to show this. In Figure 2C Pys25 is shown to be translated in gametocytes. In this context it may be important to note that in the related P. berghei the related protein P28 is expressed even in the absence of fertilization (Billker 2004; DOI: 10.1016/s0092-8674(04)00449-0). It may not be relevant whether translation requires fertilization, but the authors claim it affects trafficking of the Pys25 protein to the surface, so it needs to be shown. A reference to an infertile P. yoelii line would be great.

      Response: We have corrected the reference supporting the surface exposure of p25 on zygotes. The observation by Billker and colleagues about Pbs28 is also of interest, but outside of the scope of this study as we did not investigate the fertilization event itself here.

      It is highly commendable that all data are provided throughout the manuscript. For readability, may I suggest that the authors label the individual sheets within each Excel file from A to Z, and refer to these labels within the manuscript? That would really help; the most relevant datasets could then be identified quickly. For example, line 184 refers to 276 zygote proteins: in which sheet of which table?

      Response: While this labeling system would also be effective, we have provided a README tab for our files that quickly directs the reader to the relevant tab (as we do for our previous publications).

      Section 176 onwards: here the authors combine P. falciparum and P. yoelii proteomics data. Please explain why you excluded any of the available P. berghei proteome data such as the male and female gametocyte proteome? The same question applies to 294 onwards.

      Response: We compared our datasets with those of Lasonder et al. NAR 2016 because that study was also focused on translational repression of mRNAs and provided both RNA-seq and proteomic datasets of female gametocytes (although not of zygotes).

      The comparative transcriptome-proteome analysis arrives at 198 translationally repressed mRNAs. Could the authors provide one or two alternatives using less stringent parameters? The list in P. falciparum and P. berghei is considerably larger (500+ and 700+).

      Response: We could have reduced the stringency of our thresholds to arrive at a far larger number, but prefer to retain higher confidence in those we are scoring as translationally repressed and then released for translation. We provide all of the pertinent data in the supplemental files if readers would like to adjust these thresholds to see which additional mRNAs may also be regulated.

      The TurboID data are informative but somewhat speculative in regard to spatial rearrangements within these mRNPs. Figure 6 depicts the RNA helicase as binding the 5' end of mRNAs that are associated with polyribosomes and, I assume, being translated. Is this association realistic? The yeast homolog of the RNA helicase DOZI (Dhh1) is also involved in decapping.

      Response: We provide Figure 6 as our working model of how the reorganization of the DOZI/CITH/ALBA complex could occur based on available data from this study and others. Future studies are warranted to determine if DOZI remains associated with monosomes vs. polysomes, but current data indicate that DOZI can bind to eIF4E when translational repression is not imposed.

      Specific comments:

      Title: Is "global" the appropriate word? Some transcripts appear to be translated later.

      Response: We believe it does apply appropriately to these data.

      Line 30/32 Please re-phrase the sentence. There is: Cell Host Microbe 2012 Jul 19;12(1):9-19. doi: 10.1016/j.chom.2012.05.014.

      Response: We conclude that the sentence is correct as written, even in considering Sebastian et al. Cell Host & Microbe 2012.

      30 Perhaps refer to the ookinete, rather than the zygote, as the stage that establishes infection. For a general readership, a brief description of the sexual life cycle might be useful.

      Response: It is not possible to get into these nuances in the Abstract. This information is covered in the main text and the works that are cited.

      32 The DOZI/CITH/ALBA complex would require some explanation for a more general reader.

      Response: It is not possible to get into these nuances in the Abstract. This information is covered in the main text and the works that are cited.

      36-37 I believe zygotes were collected 6 hours after fertilization. Does that qualify as soon after fertilization? Motile ookinetes are generated within 20 hours and motility can be seen before that.

      Response: Yes, we think this qualifies as the process is not synchronous, but relies on when male gametes encounter and fuse with female gametes.

      37 Essential functions for what?

      Response: It is not possible to get into these nuances in the Abstract. This information is covered in the main text and the works that are cited.

      39 Is the spatial arrangement of this mRNP known?

      Response: Some interactions of members of this complex were known (DOZI with eIF4E, ALBA4 with PABP1), but not the overall spatial arrangement. These findings are novel to this study.

      40 Can you briefly allude to the "recent, paradigm-shifting models of translational control"

      Response: It is not possible to get into these nuances in the Abstract. This information is covered in the main text and the works that are cited.

      44 Products = mRNA

      Response: We have stated it as products because the maternal cell provides more than just mRNAs that are essential to further development post-fertilization.

      45 Oocyte in metazoans?

      Response: Yes, this is the correct term. The context here is in higher eukaryotes.

      60/62 Please re-phrase the sentence. There is: Cell Host Microbe 2012 Jul 19;12(1):9-19. doi: 10.1016/j.chom.2012.05.014.

      Response: We conclude that the sentence is correct as written, even in considering Sebastian et al. Cell Host & Microbe 2012.

      81 PbDozi Plasmodium berghei DOZI

      Response: We have added this clarifying text here as suggested.

      84/85 Please rephrase and cite Nucleic Acids Res. 2008 Mar;36(4):1176-86. doi: 10.1093/nar/gkm1142. Epub 2007 Dec 23. and Cell Host Microbe 2012 Jul 19;12(1):9-19. doi: 10.1016/j.chom.2012.05.014.

      Response: As noted above for other comments, we hold that the current phrasing is accurate even when considering these important publications.

      88 Please define the timepoints throughout this manuscript. What age are the zygotes? How many hours post-induction? Please define the time for ookinete development somewhere in the introduction

      Response: The timepoint used for zygote collection is now included in the main text in addition to its previous inclusion in the Materials and Methods section. As we have not studied the ookinete stage here, we have opted to keep the introduction focused on the key details for this study.

      104 Please add the age (in hours) of these zygotes from the time of starting the in vitro cultures. From the methods section it looks like 6 hours.

      Response: The timepoint used for zygote collection is now included in the main text in addition to its previous inclusion in the Materials and Methods section.

      103/105 I can find no evidence for P25 (Pys25) expression relying on fertilization in the cited paper (22). The SOM has no reference to Pys25 either. Please show data or reference published data showing that there is no translation and trafficking of Pys25 in unfertilized female gametes, i.e., those that are placed in ookinete medium. In this respect it may be important to note that unfertilized Plasmodium berghei females placed in ookinete medium translate P28, the P25 paralog (https://www.sciencedirect.com/science/article/pii/S0092867404004490?via%3Dihub).

      Response: We have corrected the reference supporting the surface exposure of p25 on zygotes. The observation by Billker and colleagues about Pbs28 is also of interest, but outside of the scope of this study as we did not investigate the fertilization event itself here.

      104 What cell line was used for the zygotes?

      Response: The PyApiAP2-O::GFP transgenic parasite line was used here. These details are included in the manuscript and supporting information.

      114 The number of transcripts detected in gametocytes is quite small compared to the twice as large proteomics dataset. See for example also Lasonder 2016 for P. falciparum detected transcripts: 4477 different sense transcripts were identified, 98% of which were shared between MG and FG.

      Response: Yes, the number of mRNAs or proteins scored as detected differs based on thresholds applied. We prefer to err on the side of higher stringency as noted above.

      117 Does the 194 up-in-gametocytes dataset include the 81 not found in zygotes?

      Response: No, these 194 are detected in both datasets, but are more abundant in gametocytes than zygotes.

      117 Could you indicate some of the genes in the plot?

      Response: Several hits of special note are described in the text. We have opted to keep the figure clear and streamlined.

      Figure 1 How were the upregulated transcripts identified? 1647 are shown to be specific to zygotes in 1B, yet only 685 are shown in 1C to be upregulated. Do the transcripts found exclusively in zygotes not count? Are these transcripts likely the result of de novo transcription? How old are these zygotes when the libraries are made?

      Response: The details of the RNA-seq processing are provided in the MakeFile, the supplementary tables, and the manuscript. The README tab provides descriptions of what processing occurred between sequential tabs. As noted above, zygotes were collected at 6 hours.

      132 Many? How many? Please provide a precise number.

      Response: These details are now in the revised manuscript.

      134 Please explain why p28 would be differentially abundant in the zygote rather than the female gametocyte. That would require de novo transcription of this gene. If there is experimental evidence for the de novo transcription of p28 and other translationally repressed transcripts in the zygote please cite the references. Can you name a few more examples here? P25 for example, ap2-o, or anything published and experimentally validated. What about AP2-o and AP2-Z? Both are known to be translationally repressed.

      Response: We state in the original manuscript that there is not a significantly different mRNA abundance of pys28.

      139 Please define how many members of the IMC?

      Response: We have now replaced “many” with the number of IMC members we have detected, which is also shown in supporting tables.

      156 Can you provide a number of how many parasites were used in total or per run. And how many biological and technical replicates were analysed?

      Response: These details are provided in the Materials and Methods.

      169 The number of proteins detected in the gametocyte sample is twice the number of transcripts. Is this to be expected?

      Response: This reflects the sensitivity of the assays run for transcriptomics and proteomics.

      170 How many samples were analyzed? One gametocyte and one zygote sample?

      Response: Yes, for the creation of the DIA-MS spectral library, a single biological replicate was used in addition to in silico library approaches. This information is provided in the next sentence.

      176 Why did you not include P. berghei in the meta-analysis?

      Response: We compared these results to all of the published Plasmodium proteomes in PlasmoDB.

      184 Please refer to an excel table here.

      Response: We have pointed to the relevant supporting files in this section.

      184 145 proteins: do you mean orthologs in general or orthologs with a gene/protein annotation other than unknown function?

      Response: We use the standard form of ortholog throughout the manuscript.

      190 142 proteins: do they all have orthologs in P. falciparum?

      Response: No, not all proteins in our dataset have unambiguous orthologues in P. falciparum, and this is accounted for in our data processing approaches.

      Figure 2C P25 is not exclusive to zygotes here and also found in the gametocyte sample.

      Response: That is correct. It is known that p25 is expressed in female gametocytes, but that the localization changes in the zygote.

      190 shortlist

      Response: The spelling of “short list” as two words is an appropriate American spelling of this term.

      219 onwards Does the list of 198 transcripts exclusively arise from your RNA-seq and proteomics comparison? Or does it include P. falciparum data as outlined in section 176 onwards, i.e., the list of 276 proteins that are only detected in zygotes?

      Response: Yes, this list of 198 mRNAs is derived from our datasets only using our defined thresholds. The details of this are provided in the manuscript.

      224 Early zygote? At 6 hours do the parasites not start to transform, elongate?

      Response: This process is not synchronous, as it is affected by the timing of gamete fusion.

      225 >5-fold. Is this an arbitrary decision?

      Response: This threshold has been used by our group and others in prior studies, and was partially informed by the behavior of previously characterized transcripts.

      227 1417 mRNAs: they are from which dataset?

      Response: These are from our datasets with P. yoelii, as described in the manuscript.

      228/229 Please explain why DOZI and CITH are in the list of 198 repressed transcripts? They are present in the gametocyte. Are they upregulated>5 fold?

      Response: Yes, they meet our criteria for this regulation, and in the manuscript we note that we believe that they are self-regulated and likely have continuing roles in early mosquito stage development.

      259 ... as they are already translated in the gametocyte?

      Response: Yes. Translational repression allows for the existence of some of the protein in the initial timepoint. This differs from translational silencing which does not.

      295 Is this from the 198 TR list S4?

      Response: No. Transcripts that remain repressed would not be in the list of 198, as the protein was not detected in zygotes.

      294 onwards How many putatively repressed P. falciparum transcripts are there? How many were identified in P. berghei? How many are common to all? A Venn diagram could perhaps be used to compare the different studies.

      Response: There is substantial overlap between the species with respect to the presence of syntenic orthologues in this dataset. However, because we did not conduct experiments with P. falciparum or P. berghei here, we do not want to make claims that they are similarly regulated or potentially have a reader misinterpret a figure to that effect.

      301 How many transcripts were found associated with Plasmodium berghei DOZI and/or CITH in female gametocytes? How many of those were abundantly detected as protein in zygotes, or had no difference in protein abundance between gametocytes and zygotes, or even greater abundance in female gametocytes?

      Response: These details are now provided in the revised manuscript.

      303/305 Please indicate the numbers of translationally repressed transcripts identified for P. falciparum and berghei.

      Response: These data are provided in Supporting Information Table 4.

      317/319 Please add the promoter used for tid-GFP

      Response: We have now added this information to the Materials and Methods.

      320 Please elaborate on the spatial organization of the DCA complex.

      Response: This has not been previously characterized, and this entire section is dedicated to the experimental data and interpretations of how the DOZI/CITH/ALBA complex may be organized.

      321/322 Have precise binding sites of DOZI and ALBA4 really been shown experimentally in the cited papers? In relation to 5' and 3' ends of the mRNA? Please cite Braks et al. paper.

      Response: Yes. The association of DOZI with eIF4E and ALBA4 with PABP1 are established in the literature, in some cases by multiple independent laboratories. The Braks publication does not address the binding of these proteins, and thus is not cited.

      323 What is the first generation BioID enzyme? BirA*

      Response: Yes. The first generation enzyme is called BirA*

      323 Please cite relevant Kyle Roux and Alice Ting for the original enzymes

      Response: We have now added these citations to this sentence.

      327 Could you show images of ALBA4::TurboID::GFP, DOZI::TurboID::GFP and cytosolic (free) TurboID? Perhaps stained with fluorescently labelled streptavidin and / or against GFP? In the gametocyte and zygote samples?

      Response: We attempted to stain with monoclonal antibodies that are reactive against biotin, but there was insufficient specificity, which is why such data are not included. We conclude that all of the other data that support this approach suffice to demonstrate its rigor.

      331 What is the age of these zygotes? Were they affinity purified?

      Response: As throughout the manuscript, zygotes were collected at 6 hours. Details of experimental purifications are provided in the materials and methods.

      Fig S4 Please indicate whether ALBA4 and DOZI were tagged endogenously

      Response: Yes. The endogenous loci for both ALBA4 and DOZI were modified to include the C-terminal TurboID and GFP tags.

      421/430 Please add a few references here

      Response: We do not believe that specific references are warranted for these general statements.

      429 translational repression?

      Response: Yes. These statements set the stage for the use of translational repression.

      445 966 proteins in gallinaceum? The zygote cultures in that study were 2-3 hours. How old were the cultures in your study?

      Response: As throughout the manuscript, zygotes were collected at 6 hours.

      481 Please explain / cite why repression is energetically costly.

      Response: These details are provided in both the introduction and discussion sections. The energetic cost of translational repression is both the cost to produce the transcripts without immediately/fully utilizing it for translation, in addition to the energetic cost to impose the regulation.

      501 Please add the time point of RNA and protein sampling. How many hours into ookinete development? What is the time from cardiac puncture through FACS sampling of gametocytes?

      Response: We have provided all of these details in the materials and methods for female gametocytes and zygotes. We did not look at ookinetes in this study.

      711/713 Do you have any images that show the successful purification of zygotes away from gametocytes? Secondly, please provide a reference for the statement that unfertilized female gametocyte do not express surface exposed Pys25.

      Response: We do not have captured images of these zygotes, but confirmed them during collection using microscopy. The reference for surface exposure of Pbs25 is now provided earlier in the manuscript as well.

      711/716 Were parasites lysed and mechanically homogenised?

      Response: We have provided all of these details in the materials and methods for female gametocytes and zygotes.

      Figure 6 What is the evidence that DOZI stays associated with mRNA that is being translated? Rather than mRNA that is being decapped. Please add the references that unequivocally show that DOZI and ALBA4 bind to opposite ends of repressed mRNAs.

      Response: This is our working model of these data. It is feasible that these complexes could form off of mRNA as well. Publications describing the interactions of DOZI with eIF4E and ALBA4 with PABP1 are provided in the manuscript. It is well established that eIF4E binds to the m7G cap of the 5’ end of mRNAs, and PABP1 binds to the poly(A) tail at the 3’ end of mRNAs.

      Reviewer #1 (Significance (Required)):

      The experiments in the manuscript are carefully conducted. Apart from a P. gallinaceum study from 2009 this is the first comprehensive analysis of the transcriptome and proteome of a Plasmodium zygote (developing ookinete) at 6 hours post-fertilization. The data are used to explore the temporal aspect of activation of translation during the first quarter of the 20-24 hour ookinete developmental period. The study will be of interest to the field, specifically those scientists working to understand translational control, ookinete development, and those developing intervention strategies to prevent mosquito infection and thus malaria transmission.

      Response: We appreciate Reviewer 1’s extensive feedback and positive remarks about the significance of our study. We have revised our manuscript to reflect this constructive feedback.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Main findings

      Taking a multi-omic approach, the authors provide quantitative evidence for translation repression of ~200 mRNAs in Plasmodium yoelii female gametocytes. These mRNAs are then translated, and the corresponding proteins are detected by 6 hours after gametocyte activation. They accomplish this by performing a comparative global analysis of the transcriptome and proteome of female gametocytes and early zygotes, which provides an interesting resource. The authors also use proximity labelling of the DOZI/CITH/ALBA4 repression complex, and these data suggest the complex may disassemble in the zygote or change its composition.

      Major points

      Lines 181-184: The authors state that there is no evidence of how the DCA complex selects specific mRNAs for translation repression. While the exact mechanisms have not been fully elucidated, Braks et al (2008, doi:10.1093/nar/gkm1142) suggested a role of the untranslated regions (UTRs) in translation repression of transcripts in Plasmodium berghei female gametocytes. They identified a uridine-rich 47-base element in the 5'UTR and/or 3'UTR that was associated with translationally repressed transcripts and validated it experimentally. Considering this finding, I would recommend amending the statement and including the earlier work. I would also like to see additional analysis to check whether this U-rich motif or other motifs are associated with the translationally repressed transcripts identified in the current study. The current study should be better powered to conduct such an analysis.

      Response: We have now added a comment and citation in the revised text about this study in Lines 86-88. Understanding the full importance of this element is challenging, as the Plasmodium transcriptome is highly enriched in A’s and U’s due to the highly skewed A/T content of its genome. Perhaps for this reason, we did not see an association of this motif with the identified mRNAs.

      The authors used zygotes that expressed GFP tagged AP2-O, however, there is no explanation of the significance of using this line.

      Response: This line is described in the Materials and Methods and supporting information. It was used to provide further validation of the production of zygotes.

      Minor points

      In lines 106-107, the authors refer to Figure SI; this figure shows the genomic locus and genotyping PCR for the PyApiAP2-O::GFP parasites, but there is no in-text description of why this specific line was used.

      Response: We have provided this information in the revised manuscript.

      The statement in lines 122-124 "It is likely that....." should go into the Discussion, not the Results.

      Response: We have placed this single sentence immediately after presenting these data here to aid reader comprehension.

      The statement in lines 171-175 "In addition to providing confirmatory...." should be in the Discussion, not the Results.

      Response: We view this sentence as a concluding remark of this section of data that also places this information in context for the reader.

      In Fig. 4 A and B, could the colour scheme be changed so that the proteins that are not in both samples (and probably contain many unspecifically detected proteins) appear less prominent?

      Response: We appreciate this suggestion and have adjusted these plots accordingly in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      Why is the paper interesting? Translation repression of mRNA at a global level in female gametocytes has been studied previously in rodent malaria parasites, but prior to the current study, the release of mRNA from translation repression in the mosquito stages had only been demonstrated for specific transcripts. By characterizing and quantifying changes in protein abundance between macrogamete and zygote, coupled with transcriptomic analysis, the current work broadens our understanding of zygotic translation activation, which is key to successful malaria parasite transmission to the mosquito.

      This dataset provides a useful resource for the Plasmodium research community as it provides a more comprehensive view of how transcripts behave during the transitions from the mammalian host to the vector. It is one step in a broader endeavour towards finding genes crucial for parasite transmission that could be targeted for interventions.

      How translational repression and derepression are regulated remains unknown, although some of the molecular players have been identified. This paper shows proximity labelling and expansion microscopy data of the ribonucleoprotein complex thought to mediate repression. Although the specific mechanistic insights provided by the experiments shown here remain relatively limited, the work demonstrates interesting new avenues for how translational derepression in Plasmodium can be studied.

      Response: We also appreciate Reviewer 3’s excellent feedback and positive remarks about the significance of our study. The revised manuscript addresses these comments, and we believe it is further strengthened because of it.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      We thank all the reviewers for their constructive and critical comments. We provide a point-by-point response to the reviewers' comments, as detailed below. We believe that addressing these comments will significantly improve our revised manuscript and make it of interest to researchers in the fields of cell biology, signaling pathways, physiology and nutrition.


      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: The manuscript by Yusuke Toyoda and co-workers describes how phosphorylation of the α-arrestin Aly3 downstream of TORC2 and Gad8 (AKT) negatively regulates endocytosis of the hexose transporter Ght5 in S. pombe under glucose-limiting growth conditions.

      To arrive at these conclusions, the researchers define a set of redundant C-terminal phosphorylation sites in Aly3 that act downstream of Gad8. Phosphorylation of these sites reduces Ght5 ubiquitination and endocytosis. For ubiquitination, Aly3 interacts with the ubiquitin ligases Pub1/3.

      We thank the reviewer for his/her time and for pointing out both the strengths and the issues of this study.

      Major points:

      Figure 3B: it would be interesting to compare Aly3 migration pattern (and hence potential phosphorylation) under glucose replete or limiting growth conditions. Can the authors provide direct evidence that Aly3 phosphorylation changes in response to glucose availability? Also please explain the 'smear' in lanes aly3(4th Ala), aly3(4th Ala, A584S), aly3(4th Ala, A586T).

      While it is an interesting possibility that the Aly3 migration pattern changes in response to glucose concentrations in the medium, we think that this is unlikely and that examining this possibility is beyond the scope of this study. Because a phospho-proteomics study reported by Dr. Paul Nurse's lab showed Tor1-dependent phosphorylation of Aly3 at S584 under high glucose (2%) conditions (Mak et al, EMBO J, 2021), the Aly3 phosphorylation (migration) pattern is likely to be constant regardless of glucose conditions. Glucose conditions affect the mRNA and protein levels of Ght5, but presumably not its endocytic transport to vacuoles (Saitoh et al, Mol Biol Cell, 2015; Toyoda et al, J Cell Sci, 2021).

      As for the smear in Aly3(4th A), Aly3(4th A;A584S), Aly3(4th A; A586T), we suspect that some posttranslational modification occurs on these mutant Aly3 proteins, but the identity of the modification is unclear. We did not mention the smear signals in the original manuscript, because the presence or absence of the smear did not necessarily correlate with cell proliferation in low glucose and thus vacuolar localization of Ght5, which is the main topic of this study. In the revised manuscript, we will mention this point more clearly.

      Figure 4: Ght5 localization should be analyzed + / - thiamine and in media with different glucose levels. Also, a co-localization with a vacuolar marker (FM4-64) would be nice (but not necessary). Ideally, the authors should add WB analysis of Ght5 turnover to complement the imaging data. Also, would it be possible to measure directly the effects on glucose uptake (using eg: 2-NBDG).

      In this revision, we plan to observe Ght5 localization under the conditions indicated by the reviewer (+/- thiamine and high/low glucose levels) to unambiguously show that the vacuolar localization of Ght5 occurs in a manner dependent solely on expression of the mutant Aly3 protein.

      We thank the reviewer for the suggestion of co-staining with FM4-64. Indeed, because we previously reported that the cytoplasmic Ght5 signals were surrounded by FM4-64 signals in the TORC2-deficient tor1Δ mutant cells (Toyoda et al, J Cell Sci, 2021), the cytoplasmic Ght5-GFP signals in Figure 4 are very likely to co-localize with vacuoles. We will modify the text to clarify this point.

      As suggested, we plan to add Western blot analysis of Ght5 turnover in Aly3-expressing cells to complement the imaging data (Figure 4) in the revised manuscript. Persistent appearance of GFP in the Western blot would support vacuolar transport of Ght5-GFP.

      While regulation of glucose uptake is an important issue, measurement of Ght5-dependent glucose uptake using 2-NBDG was very difficult in our hands. Another reviewer (Reviewer #2) also mentioned the difficulty of this measurement in the Referees cross-commenting section.

      Figure 5: Given the localization of Ght5 shown in Figure 4, I'm surprised that it is possible to detect full-length Ght5, and its ubiquitination, in the phospho-mutants of Aly3. I expected that the majority of Ght5 would be constitutively degraded, and that one would need to prevent endocytosis and/or vacuolar degradation to detect full-length Ght5 and its ubiquitination. Please explain the discrepancy. Also, it seems that the quantification in B was performed on a single experiment.

      As the aim of Figure 5 is to compare the ubiquitinated species of Ght5 among samples expressing different species of Aly3, the loading amount of each sample was adjusted so that the abundance of immunoprecipitated Ght5 was the same across samples. Therefore, as the reviewer points out, the abundance of full-length Ght5 before this adjustment may have differed among these samples. In the revised manuscript, we will explain this point, namely why the anti-GFP blot in Figure 5A shows similar intensities across these samples.

      In the revised manuscript, we will add two additional replicates of the experiment shown in Figure 5 to the Supplementary material to show the reproducibility of the result.

      Figure 6: Which PPxY motif of Aly3 is used for interaction with Pub1/3 and does their interaction depend on (de)phosphorylation?

      In the revised manuscript, we will discuss that "both PY motifs of Aly3 might be required for full interaction with Pub1/3," by citing the following published knowledge:

      (a) Mutation of both PPxY motifs of budding yeast Rod1 and Rog3 (Aly3 homologs) diminished their interaction with the ubiquitin ligase Rsp5 (Andoh et al, FEBS Lett, 2002).

      (b) Mutating either one of the two PPxY motifs of budding yeast Cvs7/Art1 greatly decreased its interaction with the WW domain, and mutating both abolished the interaction (Lin et al, Cell, 2008).

      Our preliminary results indicated that Pub3 interacted with Aly3, Aly3(4th A) and phospho-mimetic Aly3(4th D), and thus suggested that the Aly3-Pub1/3 interaction does not depend on the phosphorylation status of Aly3. Consistently, budding yeast Rod1 reportedly interacts with Rsp5 regardless of its phosphorylation status (e.g. Becuwe et al, J Cell Biol, 2012). While we have partially mentioned this point in the original manuscript (L499-503), we will discuss this point more clearly in the revised manuscript.

      Reviewer #1 (Significance):

      The results are well presented and clear-cut (with few exceptions, please see major points). They provide further evidence that metabolic cues instruct the phosphorylation of α-arrestins. Phosphorylation then negatively regulates α-arrestin function in selective endocytosis and is essential to adjust nutrient uptake across the plasma membrane to the given biological context.

      We thank the reviewer for recognizing the significance of our study. We believe that adding new results from the requested experiments and responding to the raised comments will clarify the significance of our revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary / background. This paper focuses on the regulation of endocytosis of the hexose transporter, Ght5, in S. pombe by nutrient limitation through the arrestin-like protein Aly3. Ght5 is induced when glucose is limiting and is required for growth and proliferation in these conditions. ght5+ encodes the only high-affinity glc transporter from fission yeast. ght5+ is induced in low glucose conditions at the transcriptional level and is translocated to the plasma membrane to allow glc import. Ght5 is targeted to the vacuole in conditions of N limitation. Mutations in the TORC2 pathway lead to the same process, thus preventing growth on low glucose medium, as shown in the gad8ts mutant, mutated for the Gad8 kinase acting downstream of TORC2. Previously, the authors demonstrated that the vacuolar delivery of Ght5 in the gad8ts mutant is suppressed by mutation of the arrestin-like protein Aly3. Arrestin-like proteins are in charge of recognising and ubiquitinating plasma membrane proteins to direct their vacuolar targeting by the endocytosis pathway. This suggested that Aly3 is hyperactive in TORC2 mutants, and accordingly, Ght5 ubiquitination was increased in gad8ts.

      Overall statement. This study aims at deepening our understanding of the regulation of endocytosis by signalling pathways through arrestin-like proteins. Ght5 is a nice model to study a physiological regulation, and the authors have a great set of tools at hand. However, I think the conclusions are not always rigorous and are sometimes far-reaching. The main problem is that many of the conclusions concern a potential phosphorylation of Aly3 that is not experimentally addressed. An additional issue is that they look at Ght5 ubiquitination by co-immunoprecipitation under native conditions (or so it seems to me), which cannot be conclusive. Overall, I think some experiments should be performed to address (at least) these 2 points before the manuscript can be published; see detailed comments below.

      We thank the reviewer for pointing out both the strengths and the issues of our manuscript.

      We admit that phosphorylation of Aly3 was not experimentally shown in our manuscript, although its phosphorylation has already been shown in phospho-proteomic studies by other groups. For this issue, we plan to add an experiment and modify the text, as explained below.

      The other major issue raised by this reviewer is that detection of Ght5 ubiquitination by immunoprecipitation under native conditions cannot be conclusive. Although we note that many studies perform affinity purification after denaturing and precipitating proteins with TCA or acetone to detect ubiquitination of the affinity-purified protein (e.g. Lin et al, Cell, 2008), we respectfully disagree with Reviewer #2 on this point. A review article describing methods to study ubiquitination by immunoblotting (Emmerich and Cohen, Biochem Biophys Res Comm, 2015) mentions affinity purification of the protein of interest under native conditions as one major option. Moreover, denaturing conditions were not applicable for detecting ubiquitinated Ght5, because Ght5 protein that has been denatured and precipitated with TCA cannot be re-solubilized for immunopurification and immunoblotting. As the reviewer points out, a pitfall of detecting ubiquitinated Ght5 under native conditions is the presence of co-immunoprecipitated proteins. In our previous study (Toyoda et al, J Cell Sci, 2021), we purified GFP-tagged Ght5 and showed that a 110 kDa band detected in an anti-Ub immunoblot was also recognized by an anti-GFP antibody, confirming that this band corresponded to a ubiquitinated species of Ght5 rather than a co-immunoprecipitated protein. Similarly, in the revised manuscript, we will add to Figure 5A a high-contrast (over-exposed) anti-GFP immunoblot in which the indicated 110 kDa band is clearly detected by the anti-GFP antibody.

      We appreciate these issues raised by Reviewer #2. We believe that addressing them will make the conclusions of our study more rigorous and robust in the revised manuscript.

      Major statements and criticism.

      Fig 1. Based on the hypothesis that TORC2-mediated phosphorylation regulates Ght5 endocytosis, the authors first considered a possible phosphorylation of Ght5. They mutagenised 11 **possible** phosphorylation sites in the Ct of Ght5, but none affected growth on low glucose in the absence of thiamine, suggesting that they do not contribute to the observed TORC2-mediated regulation. However, I disagree with the statement that "phosphorylation of Ght5 is dispensable for cell proliferation in low glucose", given that the authors do not show 1- that Ght5 is phosphorylated and 2- that this is abolished by these mutations. They should either provide data on this or tone down and say that these residues are not involved in the regulation, without implying phosphorylation, which is not proven.

      Although we did not experimentally test whether these 11 residues of Ght5 were phosphorylated in our hands, these residues have been shown to be phosphorylated in phospho-proteomics studies by other groups (Kettenbach et al, Mol Cell Proteomics, 2015; Swaffer et al, Cell Rep, 2018; Tay et al, Cell Rep, 2019; Halova et al, Open Biol, 2021; Mak et al, EMBO J, 2021). In the revised manuscript, we plan to be more precise by replacing this conclusion with the following statement: "11 Ser/Thr residues of Ght5, which are reportedly phosphorylated, are not essential for cell proliferation in low glucose."

      In the presence of thiamine (Supp fig 1), it seems that the ST/A mutant grows better in low glucose, and this is neither explained nor commented on. Since the transporter is not expressed, could the authors provide an explanation for this? If the promoter is leaky and some ght5-ST/A is expressed, it may be more stable and allow better growth than the WT, which would tend to indicate that impairing phosphorylation prevents endocytosis (which is classical for many transporters; see the body of work on CK1-mediated phosphorylation of transporters). Have the authors tried to decrease the glc concentration below 0.14% in the absence of thiamine to see whether this is also true when the transporter is strongly expressed? (OPTIONAL)

      Improved growth of Ght5(ST11A)-expressing cells in the presence of thiamine was mentioned in the legend of Supplementary Figure 1A. In the revised manuscript, we will mention this observation also in the main text for better description of the results.

      Adding thiamine to the medium does not completely shut off transcription from the nmt1 promoter but allows some transcription, as previously reported (Maundrell, J Biol Chem, 1990; Forsburg, Nuc Acid Res, 1993). In the revised manuscript, we will mention this "leakiness" of the nmt1 promoter and, citing the suggested studies, will discuss the possibility that the ST11A mutations might prevent endocytosis of Ght5 and consequently promote cell proliferation in low glucose conditions.

      We found that, in the absence of thiamine, cells expressing ght5+ and ght5(ST11A) proliferated to the comparable extent on medium containing 0.08% glucose. This result will be added to the revised manuscript.

      Fig 2. The authors then follow the hypothesis that TORC2 exerts its Ght5-dependent regulation through the phosphorylation of Aly3. They mutagenised 18 **possible** phosphorylation sites on Aly3. This led to a strong growth defect in low-glc medium. Mutation of the possible Gad8 site (S460) did not recapitulate this phenotype, suggesting that it is not sufficient; however, mutation of 4 ST residues in a CT cluster (582-586) mimicked the full 18ST/A mutation, suggesting that these are the important residues for Ght5 endocytosis.

      We thank the reviewer for appreciating the results in Fig. 2. As we explain below, we plan to perform an additional experiment to show that the Aly3 C-terminus is phosphorylated. With this result, our model will gain another experimental support.

      Fig 3A. Further dissection did not pinpoint this regulation to a specific residue, beyond showing that the T586 residue is dispensable. Fig 3B. The authors look at the effects of mutating these sites of Aly3 at the protein level. They had to develop an antibody because HA-epitope tagging did not yield a functional protein (Supp fig 2). Whereas I agree that the mutations causing a phenotype lead to a change in the migration pattern, I disagree with the statement that "This observation indicated that slower migrating bands were phosphorylated species of Aly3" (p.9 l.271). First, lack of phosphorylation usually causes a slower mobility on gel, which is not clearly visible here. Second, a smear appears above the mutated proteins (e.g. 4th Ala), which is possibly caused by another modification. There are many precedents in the literature of arrestins being ubiquitinated when they are not phosphorylated (see the work on Bul1, Rod1, Csr2 in baker's yeast from various labs). My gut feeling is that lack of phosphorylation unleashes Aly3 ubiquitination, leading to the change in pattern. All in all, it is impossible to draw conclusions about the phosphorylation of a protein without addressing it properly by phosphatase treatment plus a change in migration, or by MS/MS. Thus, whereas the data look promising, the hypothesis that Aly3 is phosphorylated at the indicated sites is not properly demonstrated.

      We disagree with the reviewer's opinion that a lack of phosphorylation usually causes slower mobility on gel. There are many examples in which phosphorylation causes slower mobility on gel, including budding yeast Rod1 (Alvaro et al, Genetics, 2016), and mammalian TXNIP (Wu et al, Mol Cell, 2013). In the revised manuscript, we will cite these reports to support our interpretation that the slower migrating bands are likely phosphorylated species of Aly3 (L270-271).

      Smear-like signals in Aly3(4th Ala), Aly3(4th A;A584S) and Aly3(4th A;A586T) might result from some modification, but the identity of the modification is unknown. As Reviewer #2 mentioned, phosphorylation of Aly3 might negatively regulate another modification. Previous studies revealed that budding yeast Rod1 and Rog3 arrestins tend to be ubiquitinated in snf1/AMPK-deficient cells (Becuwe et al, J Cell Biol, 2012; O'Donnell et al, Mol Cell Biol, 2015), and that Bul1 arrestin is dephosphorylated and ubiquitinated in budding yeast cells deficient in Npr1 kinase (Merhi and Andre, Mol Cell Biol, 2012). Also, budding yeast Csr2 arrestin is deubiquitinated and phosphorylated upon glucose replenishment, while non-phosphorylated Csr2 is ubiquitinated and activated by Rsp5 (Hovsepian et al, J Cell Biol, 2012). While the smear-like signals are interesting, we noticed that they did not necessarily correlate with cell proliferation defects in low glucose. We therefore think that clarifying the identity of the smear-like signals is beyond the scope of this study. We will discuss the smear-like signals only briefly in the revised manuscript, and we hope to address this issue in future work.

      While the 4 S/T residues at the C-terminus of Aly3, as well as the other 14 S/T residues, have already been shown to be phosphorylated in previous studies (Kettenbach et al, Mol Cell Proteomics, 2015; Tay et al, Cell Rep, 2019; Halova et al, Open Biol, 2021), we will confirm by phosphatase treatment in the revised manuscript that the slower migrating Aly3 is indeed phosphorylated. This planned experiment will further strengthen our study and support our conclusion and model.

      Fig 4. The authors now look at the functional consequences of these Aly3 mutations on Ght5 localisation. The data clearly show that mutation of the 4 identified S/T residues (Aly3-4th A) causes aberrant localisation of the transporter to the vacuole, which is likely to cause the observed growth defect on low glucose. There is a nice correlation between vacuolar localisation and growth in low glucose for the various aly3 mutants. (A final proof could be to express this in the context of an endocytic mutant, which should restore membrane localisation and suppress the aly3-4thA phenotype - OPTIONAL). However, I still disagree with the statement that "These results indicate that phosphorylation of Aly3 at the C-terminal 582nd, 584th, and/or 585th serine residues is required for cell-surface localization of Ght5," given that phosphorylation was not properly demonstrated.

      While phosphorylation of the 582nd, 584th and/or 585th serine residues of Aly3 is not experimentally demonstrated in our hands, they have been shown to be phosphorylated in phospho-proteomics studies by other groups (Kettenbach et al, Mol Cell Proteomics, 2015; Tay et al, Cell Rep, 2019; Halova et al, Open Biol, 2021; Mak et al, EMBO J, 2021). Among them, the 584th serine residue (S584) was reported to be phosphorylated in a TORC2-dependent manner (Mak et al, EMBO J, 2021), consistent with our model. To explicitly demonstrate that S584 is phosphorylated, we plan to make a strain expressing a mutant Aly3 protein in which all the possible phosphorylation sites except S584 are replaced with alanine, namely Aly3(ST17A;S584). Hopefully, we can properly show the phosphorylation of S584 by measuring the mobility of the Aly3(ST17A;S584) on gel with/without phosphatase treatment or gad8 mutation.

      We thank the reviewer for suggesting the experiment using an endocytic mutant. We previously reported that vacuolar localization of Ght5 in gad8 mutant cells was suppressed by mutations not only in aly3 but also in genes encoding ESCRT complexes (Toyoda et al, J Cell Sci, 2021). We therefore think that in cells expressing Aly3(ST18A) or Aly3(4th Ala), Ght5 is subject to endocytosis and ensuing selective transport to vacuoles via endosome-localized ESCRT complexes. We will discuss this point in the revised manuscript.

      Fig 5. Here, the authors examine the effect of Aly3 mutations on Ght5 ubiquitination. They immunoprecipitate Ght5 and assess its ubiquitination status in various Aly3 mutants. The data are encouraging for a role of Aly3 phosphorylation (?) in the negative control of Ght5 ubiquitination. My main problem with this experiment is that the Ght5 immunoprecipitations seem to have been performed under non-denaturing conditions, which raises the question of what the anti-ubiquitin antibody is revealing here (Ght5 or a co-immunoprecipitated protein, for example Aly3 itself, the Pub ligases, or an unknown protein). This protocol appears to have been used in their previous paper, but I stand by my conclusion that ubiquitination of a given protein can only be examined under denaturing conditions. The experiments should be repeated in buffers standard for the study of protein ubiquitination, to be able to conclude unambiguously that we are looking at Ght5 ubiquitination itself, especially in the absence of a non-ubiquitinatable form of Ght5 as a negative control. Could the authors comment on the fact that the S-A and S-D mutations display the same phenotype regarding the possible Ght5 ubiquitination?

      As mentioned above, immunoprecipitation of Ght5 under denaturing conditions is not feasible; Ght5 can be affinity-purified only under non-denaturing conditions. In addition, affinity purification under native conditions is considered a major option for examining ubiquitination (Emmerich and Cohen, Biochem Biophys Res Comm, 2015). A drawback of native conditions is, as the reviewer points out, that the affinity-purified fraction might include non-bait (non-Ght5) proteins. The 110 kDa band indicated by an arrow in Fig. 5A was confirmed to be Ght5, not a non-bait protein, as a band at the identical position was detected in the immunoblot with the anti-GFP antibody. Because this band in the anti-GFP immunoblot was too faint to be visible in Fig. 5A of the original manuscript, we will add a panel showing the contrast-enhanced anti-GFP immunoblot in which the 110 kDa band is clearly visible.

      As for the point that "S-A or S-D mutations display the same phenotype regarding the possible Ght5 ubiquitination," we are afraid that Reviewer #2 may have misunderstood the sample labels. We apologize for the confusing notation of the sample names. The full description of the samples is as follows: in Aly3(4th A), all of S582, S584, S585 and T586 are replaced with A; in Aly3(4th A;A584S), S582, S585 and T586 are replaced with A, whereas S584 remains intact; in Aly3(4th A;A584D), S582, S585 and T586 are replaced with A, and S584 is replaced with phospho-mimetic D. Because cells expressing Aly3(4th A;A584S) and Aly3(4th A;A584D) exhibited similarly low levels of Ght5 ubiquitination, we speculated that phosphorylation at S584 of Aly3 negatively regulates ubiquitination of Ght5.

      In the revised manuscript, we plan to add a table showing amino acid sequence of each species of Aly3 (just like Figure 3A) to avoid confusion.

      Fig 6. The authors want to document the model whereby Aly3 may interact with some of the Nedd4 ligases (Pub1/2/3) to mediate its Ght5-ubiquitination function. They actually use the Aly3-4thA mutant; it would have been better to use the WT protein. But the results indicate a clear interaction with at least Pub1 and Pub3. By the way, are the Pub1/2/3 fusions functional? Nedd4 proteins are notoriously affected in their function by C-terminal tagging and are usually tagged at their N-terminus (see Dunn et al. J Cell Biol 2004).

      We plan to test whether Pub1-myc is functional by comparing proliferation of the Pub1-myc-expressing strain and pub1Δ strain, as pub1Δ cells reportedly show proliferation defects at a high temperature (Tamai and Shimoda, J Cell Sci, 2002). As deletion of pub2 or pub3 reportedly exhibited no obvious defects (Tamai and Shimoda, J Cell Sci, 2002; Hayles et al, Open Biol, 2013), it is not easy to assess functionality of the myc-tagged genes.

      Please note that C-terminally tagged Pub1/2/3 proteins have been widely used in studies with fission yeast. Both Pub1-HA and non-tagged Pub1 were reported to be ubiquitinated (Nefsky and Beach, EMBO J, 1996; Strachan et al, J Cell Sci, 2023). Pub1-GFP, which complemented the high temperature sensitivity of pub1Δ, localized to cell surface and cytoplasmic bodies (Tamai and Shimoda, J Cell Sci, 2002). Pub2-GFP, overexpression of which arrested cell growth just like overexpression of non-tagged Pub2, localized to cell surface, and consistently Pub2-HA was detected in membrane-enriched pellet fractions after ultracentrifugation (Tamai and Shimoda, J Cell Sci, 2002). They also reported ubiquitin conjugation of the HECT domain of Pub2 fused with myc epitope at its C-terminus. Pub3-GFP localized to cell surface (Matsuyama et al, Nat Biotech, 2006).

      Regardless of the functionality of the myc-tagged Pub1/2/3, we believe that the results of this experiment (Figure 6) support our model, because its aim, namely to identify the HECT-type, WW-domain-containing ubiquitin ligase(s) that interact with Aly3, does not depend on the functionality of the myc-tagged Pub proteins.

      *Fig 7. The authors want to provide genetic interaction between the Pub ligases and the growth defects in low glc due to alterations in Ght5 trafficking. It is unclear how the gad8ts pub1∆ mutant was generated since it doesn't seem to grow on regular glc concentration (Supp fig 5); could the authors provide some information about this? It is also not clear whether it can be stated that this mutant is "more sensitive" to glc depletion because of the low level of growth to begin with (even at 3%). Altogether, the data show that deletion of pub3+ is able to suppress the growth defect of the gad8ts mutant on low glc medium, suggesting it is the relevant ligase for Ght5 endocytosis. This is confirmed by microscopy observations of Ght5 localisation. However, I would again tone down the main conclusion, which I feel is far-reaching: "Combined with physical interaction data, these results strongly suggest that Aly3 recruits Pub3, but not Pub2, for ubiquitination of Ght5." Work on Rsp5 in baker's yeast has shown that Rsp5 function goes beyond cargo ubiquitination, including ubiquitination of arrestins (which is often required for their function as mentioned in the introduction) or other endocytic proteins (epsins, amphiphysin etc). I agree that the data are compatible with this model but there are other possible explanations. Anything that would block endocytosis would supposedly suppress the gad8ts phenotype.

      gad8ts pub1Δ was generated at 26 °C, a permissive temperature for the gad8ts mutant. Although this is described in the Methods section of the original manuscript, we will state it more clearly in the Results section of the revised manuscript.

      We did not draw any conclusion about the low-glucose sensitivity of gad8ts pub1Δ cells in the indicated passage (L376-377). Rather, we compared the proliferation of gad8ts single mutant and pub1Δ single mutant cells in low glucose and found that the pub1Δ single mutant was more sensitive. In the revised manuscript, we will correct the text to clarify that we compared the two single mutants (not the gad8ts pub1Δ double mutant).

      We agree that the recruited Pub3 may ubiquitinate proteins other than Ght5. In the revised manuscript, we will revise our conclusion from the Figure 7 experiment (L388-390) so as not to limit the possible ubiquitination target(s) to Ght5.

      In a genetic screen, we found that mutations in aly3+ and in genes encoding ESCRT complexes suppressed the low-glucose sensitivity and the vacuolar transport of Ght5 in gad8ts mutant cells (Toyoda et al, J Cell Sci, 2021). This finding is consistent with the reviewer's view that blocking endocytosis would be expected to suppress the gad8ts phenotype. We will mention this point in the revised manuscript.

      *Discussion Some analogy with the regulation of the Bul arrestins by TORC1/Npr1 and PP2A/Sit4 could be mentioned (Mehri et al. 2012), at the discretion of the authors. The possibility that phosphorylation may neutralise a basic patch on Aly3 Ct, possibly involved in electrostatic interactions with Ght5 is very interesting. Regarding the effect of the mutations on Aly3 localisation (p.15 l.498), did the authors tag Aly3 with GFP? There are examples where proteins tagged with HA are not functional whereas tagging with GFP does not alter their function (eg. Rod1, Laussel et al. 2022) - and here Supp Fig 2 only relates to HA-tagging. Proof of a change in Aly3 localisation upon mutation would definitely be a plus (OPTIONAL).

      We thank the reviewer for suggesting this reference. In the revised manuscript, we will cite it in the corresponding part as additional support for TORC1-mediated control of Aly3 (de)phosphorylation.

      Although examining the localization of Aly3 by GFP tagging is interesting, we do not believe that it is necessary for this study. We would like to generate Aly3-GFP and examine its functionality and localization in future studies. We thank the reviewer for this insightful suggestion.

      **Minor comments.

      *Introduction: - I believe the text corresponding to the work on TXNIP is incorrect (p.5 l.127). TXNIP is degraded after its phosphorylation, not "rectracted" from the surface.

      In the revised manuscript, we will correct the text accordingly.

      • For the sake of completion, the authors could add other references concerning the regulation of Rod1 in budding yeast such as Becuwe et al. 2012 J Cell Biol and O'Donnell et al. 2015 Mol Cell Biol, in addition to Llopis-Torregrosa et al. 2016.

      In the revised manuscript, we will add the suggested references and correct the text in the corresponding part of the Introduction (L123-138).

      • Other examples of the requirement for arrestin ubiquitination beyond Art1 (p.5 l.136-137) are listed in the ref cited: Kahlhofer et al. 2021.

      We will cite the indicated review to direct readers to more examples of arrestin ubiquitination (and transporter ubiquitination).

      *Figures: In general, I think it would be clearer if the authors showed on the figures that the background strain in which the XXX gene is added (or its mutant forms) is a xxx∆ strain.

      We will modify the figures to clearly show the genetic background of the strains used.

      **Referees cross-commenting**

      Cross review of Reviewer 1 - *I don't believe that the authors "define a set of redundant C-terminal phosphorylation sites in Aly3", because phosphorylation is not proven. *I think the points raised for Fig 3B are valid but the authors should focus on making their story conclusive before expanding to other data (except for the explanation of the smear, see my review). Also, I don't think 2NBDG actually works to measure Glc uptake. * same for Fig 6 - not sure the interaction site mapping between Aly3 and Pubs would bring much value since there are more urgent things to do to make the story solid.

      As mentioned above, we will experimentally show phosphorylation of the Aly3 C-terminus in the revised manuscript. Such experiments would make our story more solid and conclusive. We truly appreciate the comments and suggestions.

      We agree with the comment on the difficulty of measuring glucose uptake using 2-NBDG. In fact, we tried but failed to measure Ght5-mediated glucose uptake using 2-NBDG.


      Cross review of Reviewer 3 - we have many overlaps, so briefly: *I agree that the bibliography is incomplete (mentioned in my review) *I agree that there is no demonstration of the phospho-status of Aly3, and it is a problem *I agree that the results can be better quantified, esp. in the light of the points raised by this referee concerning the variability of expression of ST18A Other specific comments: *I agree that the statement that dephosphorylation activates alpha-arrestins should be toned down - this was observed in several instances but there are examples of arrestin-mediated endocytosis which does not require their prior dephosphorylation. *I fully agree that efforts could be made regarding the classification/nomenclature of arrestins in S. pombe, this had escaped my attention

      As detailed in our responses to the individual points raised by the reviewers, we will add the suggested references and correct the text accordingly in the revised manuscript.

      In addition to experimentally demonstrating Aly3 phosphorylation, we will quantify the immunoblot results.

      Our statement that dephosphorylation activates alpha-arrestins may be overgeneralized. We will mention reports in which arrestin-mediated endocytosis does not require prior dephosphorylation (e.g. O'Donnell et al, Mol Biol Cell, 2010; Gournas et al, Mol Biol Cell, 2017; Savocco et al, PLoS Biol, 2019), and revise the text accordingly.

      Reviewer #2 (Significance):

      *strengths and limitations This study aims at deepening our understanding of the regulation of endocytosis by signalling pathways through arrestin-like proteins in S. pombe. Ght5 is a nice model to study a physiological regulation, and the authors have a great set of tools at hand, including the discovery of Aly3 as the main arrestin for this regulation, and a signalling pathway (TORC2/Gad8) acting upstream. The main question is now to understand at the mechanistic level how TORC2 signaling impinges on the regulation of this arrestin.

      Overall, the authors nicely demonstrate that C-terminal Ser/Thr residues are crucial for the function of Aly3 in Ght5 endocytosis. They propose a model whereby Aly3 phosphorylation by an unknown kinase inhibits its function on Ght5 ubiquitination, which would favour its endocytosis. However, I think the conclusions are not always rigorous and are sometimes far-reaching. The main problem is that many of the conclusions concern a potential phosphorylation of Aly3 which is not experimentally addressed. An additional issue is the fact that they look at Ght5 ubiquitination by co-immunoprecipitation in native conditions (or at least, it seems to me) which cannot be conclusive. Overall, I think some experiments should be performed to address (at least) these 2 points before the manuscript can be published, see detailed comments above.

      *Advance

      This study, if completed carefully, would provide among the first examples of mapping of phosphorylation sites on arrestins, which are usually phosphorylated at many sites and are thus difficult to study. Few studies went down to this level in this respect (see Ivshov et al. eLife 2020). There are no changes in paradigms or new conceptual insights, but this work is a nice example of the conservation of these regulatory mechanisms.

      We appreciate the reviewer's positive evaluation of this study. We understand the main problems raised by the reviewer, and, as detailed above, we plan to perform an experiment and provide explanations to address them. Once these issues are addressed, we believe that the conclusions of the revised manuscript will be more rigorous.

      Our study reveals mechanisms that regulate vacuolar transport of the Ght5 hexose transporter via the TORC2 pathway in fission yeast. The serine residues at the Aly3 C-terminus (S582, S584 and S585), which are probably phosphorylated in a TORC2 pathway-dependent manner, are required for sustained localization of Ght5 to the cell surface and for cellular adaptation to low glucose. To our knowledge, no such study has been reported, and we therefore consider this study novel. By responding to the reviewers' comments and adding new data as explained above, the revised manuscript will present the novelty of our study more clearly. Comparing our study in fission yeast with related studies in other model organisms may reveal the conservation and diversity of these regulatory mechanisms.

      *Audience Should be of interest for people studying basic research in the field of cell biology, signalling pathways, transporter regulation by physiology. Reviewer background is on the regulation of transporter endocytosis by signalling pathways and arrestin-like proteins.

      Reviewer #3 (Evidence, reproducibility and clarity): (Authors' response in blue)

      In this manuscript, the authors work to address how phospho-regulation of α-arrestin Aly3 in S. pombe regulates the glucose transporter Ght5. The authors use a series of phospho-mutants in Aly3 and assess function of these mutants using growth assays and localization of Ght5. My main concerns with the manuscript are that 1) there is a lack of appreciation for the similar work that has been done in S. cerevisiae to define α-arrestin phospho-regulation, which is evidenced by the severe lack of referencing throughout the document, 2) the sites mutated on Aly3 are not demonstrated to change phospho-status of Aly3 and so all interpretations of these mutants need to be better contextualized and 3) almost none of the findings are quantified (imaging or immunoblots) making it difficult to assess the rigor of the outcomes. More detailed comments are provided below.

      We thank the reviewer for thorough reading of the manuscript and the detailed comments. As explained below, we will respond to the points raised by the reviewer and accordingly modify the manuscript.

      Minor Comments

      Immunoblotting or immunostaining to define the levels and localization of phospho-mutants - In Figure 1, an immunoblot or immunostaining to define the abundance/localization of WT Ght5 vs its ST11A mutant would be appreciated. It is very difficult to know if ST11A is as functional as WT or not without an assessment of the levels and localization of the WT and mutant proteins to accompany the spot assays. Perhaps a version of Ght5 that is a phospho-mimetic would be more useful here as well since that version should not be dephosphorylated and then presumably would be internalized and not allow for growth on low glucose medium.

      We plan to add fluorescence microscopy data for WT Ght5 and Ght5(ST11A) in the revised manuscript to compare the localization and abundance of these two Ght5 species. In our preliminary observations, the two Ght5 species appeared indistinguishable in both respects.

      We would like to emphasize that the primary aim of this study is to reveal the mechanisms that regulate Ght5 localization and thereby ensure cell proliferation in low glucose. Although analyzing a phospho-mimetic Ght5 mutant (e.g. Ght5(ST11D)) would be interesting for understanding the nature of Ght5, we believe that such an analysis is beyond the scope of this study. Because Ght5(ST11A)-expressing cells proliferated comparably to Ght5(WT)-expressing cells, and WT and ST11A Ght5 localize indistinguishably to the cell surface, phosphorylation of the S/T residues of Ght5 is unlikely to be the primary mechanism regulating Ght5 localization and function. We would like to assess a phospho-mimetic Ght5 mutant in future studies.

      For the Aly3 mutants where the abundance of Aly3 appears lower via immunoblotting (i.e., 4thA-A582S or S582A), how is the near perfect functional readout explained when the levels of the protein are much lower than WT? For the ST18A mutant, this is a particularly important point since the authors indicate on lines 194-197 that based on the functional data for ST18A, some of these ST residues are needed for phospho-regulation of Aly3. However, in Figure 3B the authors clearly show that there is very little ST18A protein in cells, and so these mutations have impacted Aly3 stability, which may or may not be linked to its phospho-status. The authors should be upfront about this finding on lines 194-197 and should not present this phospho-model as the only reason for why ST18A may not be functional. On lines 265-276 the authors indicate that ST18A is expressed equivalently to WT Aly3, which is just not the case in Figure 3B. Perhaps quantification of replicate data would help clarify this issue. Further, if the authors wish to conclude that the upper MW bands in Figure 3B are due to phosphorylation, perhaps they should perform phosphatase treatments of their extracts to collapse these bands. However, most certainly the overall abundance of the single band for ST18A is reduced compared to the total bands of WT Aly3.

      We disagree with the view that the levels of the mutant Aly3 proteins are much lower than that of WT. For semi-quantitative measurement of protein abundance, a 2-fold dilution series of the WT Aly3 sample was loaded in the leftmost 3 lanes of Figure 3B. Although the levels of Aly3(4th A;A582S), Aly3(S582A) and Aly3(ST18A) were lower than that of WT Aly3, they are at least 50% of the WT level, judging from the intensities of the serially diluted WT samples. To clearly show that these Aly3 proteins are expressed at comparable levels, we plan to add a column chart of the quantified expression levels and to describe the abundance of the Aly3 proteins more quantitatively in the revised text. We do not think that replicate data (of Western blots as in Figure 3B) would help clarify this issue, because nmt1 promoter-driven transcription is induced with little variation (Forsburg, Nuc Acid Res, 1993). We will cite this report and mention this point in the revised text.

      We are afraid that the reviewer considers Aly3(ST18A) to be non-functional, but this is not the case, and we do not intend to claim so. While deletion of aly3 did not interfere with cell proliferation in low glucose (see vector controls in Figures 2B, 2C and 3A, -Thiamine), expression of the ST18A mutant clearly hinders cell proliferation in low glucose, indicating that Aly3(ST18A) exerts a dominant-negative effect that inhibits cell proliferation. That is, even though the expression level and/or stability of ST18A is reduced, it is still abundant enough to exert this dominant-negative effect. We propose the phospho-model not because ST18A is dysfunctional, but because of its dominant-negative activity. The 18 S/T residues of Aly3, which were shown to be phosphorylated in previous phospho-proteomics studies, seem to be required to down-regulate the Aly3 function that inhibits cell proliferation in low glucose. We apologize for this confusion, and we will modify the text and figures to clarify these points in the revised manuscript.

      To obtain an experimental support for our description that the slower migrating bands in Figure 3B are due to phosphorylation, we plan to perform a phosphatase treatment experiment as suggested.

      Figure 2A - how do the phosphorylation sites identified in Aly3 compare to those identified in Rod1 from S. cerevisiae? See PMID 26920760 or SGD for more information. I am confused as to why the Aly3 protein has an arrowhead at the C-terminus. What does this denote?

      We will mention the reported phosphorylation sites of Aly3 and budding yeast Rod1/Art4 in the revised manuscript, referring to the indicated report and database. It should be noted that the amino acid sequence similarity between Aly3 and S. cerevisiae Rod1 is low and limited to the Arrestin-N and Arrestin-C domains. The C-terminal half of Aly3, which contains most of the potential phosphorylation sites, is not similar to Rod1. Thus, these sites are unlikely to be conserved between the two proteins.

      An arrowhead indicates the direction of transcription (from N to C-terminus). We will describe it explicitly in the revised figure legend.

      Figure 2 - The WT and Aly3-ST18A are expressed in S. pombe from a non-endogenous locus under the control of the Nmt1 promoter. However, are these mutants present in cells that contain WT copies of Aly3 at other genomic loci? If so, this would surely muddy the interpretations of these data as α- and β-arrestins are capable of multimerizing and the effect of multimerization on their activities can vary.

      As mentioned in L188, an aly3 deletion mutant strain (aly3Δ) was used as a host, and thus all strains harboring an nmt1-driven aly3 gene lack the endogenous aly3 gene. We will add an illustration clearly showing that the host strain lacks the endogenous aly3+ gene and modify the legend of Figure 2.

      Functional readouts for Aly3 using Ght5 localization - The reduced surface levels of Ght5 does correspond to the spot assay growth in low glucose for the various Aly3 mutants used. However, it would be useful if these assays incorporated an endocytosis inhibitor to help prevent the activities of these Aly3 plasmids to see if the transporter is retained at the PM. At the end of these mutational analyses, the authors conclude that phosphorylation of Aly3 at any of 3 sites is required for Ght5 trafficking to the vacuole in low glucose, however no experiment is done to demonstrate that these sites are phosphorylated residues. A phosphatase assay would be useful to help demonstrate that the modifications in 3B really are phosphorylation and a quantification of the phosphorylated bands in these WBs would also be useful to solidify the statement made on lines 306-309.

      We thank the reviewer for suggesting an experiment using an endocytosis inhibitor. We previously reported that vacuolar localization of Ght5 in gad8ts mutant cells was suppressed by mutations not only in aly3 but also in genes encoding ESCRT complexes (Toyoda et al, J Cell Sci, 2021). We therefore think that, in cells expressing Aly3(ST18A) or Aly3(4th A), Ght5 undergoes endocytosis and subsequent selective transport to vacuoles via ESCRT complexes. We will mention these previous findings in the revised manuscript.

      As mentioned in our responses to the comments above and to those of the other reviewers, we will perform a phosphatase treatment experiment and quantify the results in the revised manuscript. Here, we would like to emphasize that these 3 sites have been shown to be phosphorylated in phospho-proteomic studies by other researchers (Kettenbach et al, Mol Cell Proteomics, 2015; Tay et al, Cell Rep, 2019; Halova et al, Open Biol, 2021; Mak et al, EMBO J, 2021), although we do not show this directly in this study.

      Phosphorylation assessments - in general, it would be good to not only build the non-phosphorylatable versions of Aly3 but also the phospho-mimetic forms.

      We generated a phospho-mimetic Aly3 mutant (i.e. Aly3(4th A;A584D)) and showed the result in Figure 5A: cells expressing Aly3(4th A;A584D) exhibited low Ght5 ubiquitination, similar to Aly3(WT)- and Aly3(4th A;A584S)-expressing cells. In our experience, replacing S/T with D/E does not necessarily mimic phosphorylation. Thus, we do not believe that systematic generation of phospho-mimetic Aly3 mutants would help achieve the aim of this study.

      Pub1, 2, and 3 - It would be helpful if the authors indicated what genes Pubs 1-3 correspond to in S. cerevisiae, where Rsp5 is the predominant Ub ligase interacting with α-arrestins. Is there no ortholog of Rsp5 in S. pombe?

      Pub1, Pub2 and Pub3 are regarded as orthologs of budding yeast Rsp5, according to the fission yeast database PomBase. We will perform a homology search for these E3 proteins, and based on the result, we will add a description in the revised manuscript.

      Pub-Aly3 interactions - could the authors please comment on the reason why so very little Aly3 is copurified with Pub1 or Pub2? Can any clear conclusion be drawn about pub2 given how very little Pub2 is present in the IPs? Based on my understanding of these data I do not think that this can be cleanly interpreted. What is the identity of the ~50kDa MW band in Figure 6 in the upper MYC detection panel?

      We do not have a definitive explanation for why only a small amount of Aly3 copurified with Pub1 or Pub3. The Pub1/3-Aly3 interaction may be weak or transient. We will discuss this point in the revised manuscript.

      Regarding whether Aly3 interacts with Pub2, we agree with the reviewer. As described in the Results (L360-362), no conclusion about the Aly3-Pub2 interaction can be drawn from this immunoprecipitation experiment alone. On the other hand, the genetic interaction experiment (Figure 7A) suggests that pub2+ is not involved in the defects caused by the gad8ts mutation (whereas pub3+ and aly3+ are). Based on this experiment, we think that Pub2 is not a partner of Aly3.

      In the revised manuscript, we will describe that Pub2 is not a partner of Aly3 in a paragraph describing the Figure 7A experiment.

      Because the 50 kDa band found in the IP fractions of all samples appears even in the "beads only" control (Figure 6), it is presumably derived from mouse IgG dissociated from the beads used for immunoprecipitation. We will mention this in the legend of Figure 6.

      Phosphorylation and ubiquitination of α-arrestins - The paragraph from lines 123-138 is very superficial in addressing what is known about phosphorylation and ubiquitination of α-arrestins. The way this section is written, it feels misleading to the reader as it omits many of the details for regulation that would help place the current study in context. The discussion of Rod1 phosphorylation by AMPK for example, which is directly relevant to this study, is underdeveloped. I would recommend splitting this into two paragraphs and providing a more in depth, and accurate, view of the literature on this topic, with a focus on the regulation that is relevant for the ortholog of Aly3 in S. cerevisiae. For example, Rod1 phosphorylation by AMPK is greatly expanded upon in the following papers (PMID 22249293 and 25547292) and AMPK regulation of C-tail phosphorylation of α-arrestins is defined further in PMID 26920760. These references are each particularly important to compare with the current findings presented in this manuscript. TORC2 regulation of α-arrestins is also reviewed in PMID 36149412 and references therein should be considered.

      Because the primary aim of this study is to reveal the mechanisms regulating Ght5 localization in fission yeast, rather than to dissect the modification and regulation of α-arrestins, we decided not to go into the details of phosphorylation and ubiquitination of α-arrestins. Furthermore, although budding yeast Rod1 and Rog3 were found to act downstream of TORC2-Ypk1 signaling in the context of internalization of the Ste2 pheromone receptor, it is not clear whether TORC2-Ypk1 signaling also regulates α-arrestin-mediated internalization of hexose transporters in budding yeast. For these reasons, we cited only the literature essential for interpreting the results and omitted many references describing the details of α-arrestin regulation. However, as this reviewer commented, we realize that this decision made the discussion superficial and misleading to the reader. We sincerely apologize for this.

      In the revised manuscript, we will reorganize the paragraphs in the discussion and include the suggested references. Regarding budding yeast Rod1, we will cite the study reporting Ypk1-mediated phosphorylation of Rod1 in the mating pheromone response via regulation of Ste2 endocytosis (Alvaro et al, Genetics, 2016). We will also mention other reports (Becuwe et al, J Cell Biol, 2012; O'Donnell et al, Mol Cell Biol, 2015) about AMPK-dependent phosphorylation of Rod1 in the corresponding part (e.g. L129-130). In addition, we will mention that the α-arrestins Aly2, Rod1 and Rog3 were found to act downstream of TORC2-Ypk1 signaling (Muir et al, eLife, 2014; Thorner, Biochem J, 2022).

      As a further detailed example, there is far more work done on ubiquitination of α-arrestins in S. cerevisiae than the single citation provided by the authors on line 137. The way this section is written it feels misleading. Considerable effort has been spent on defining how mono- and poly-ubiquitination regulate α-arrestins and the authors should consider the data provided in the following citations and revise the two sentences they provide in this introduction to better reflect the breadth of our understanding rather than simply indicate that the 'mechanisms that regulate functions of α-arrestins are not fully understood'. (PMIDs 23824189; 22249293; 17028178; 28298493)

      Ubiquitination of α-arrestins themselves is not the topic of this study, and the physiological consequences of Aly3 ubiquitination remain unknown. For these reasons, we did not describe the details of α-arrestin ubiquitination in the original manuscript. However, we did not intend to mislead the reader, and to avoid this, we will revise the indicated sentences and cite the suggested literature (O'Donnell et al, J Biol Chem, 2013; Becuwe et al, J Cell Biol, 2012; Kee et al, J Biol Chem, 2006; Ho et al, Mol Biol Cell, 2017) in the revised manuscript.

      Context of the findings and lack of citations - The referencing in this manuscript is very poor as many of the key papers that report analogous findings in the budding yeast Saccharomyces cerevisiae are not cited. This oversight in citing the appropriate literature must be remedied before this manuscript can be considered further for publication. Examples of these omissions occur at the following places:

      We will modify the text and carefully cite additional literature describing analogous findings in budding yeast and other organisms in the revised manuscript. We appreciate the insightful suggestions by this reviewer. It should be noted, however, that it is not evident whether budding yeast Rod1 and Rog3 are orthologous to fission yeast Aly3. Although Rod1 and Aly3 have overlapping roles, their amino acid sequence similarity is low and limited to domains that are generally conserved among α-arrestin-family proteins.

      Line 90 - The Puca and Brou citation is one example of this, but the first examples come from Daniela Rotin's work looking at Rsp5 interactions in budding yeast, which is where the association between HECT-domain Ub ligases and α-arrestins is also documented by Scott Emr and Hugh Pelham's labs. Here are some PMID numbers to improve the citations of this section (PMID 17551511; 18976803; 19912579) and each of these references long predates the Puca and Brou publication.

      In the revised manuscript, we will improve the citations by including the suggested studies (Gupta et al, Mol Syst Biol, 2007; Lin et al, Cell, 2008; Nikko and Pelham, Traffic, 2009).

      Lines 123-126 - Phosphorylation can also increase vacuole-dependent degradation of alpha-arrestins as demonstrated in PMID 35454122. The interaction with 14-3-3 proteins that is driven by phosphorylation of α-arrestins was first demonstrated by the Leon group (PMID 22249293). Lines 129-132 - Here again the Leon reference that helps demonstrate the 14-3-3 inhibition of Rod1 is lacking (PMID 22249293).

      We will cite the suggested studies in description of these topics (Bowman et al, Biomolecules, 2022; Becuwe et al, J Cell Biol, 2012).

      Lines 130-132 - Please include references for the statement that dephosphorylation activates α-arrestin activity. There are no citations for this statement, and there are many to choose from; I would urge the authors to cite the primary literature on these points.

      We will cite studies for the statement "Conversely, dephosphorylation is thought to activate α-arrestins and to promote selective endocytosis of transporter proteins" (L130-132).

      These are just a few examples from the Introduction, but the Discussion is similarly fraught with issues in referencing and framing the experimental results within the context of the larger field, including what is known about Rod1/Rog3 regulation in S. cerevisiae. For example, the Llopis-Torregrosa et al reference and statement on lines 508-510 is incorrect. There are other phosphorylation sites defined in the C-terminus of Rod1, as described in Alvaro et al. PMID: 26920760.

      We will carefully correct Discussion by citing the suggested references (e.g. Alvaro et al, Genetics, 2016) and framing the obtained results within the context of the larger field.

      Of note, the combination of α-arrestin, upstream kinase(s) and distinct phosphorylation sites appears to determine the target transporter (Kahlhofer et al, Biol Cell, 2021; Thorner, Biochem J, 2022), and it has not been explicitly shown that TORC2-Ypk1 signaling also regulates α-arrestin-mediated internalization of hexose transporters in budding yeast. For these reasons, we stated "S. cerevisiae Rod1 and Rog3 are phosphorylated solely by Snf1p/AMPK" in the context of the internalization of hexose transporters. We will also discuss this point in the revised manuscript.

      Minor Comments Clarification needed - Lines 107-121 - The relationship between the S. pombe arrestins and those in other organisms is somewhat unclear. First, all the arrestins in humans and S. cerevisiae can be sorted into the alpha, beta and Vps26 classes. However, the authors indicate that the S. pombe genome has 11 arrestin-like proteins but only 4 of these are α-arrestins. What classes do the other 7 arrestins belong to? It would be appreciated if this point was clarified.

      To our knowledge, fission yeast arrestins are not well classified yet. We will perform a phylogenetic tree analysis to classify them, and modify the description of the indicated part accordingly. We will also cite our previous report (Toyoda et al, J Cell Sci, 2021), in which the overall protein structure and domains of 11 fission yeast arrestin-like proteins were reported.

      Next, for the 4 α-arrestins identified in S. pombe, the authors indicate that Aly3 is the homolog of Rod1/Art4 and Rog3/Art7 from S. cerevisiae. What is the relationship of Rod1 in S. pombe to Rod1 in S. cerevisiae? Are these also homologs? You can see how the nomenclature is confusing and, given the functional overlap of S. cerevisiae Rod1/Rog3 proteins, it is important to know if Aly3 is the only version of these α-arrestins or if there is an additional counterpart in S. pombe. This point becomes somewhat more confusing when, on lines 134-136, the authors talk about Arn1/Any1 as an arrestin-related protein in S. pombe, yet this protein was not included in the list of α-arrestins in the preceding section. What class of arrestin is this protein?

      According to PomBase, both Aly3 and Rod1 are assigned as orthologues of budding yeast Rod1 and Rog3. However, as mentioned in the responses above, it is unclear whether Aly3 is truly orthologous to budding yeast Rod1/Rog3. In the revised manuscript, we will perform a homology search for these 4 proteins and add information on the extent of homology among these arrestins.

      Arn1/Any1 is regarded as a β-arrestin (Nakase et al, J Cell Sci, 2013). We will also mention this in the revised manuscript.

      Alpha-arrestin homology - On lines 127-129 the authors indicate that TXNIP is the mammalian homolog of Aly3. To my knowledge, there are no evolutionary analyses that can draw these lines of homology between the α-arrestins in humans and those in yeasts. It would be appreciated if the authors could cite the work that leads to this conclusion or revise the sentence to more accurately reflect what is known on this topic. It certainly appears that, given their functional overlap in regulating glucose transporters, Txnip in humans and Rod1/Rog3 in S. cerevisiae are functionally connected. I urge the authors to use more caution when describing this protein family.

      Among human α-arrestins, ARRDC2 (22%), rather than TXNIP (20%), has the highest amino acid identity to Aly3 (Toyoda et al, J Cell Sci, 2021). However, as TXNIP has been reported to regulate endocytosis of the hexose transporters GLUT1 and GLUT4 (Wu et al, Mol Cell, 2013; Waldhart et al, Cell Rep, 2017), we think that TXNIP and Aly3 share physiological roles. We will revise the sentence (L127-129) to reflect this more accurately.

      Text editing - The text could use editing as there are awkward and grammatically incorrect sentences in several places. Here are a few examples to help the authors:

      Please note that, before initial submission, the original manuscript was edited by a professional editor who is a native English (American) speaker and has edited thousands of research papers. We will ask an editor to check the revised draft again before submission.

      Lines 57-60 - the protein is not expressed over the entire cell surface, but is localized to the entire cell surface.

      We will correct this wording.

      Lines 80-83 - this sentence is very confusing

      We will correct this part by changing the phrase "Unlike TORC1," into a clause.

      Line 86 - Is there more than one gene encoding Aly3 in S. pombe?

      No, there is only one gene encoding Aly3. We will correct this part so as to avoid being misunderstood.

      Lines 88, 109 - these sentences need to start with a capital, so either capitalize the A in arrestin or write out Alpha with a capital A.

      We will correct the sentence as suggested.

      Lines 145-148 - unclear as written

      We will clarify the meaning of the sentence by changing the voice.

      Line 224 - why are these amino acids being referred to as hydroxylated? Perhaps hydroxyl-containing amino acids or 18 amino acids with hydroxyl side chains would be better choices?

      We will correct the word as suggested.

      Line 300 - very confusing sentence structure

      We will correct this part by simplifying the structure of the sentence.

      And elsewhere....

      We will carefully check the revised text before submission.

      Reviewer #3 (Significance):

      The authors provide some information as to the residues needed in the Aly3 C-tail for Ght5 trafficking in S. pombe. These results are not placed in the context of similar phospho-regulatory work done for α-arrestins in S. cerevisiae, and this is needed to appreciate the significance of the study.

      Overall, it appears that the model put forth is very similar to the one already proposed in S. cerevisiae, where phosphorylation impedes α-arrestin-mediated trafficking of glucose transporters. It is interesting to see this similarity hold in S. pombe, but it does not dramatically alter our appreciation of α-arrestin biology.

      The significance of the findings is somewhat undermined by the fact that very little quantification of the data is presented, making the rigor of the work difficult to assess.

      We thank the reviewer for the careful reading and evaluation of our study. As the reviewer states, the results are not placed in the context of similar phospho-regulatory work done for α-arrestins in S. cerevisiae. This may partly reflect the fact that it remains unclear whether internalization of hexose transporters is regulated by TORC2-dependent phosphorylation in S. cerevisiae. We believe that our study is novel and significant for this reason. By performing the additional experiments and quantification and revising the text as suggested by the reviewers, we will further strengthen the manuscript and be able to clearly conclude that TORC2-dependent phosphorylation of Aly3 regulates localization of the Ght5 hexose transporter and cellular responses to glucose shortage stress.

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary/background.

      This paper focuses on the regulation of endocytosis of the hexose transporter Ght5 in S. pombe by nutrient limitation through the arrestin-like protein Aly3. Ght5 is induced when glucose is limiting and is required for growth and proliferation in these conditions. ght5+ encodes the only high-affinity glucose transporter in fission yeast. ght5+ is induced at the transcriptional level in low-glucose conditions, and Ght5 is translocated to the plasma membrane to allow glucose import. Ght5 is targeted to the vacuole under nitrogen limitation. Mutations in the TORC2 pathway trigger the same process, thus preventing growth on low-glucose medium, as shown in the gad8ts mutant, which carries a mutation in the Gad8 kinase acting downstream of TORC2. Previously, the authors demonstrated that the vacuolar delivery of Ght5 in the gad8ts mutant is suppressed by mutation of the arrestin-like protein Aly3. Arrestin-like proteins are in charge of recognising and ubiquitinating plasma membrane proteins to direct their vacuolar targeting by the endocytosis pathway. This suggested that Aly3 is hyperactive in TORC2 mutants, and accordingly, Ght5 ubiquitination was increased in gad8ts.

      Overall statement

      This study aims at deepening our understanding of the regulation of endocytosis by signalling pathways through arrestin-like proteins. Ght5 is a nice model to study a physiological regulation, and the authors have a great set of tools at hand. However, I think the conclusions are not always rigorous and are sometimes far-reaching. The main problem is that many of the conclusions concern a potential phosphorylation of Aly3, which is not experimentally addressed. An additional issue is that the authors look at Ght5 ubiquitination by co-immunoprecipitation in native conditions (or so it seems to me), which cannot be conclusive. Overall, I think some experiments should be performed to address (at least) these 2 points before the manuscript can be published; see detailed comments below.

      Major statements and criticism.

      • Fig 1. Based on the hypothesis that TORC2-mediated phosphorylation regulates Ght5 endocytosis, the authors first considered a possible phosphorylation of Ght5. They mutagenised 11 possible phosphorylation sites in the Ct of Ght5, but none affected growth on low glucose in the absence of thiamine, suggesting that they don't contribute to the observed TORC2-mediated regulation. However, I disagree with the statement that "phosphorylation of Ght5 is dispensable for cell proliferation in low glucose", given that the authors do not show (1) that Ght5 is phosphorylated and (2) that this is abolished by these mutations. They should either provide data on this or tone down and say that these residues are not involved in the regulation, without implying phosphorylation, which is not proven. In the presence of thiamine (Supp fig 1), it seems that the ST/A mutant grows better in low glucose, and this is neither explained nor commented on. Since the transporter is not expressed, could the authors provide an explanation for this? If the promoter is leaky and some ght5-ST/A is expressed, it may be more stable and allow better growth than the WT, which would tend to indicate that impairing phosphorylation prevents endocytosis (which is classical for many transporters, see the body of work on CK1-mediated phosphorylation of transporters). Have the authors tried to decrease the glc concentration below 0.14% in the absence of thiamine to see if this is also true when the transporter is strongly expressed? (OPTIONAL)
      • Fig 2. The authors then follow the hypothesis that TORC2 exerts its Ght5-dependent regulation through the phosphorylation of Aly3. They mutagenised 18 possible phosphorylation sites on Aly3. This led to a strong defect in growth in low-glc medium. Mutation of the possible Gad8 site (S460) did not recapitulate this phenotype, suggesting that it is not sufficient, however, mutations of 4 ST residues in a CT cluster (582-586) mimicked the full 18ST/A mutation, suggesting these are the important residues for Ght5 endocytosis.
      • Fig 3A. Further dissection did not pinpoint this regulation to a specific residue, beyond the dispensability of the T586 residue. Fig 3B. The authors look at the effects of mutation of Aly3 at these sites at the protein level. They had to develop an antibody because HA-epitope tagging did not yield a functional protein (Supp fig 2). Whereas I agree that the mutations causing a phenotype lead to a change in the migration pattern, I disagree with the statement that "This observation indicated that slower migrating bands were phosphorylated species of Aly3" (p.9 l.271). First, lack of phosphorylation usually causes faster migration on gels, which is not clear to spot here. Second, a smear appears on top of the mutated proteins (eg. 4th Ala), which is possibly caused by another modification. There are many precedents in the literature of arrestins being ubiquitinated when they are not phosphorylated (see the work on Bul1, Rod1, Csr2 in baker's yeast from various labs). My gut feeling is that lack of phosphorylation unleashes Aly3 ubiquitination, leading to the change in pattern. All in all, it is impossible to draw conclusions about the phosphorylation of a protein without assessing it properly, by phosphatase treatment + change in migration, or by MS/MS. Thus, whereas the data look promising, the hypothesis that Aly3 is phosphorylated at the indicated sites is not properly demonstrated.
      • Fig 4. The authors now look at the functional consequences of these Aly3 mutations on Ght5 localisation. The data clearly show that mutation of the 4 identified S/T residues (Aly3-4th A) causes aberrant localisation of the transporter to the vacuole, which is likely to cause the observed growth defect on low glucose. There is a nice correlation between the vacuolar localisation and growth in low glucose for the various aly3 mutants. (A final proof could be to express this in the context of an endocytic mutant, which should restore membrane localisation and suppress the aly3-4thA phenotype - OPTIONAL). However, I still disagree with the statement that "These results indicate that phosphorylation of Aly3 at the C-terminal 582nd, 584th, and/or 585th serine residues is required for cell-surface localization of Ght5." given that phosphorylation was not properly demonstrated.
      • Fig 5. Here, the authors examine the role of Aly3 mutations in Ght5 ubiquitination. They immunoprecipitate Ght5 and address its ubiquitination status in various Aly3 mutants. The data are encouraging for a role of Aly3 phosphorylation (?) in the negative control of Ght5 ubiquitination. My main problem with this experiment is that the Ght5 immunoprecipitations seem to have been made in non-denaturing conditions, which raises the question of what the anti-ubiquitin antibody is revealing here (Ght5 or a co-immunoprecipitated protein, for example Aly3 itself, the Pub ligases, or an unknown protein). This protocol seems to have been used in their previous paper, but I stand by my conclusion that ubiquitination of a given protein can only be examined in denaturing conditions. The experiments should be repeated in buffers classically used for the study of protein ubiquitination in order to conclude unambiguously that we are looking at Ght5 ubiquitination itself, especially in the absence of a non-ubiquitinable form of Ght5 as a negative control. Could the authors comment on the fact that S-A and S-D mutations display the same phenotype regarding the possible Ght5 ubiquitination?
      • Fig 6. The authors want to document the model whereby Aly3 may interact with some of the Nedd4 ligases (Pub1/2/3) to mediate its Ght5-ubiquitination function. They actually used the Aly3-4thA mutant; it would have been better to use the WT protein. But the results indicate a clear interaction with at least Pub1 and Pub3. By the way, are the Pub1/2/3 fusions functional? Nedd4 proteins are notoriously affected in their function by C-terminal tagging and are usually tagged at their N-terminus (See Dunn et al. J Cell Biol 2004).
      • Fig 7. The authors want to provide genetic interactions between the Pub ligases and the growth defects in low glc due to alterations in Ght5 trafficking. It is unclear how the gad8ts pub1∆ mutant was generated since it doesn't seem to grow at the regular glc concentration (Supp fig 5); could the authors provide some information about this? It is also not clear whether it can be stated that this mutant is "more sensitive" to glc depletion, given its low level of growth to begin with (even at 3%). Altogether, the data show that deletion of pub3+ is able to suppress the growth defect of the gad8ts mutant on low-glc medium, suggesting it is the relevant ligase for Ght5 endocytosis. This is confirmed by microscopy observations of Ght5 localisation. However, I would again tone down the main conclusion, which I feel is far-reaching: "Combined with physical interaction data, these results strongly suggest that Aly3 recruits Pub3, but not Pub2, for ubiquitination of Ght5." Work on Rsp5 in baker's yeast has shown that Rsp5 function goes beyond cargo ubiquitination, including ubiquitination of arrestins (which is often required for their function, as mentioned in the introduction) or of other endocytic proteins (epsins, amphiphysin etc). I agree that the data are compatible with this model, but there are other possible explanations. Anything that blocks endocytosis would presumably suppress the gad8ts phenotype.

      Discussion

      Some analogy with the regulation of the Bul arrestins by TORC1/Npr1 and PP2A/Sit4 could be mentioned (Mehri et al. 2012), at the discretion of the authors. The possibility that phosphorylation may neutralise a basic patch on Aly3 Ct, possibly involved in electrostatic interactions with Ght5 is very interesting. Regarding the effect of the mutations on Aly3 localisation (p.15 l.498), did the authors tag Aly3 with GFP? There are examples where proteins tagged with HA are not functional whereas tagging with GFP does not alter their function (eg. Rod1, Laussel et al. 2022) - and here Supp Fig 2 only relates to HA-tagging. Proof of a change in Aly3 localisation upon mutation would definitely be a plus (OPTIONAL).

      Minor comments.

      Introduction:

      • I believe the text corresponding to the work on TXNIP is incorrect (p.5 l.127). TXNIP is degraded after its phosphorylation, not "rectracted" from the surface.
      • For the sake of completeness, the authors could add other references concerning the regulation of Rod1 in budding yeast, such as Becuwe et al. 2012 J Cell Biol and O'Donnell et al. 2015 Mol Cell Biol, in addition to Llopis-Torregrosa et al. 2016.
      • Other examples of the requirement for arrestin ubiquitination beyond Art1 (p.5 l.136-137) are listed in the ref cited: Kahlhofer et al. 2021.

      Figures: In general, I think it would be clearer if the authors showed on the figures that the background strain in which the XXX gene is added (or its mutant forms) is a xxx∆ strain.

      Referees cross-commenting

      Cross review of Reviewer 1

      • I don't believe that the authors "define a set of redundant c-terminal phosphorylation sites in Aly3", because phosphorylation is not proven.
      • I think the points raised for Fig 3B are valid, but the authors should focus on making their story conclusive before expanding to other data (except for the explanation of the smear, see my review). Also, I don't think 2NBDG actually works to measure Glc uptake.
      • same for Fig 6 - not sure the interaction site mapping between Aly3 and Pubs would bring much value since there are more urgent things to do to make the story solid.

      Cross review of Reviewer 3 - we have many overlaps, so briefly:

      • I agree that the bibliography is incomplete (mentioned in my review)
      • I agree that there is no demonstration of the phospho-status of Aly3, and it is a problem
      • I agree that the results can be better quantified, esp. in the light of the points raised by this referee concerning the variability of expression of ST18A

      Other specific comments:

      • I agree that the statement that dephosphorylation activates alpha-arrestins should be toned down - this was observed in several instances, but there are examples of arrestin-mediated endocytosis that do not require prior arrestin dephosphorylation.
      • I fully agree that efforts could be made regarding the classification/nomenclature of arrestins in S. pombe, this had escaped my attention

      Significance

      strengths and limitations

      This study aims at deepening our understanding of the regulation of endocytosis by signalling pathways through arrestin-like proteins in S. pombe. Ght5 is a nice model to study a physiological regulation, and the authors have a great set of tools at hand, including the discovery of Aly3 as the main arrestin for this regulation, and a signalling pathway (TORC2/Gad8) acting upstream. The main question is now to understand at the mechanistic level how TORC2 signaling impinges on the regulation of this arrestin.

      Overall, the authors nicely demonstrate that C-terminal Ser/Thr residues are crucial for the function of Aly3 in Ght5 endocytosis. They propose a model whereby Aly3 phosphorylation by an unknown kinase inhibits its function in Ght5 ubiquitination, which would favour its endocytosis. However, I think the conclusions are not always rigorous and are sometimes far-reaching. The main problem is that many of the conclusions concern a potential phosphorylation of Aly3, which is not experimentally addressed. An additional issue is that the authors look at Ght5 ubiquitination by co-immunoprecipitation in native conditions (or so it seems to me), which cannot be conclusive. Overall, I think some experiments should be performed to address (at least) these 2 points before the manuscript can be published; see detailed comments above.

      Advance

      This study, if completed carefully, would provide among the first examples of mapping of phosphorylation sites on arrestins, which are usually phosphorylated at many sites and are thus difficult to study. Few studies went down to this level in this respect (see Ivshov et al. eLife 2020). There are no changes in paradigms or new conceptual insights, but this work is a nice example of the conservation of these regulatory mechanisms.

      Audience

      Should be of interest to basic researchers in cell biology, signalling pathways, and the physiological regulation of transporters. Reviewer background is on the regulation of transporter endocytosis by signalling pathways and arrestin-like proteins.

      Author response:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The ability of Wolbachia to be transmitted horizontally during parasitoid wasp infections is supported by phylogenetic data here and elsewhere. Experimental analyses have shown evidence of wasp-to-wasp transmission during coinfection (e.g., Huigens et al.), host-to-wasp transmission (e.g., Heath et al.), and mechanical ('dirty needle') transmission from host to host (Ahmed et al.). To my knowledge, this manuscript provides the first experimental evidence of wasp-to-host transmission. Given the strong phylogenetic pattern of host-parasitoid Wolbachia sharing, this may be of general importance in explaining the distribution of Wolbachia across arthropods. This is of interest as Wolbachia is extremely common in the natural world and influences many aspects of host biology.

      Weaknesses:

      The first observation of the manuscript is that the Wolbachia strains in hosts are more closely related to those in their parasitoids. This has been reported on multiple occasions before, dating back to the late 1990s. The introduction cites five such papers (the observation is made in other studies too that could be cited) but then dismisses them by stating "However, without quantitative tests, this observation could simply reflect a bias in research focus." As these studies include carefully collected datasets that were analysed appropriately, I felt this claim of novelty was rather strong. It is unclear why downloading every sequence in GenBank avoids any perceived biases, when presumably the authors are reanalysing the data in these papers.

      Thank you for bringing this to our attention, and we will make the necessary amendments in our revised manuscript.

      I do not doubt the observation that host-parasitoid pairs tend to share related Wolbachia, as it is corroborated by other studies, the effect size is large, and the case study of whitefly is clear-cut. It is also novel to do this analysis on such a large dataset. However, the statistical analysis used is incorrect, as the observations are pseudo-replicated due to phylogenetic non-independence. When analysing comparative data like this, it is essential to correct for the confounding effects of related species tending to be similar due to common ancestry. In this case, it is well known that this is an issue, as it is a repeated observation that related hosts are infected by related Wolbachia. However, the authors treat every pairwise combination of species (nearly a million pairs) as an independent observation. Addressing this issue is made more complex because there are both the host and symbiont trees to consider. The additional analysis in lines 123-124 (including shuffling species pairs) does not explicitly address this issue.

      We concur with your observation regarding the non-independence of the data due to phylogenetic relationships. While common phylogenetic correction methods are indeed not directly applicable to wsp distances between species pairs, we are investigating the potential of phylogenetic mixed models to address this issue. We hope to include a revised analysis using this approach in our revised manuscript.

      The sharing of Wolbachia between whitefly and their parasitoids is very striking, although this has been reported before (e.g., the authors recently published a paper entitled "Diversity and Phylogenetic Analyses Reveal Horizontal Transmission of Endosymbionts Between Whiteflies and Their Parasitoids"). In lines 154-164 it is suggested that the direction of transfer between host and parasitoid can be inferred from the tree. This is not obvious to me given the poor resolution of the tree due to low sequence divergence. There are established statistical approaches to test the direction of trait changes on a tree that could have been used (a common approach is to use the software BEAST).

      Thank you for your insightful comments regarding the transfer direction of Wolbachia between whiteflies and their parasitoids. We acknowledge the concern about the resolution of the phylogenetic tree and the inference of the direction of Wolbachia transmission based on the available data. We considered the high infection frequency and obligate nature of Wolbachia in En. formosa, which exhibits a 100% infection rate, as a strong indicator that recent transmission of Wolbachia in this clade likely occurred from En. formosa to B. tabaci. We appreciate your recommendation and will ensure that our conclusions are supported by a more statistically sound approach. As you suggested, we will employ the software BEAST to rigorously test the direction of transmission, and we will revise our statements accordingly.

      Reviewer #2 (Public Review):

      The paper by Yan et al. aims to provide evidence for horizontal transmission of the intracellular bacterial symbiont Wolbachia from parasitoid wasps to their whitefly hosts. In my opinion, the paper in its current form contains major flaws.

      Weaknesses:

      The dogma in the field is that although horizontal transmission events of Wolbachia occur, in most systems they are so rare that the chances of observing them in the lab are very slim.

      For the idea of bacteria moving from a parasitoid to its host, the authors have rightfully cited the paper by Hughes et al. (2001), which presents the main arguments against the possibility of documenting such transmissions. Thus, if the authors want to provide data that contradict the large volume of evidence showing the opposite, they should present a very strong case.

      In my opinion, the paper fails to provide such concrete evidence. Moreover, it seems the work presented does not meet the basic scientific standards.

      We are grateful for your critical perspective on our work. Nonetheless, we are confident in the credibility of our findings regarding the horizontal transmission of Wolbachia from En. formosa to B. tabaci. Our study documented this phenomenon through phylogenetic tree analyses, and we further substantiated our observations with rigorous experiments in both cages and Petri dishes. The horizontal transfer of Wolbachia was confirmed via PCR, with the wsp sequences in B. tabaci showing complete concordance with those in En. formosa. Additionally, we used FISH, vertical transmission experiments, and phenotypic assays to demonstrate that the transferred Wolbachia could be vertically transmitted and induce a significant fitness cost in B. tabaci. All experiments were conducted with strict negative controls and a sufficient number of replicates to ensure reliability, thereby meeting basic scientific standards. The collective evidence we present points to a definitive case of Wolbachia transmission from the parasitoid En. formosa to the whitefly B. tabaci.

      My main reservations are:

      • I think the distribution pattern of bacteria stained by the probes in the FISH pictures presented in Figure 4 looks very much like that of Portiera, the primary symbiont found in the bacteriome of all whitefly species. In order to make a strong case, the authors need to include Portiera probes along with the Wolbachia ones.

      We are very grateful for your critical evaluation of the specificity of FISH in our study. We are confident in the reliability of our FISH results for several reasons.

      1) We implemented rigorous negative controls, which exhibited no detectable signal, thereby confirming the specificity of our hybridization.

      2) The central region of the whitefly nymph is a typical oviposition site for En. formosa. After parasitism, we observed FISH signals around the introduced parasitoid eggs, distinct from the bacteriocytes, which are rich in endosymbionts including Portiera (FIG 3e-f). This observation supports the high specificity of our FISH method.

      3) In G3 whiteflies, we detected Wolbachia in the bacteriocytes of nymphs and at the posterior end of eggs in adult females (FIG 4). This distribution pattern aligns with previously reported localizations of Wolbachia in B. tabaci (Shi et al., 2016; Skaljac et al., 2013). Furthermore, the distribution of Wolbachia in whiteflies does indeed overlap to some extent with that of Portiera (Skaljac et al., 2013; Bing et al., 2014).

      4) The probes used in our FISH assays have been widely cited (Heddi et al., 1999) and validated in studies on B. tabaci and other systems (Guo et al., 2018; Hegde et al., 2024; Krafsur et al., 2020; Rasgon et al., 2006; Uribe-Alvarez et al., 2019; Zhao et al., 2013).

      Taking all these points into consideration, we stand by the reliability of our FISH results.

      References:

      Bing XL, Xia WQ, Gui JD, Yan GH, Wang XW, Liu SS. 2014. Diversity and evolution of the Wolbachia endosymbionts of Bemisia (Hemiptera: Aleyrodidae) whiteflies. Ecol Evol, 4(13): 2714-37.

      Guo Y, Hoffmann AA, Xu XQ, Zhang X, Huang HJ, Ju JF, Gong JT, Hong XY. 2018. Wolbachia-induced apoptosis associated with increased fecundity in Laodelphax striatellus (Hemiptera: Delphacidae). Insect Mol Biol, 27: 796-807.

      Heddi A, Grenier AM, Khatchadourian C, Charles H, Nardon P. 1999. Four intracellular genomes direct weevil biology: Nuclear, mitochondrial, principal endosymbiont, and Wolbachia. Proc Natl Acad Sci USA, 96: 6814-6819.

      Hegde S, Marriott AE, Pionnier N, Steven A, Bulman C, Gunderson E, et al. 2024. Combinations of the azaquinazoline anti-Wolbachia agent, AWZ1066S, with benzimidazole anthelmintics synergise to mediate sub-seven-day sterilising and curative efficacies in experimental models of filariasis. Front Microbiol, 15: 1346068.

      Krafsur AM, Ghosh A, Brelsfoard CL. 2020. Phenotypic response of Wolbachia pipientis in a cell-free medium. Microorganisms, 8: 1060.

      Rasgon JL, Gamston CE, Ren X. 2006. Survival of Wolbachia pipientis in cell-free medium. Appl Environ Microbiol, 72: 6934-6937.

      Shi P, He Z, Li S, An X, Lv N, Ghanim M, Cuthbertson AGS, Ren SX, Qiu BL. 2016. Wolbachia has two different localization patterns in whitefly Bemisia tabaci AsiaII7 species. PLoS One, 11: e0162558.

      Skaljac M, Zanić K, Hrnčić S, Radonjić S, Perović T, Ghanim M. 2013. Diversity and localization of bacterial symbionts in three whitefly species (Hemiptera: Aleyrodidae) from the east coast of the Adriatic Sea. Bull Entomol Res, 103(1): 48-59.

      Uribe-Alvarez C, Chiquete-Félix N, Morales-García L, Bohórquez-Hernández A, Delgado-Buenrostro NL, Vaca L, et al. 2019. Wolbachia pipientis grows in Saccharomyces cerevisiae evoking early death of the host and deregulation of mitochondrial metabolism. MicrobiologyOpen, 8: e00675.

      Zhao DX, Zhang XF, Chen DS, Zhang YK, Hong XY. 2013. Wolbachia-host interactions: Host mating patterns affect Wolbachia density dynamics. PLoS One, 8: e66373.

      • If I understand the methods correctly, the phylogeny presented in Figure 2a is supposed to be based on a wide search for Wolbachia wsp gene done on the NCBI dataset (p. 348). However, when I checked the origin of some of the sequences used in the tree to show the similarity of Wolbachia between Bemisia tabaci and its parasitoids, I found that most of them were deposited by the authors themselves in the course of the current study (I could not find this mentioned in the text), or originated in a couple of papers that in my opinion should not have been published to begin with.

      We appreciate your meticulous examination of the sources of our sequence data. All sequences included in our phylogenetic analysis were downloaded from the NCBI database as of July 2023. The sequences used to illustrate the similarity of Wolbachia between B. tabaci and its parasitoids include those from our previously published study (Qi et al., 2019), which were sequenced from field samples. Additional sequences were obtained from other laboratories (Ahmed et al., 2009; Baldo et al., 2006; Van Meer et al., 1999). We acknowledge that in our prior research (Qi et al., 2019) the sequences were submitted directly to NCBI and, regrettably, we did not update the corresponding publication information after the article was published. It is not uncommon for sequences on NCBI never to be followed by a published paper (e.g., FJ710487-FJ710511 and JF426137-JF426149), or for their publication details not to be updated after publication (e.g., sequences MH918776-MH918794 from Qi et al., 2019, and KF017873-KF017878 from Fattah-Hosseini et al., 2018). We recognize that this practice can lead to confusion and apologize for the oversight in our work.

      References:

      Ahmed MZ, Shatters RG, Ren, SX, Jin GH, Mandour NS, Qiu BL. 2009. Genetic distinctions among the Mediterranean and Chinese populations of Bemisia tabaci Q biotype and their endosymbiont Wolbachia populations. J Appl Entomol, 133: 733-741.

      Baldo L, Hotopp JCD, Jolley KA, Bordenstein SR, Biber SA, Choudhury RR, et al. 2006. Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol, 72: 7098-110.

      Fattah-Hosseini S, Karimi J, Allahyari H. 2018. Molecular characterization of Iranian Encarsia formosa Gahan populations with natural incidence of Wolbachia infection. J Entomol Res Soc, 20: 85–100.

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112(2): 894-905.

      Van Meer MM, Witteveldt J, Stouthamer R. 1999. Phylogeny of the arthropod endosymbiont Wolbachia based on the wsp gene. Insect Mol Biol, 8: 399-408.

      • The authors fail to discuss or even acknowledge a number of published studies that specifically show no horizontal transmission, such as the one claimed to be detected in the study presented.

      Thank you for bringing this to our attention. We will address and discuss the published studies that report no evidence of horizontal transmission, as you've highlighted, in the revised version of our manuscript.

      Reviewer #3 (Public Review):

      This is a very ordinary research paper. The horizontal transmission of endosymbionts, including Wolbachia, Rickettsia etc., has been reported in detail in the last 10 years, and parasitoid-vectored as well as plant-vectored horizontal transmission is the mainstream of research. For example, Ahmed et al. 2013 PLoS One, 2015 PLoS Pathogens, Chiel et al. 2014 Environmental Entomology, Ahmed et al. 2016 BMC Evolutionary Biology, Qi et al. 2019 JEE, and Liu et al. 2023 Frontiers in Cellular and Infection Microbiology all reported parasitoid-vectored horizontal transmission of endosymbionts. Caspi-Fluger et al. 2012 Proc Roy Soc B, Chrostek et al. 2017 Frontiers in Microbiology, Li et al. 2017 ISME Journal, Li et al. 2017 FEMS, and Shi et al. 2024 mBio all reported plant-vectored horizontal transmission of endosymbionts. For the effects of endosymbionts on the biology of the host, Ahmed et al. 2015 PLoS Pathogens explained the effects in detail.

      Thank you very much for your insightful comments and for highlighting the relevant literature on the horizontal transmission of endosymbionts, including Wolbachia and Rickettsia. After careful consideration of the studies you mention, we believe that our work makes significant novel contributions to the field. 1) Regarding parasitoid-mediated horizontal transmission of Wolbachia, most of the cited articles, such as Ahmed et al. 2013 in PLoS One and Ahmed et al. 2016 in BMC Evolutionary Biology, propose hypotheses but do not provide definitive evidence. The transmission of Wolbachia within the whitefly cryptic species complex (Ahmed et al. 2013) or between moths and butterflies (Ahmed et al. 2016) could be mediated by parasitoids, plants, or other unknown pathways. 2) Chiel et al. (2014, Environmental Entomology) reported “no evidence for horizontal transmission of Wolbachia between and within trophic levels” in their study system. 3) Several of the papers you mention concern Rickettsia rather than Wolbachia, which indirectly reflects the relative scarcity of evidence for horizontal transmission of Wolbachia. For example, evidence for plant-mediated transmission of Wolbachia remains isolated, with Li et al. 2017 in The ISME Journal being one of the few reports supporting this mode of transmission. 4) Although the effects of endosymbionts on their hosts are not the central focus of our study, we examined the effects of transgenerationally transmitted Wolbachia on whiteflies primarily to confirm that Wolbachia had infected the whiteflies. 5) Furthermore, the effects of Wolbachia on whiteflies that we report differ notably from those reported by Ahmed et al. 2015 in PLoS Pathogens, likely because different whitefly species and Wolbachia strains were studied. 6) More importantly, our study reveals a mechanism of parasitoid-mediated horizontal transmission of Wolbachia that is distinct from the mechanical transmission suggested by Ahmed et al. 2015 in PLoS Pathogens. Their study implies transmission primarily through host-feeding contamination, without the need for Wolbachia to infect the parasitoid, suggesting host-to-host transmission at the same trophic level. In contrast, our findings demonstrate transmission from parasitoids to hosts through unsuccessful parasitism, which represents cross-trophic-level transmission. To our knowledge, this is the first experimental evidence that Wolbachia can be transmitted from parasitoids to hosts. We believe these clarifications and the novel insights provided by our research contribute valuable knowledge to the field.

      References:

      Ahmed MZ, De Barro PJ, Ren SX, Greeff JM, Qiu BL. 2013. Evidence for horizontal transmission of secondary endosymbionts in the Bemisia tabaci cryptic species complex. PLoS One, 8: e53084.

      Ahmed MZ, Li SJ, Xue X, Yin XJ, Ren SX, Jiggins FM, Greeff JM, Qiu BL. 2015. The intracellular bacterium Wolbachia uses parasitoid wasps as phoretic vectors for efficient horizontal transmission. PLoS Pathog, 11: e1004672.

      Ahmed MZ, Breinholt JW, Kawahara AY. 2016. Evidence for common horizontal transmission of Wolbachia among butterflies and moths. BMC Evol Biol, 16: 118. doi: 10.1186/s12862-016-0660-x.

      Caspi-Fluger A, Inbar M, Mozes-Daube N, Katzir N, Portnoy V, Belausov E, Hunter MS, Zchori-Fein E. 2012. Horizontal transmission of the insect symbiont Rickettsia is plant-mediated. Proc Biol Sci, 279(1734): 1791-6.

      Chiel E, Kelly SE, Harris AM, Gebiola M, Li X, Zchori-Fein E, Hunter MS. 2014. Characteristics, phenotype, and transmission of Wolbachia in the sweet potato whitefly, Bemisia tabaci (Hemiptera: Aleyrodidae), and its parasitoid Eretmocerus sp. nr. emiratus (Hymenoptera: Aphelinidae). Environ Entomol, 43(2): 353-62.

      Chrostek E, Pelz-Stelinski K, Hurst GDD, Hughes GL. 2017. Horizontal transmission of intracellular insect symbionts via plants. Front Microbiol, 8: 2237.

      Li SJ, Ahmed MZ, Lv N, Shi PQ, Wang XM, Huang JL, Qiu BL. 2017. Plant-mediated horizontal transmission of Wolbachia between whiteflies. ISME J, 11: 1019-1028.

      Li YH, Ahmed MZ, Li SJ, Lv N, Shi PQ, Chen XS, Qiu BL. 2017. Plant-mediated horizontal transmission of Rickettsia endosymbiont between different whitefly species. FEMS Microbiol Ecol, 93(12). doi: 10.1093/femsec/fix138.

      Liu Y, He ZQ, Wen Q, Peng J, Zhou YT, Mandour N, McKenzie CL, Ahmed MZ, Qiu BL. 2023. Parasitoid-mediated horizontal transmission of Rickettsia between whiteflies. Front Cell Infect Microbiol, 12: 1077494. doi: 10.3389/fcimb.2022.1077494.

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112: 894-905.

      Shi PQ, Wang L, Chen XY, Wang K, Wu QJ, Turlings TCJ, Zhang PJ, Qiu BL. 2024. Rickettsia transmission from whitefly to plants benefits herbivore insects but is detrimental to fungal and viral pathogens. mBio, 15(3): e0244823.

      Weaknesses:

      In the current study, the authors downloaded the MLST or wsp genes from a public database and analyzed the data using other methods. I think the authors may not be familiar with the research progress in the field of insect symbiont transmission, and at its current stage this manuscript lacks sufficient novelty.

      We appreciate your critical perspective on our study. However, we respectfully disagree with the viewpoint that our manuscript lacks sufficient novelty.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      i) "Enhancers dependent on TPR during senescence are enriched for binding sites of inflammatory transcription factors". Proximity to genes does not confirm an enhancer role for that gene, although Tasdemir et al., 2016 suggested this. At that time, Hi-C and HiChIP techniques were not well established. Nowadays, without combining Hi-C and H3K27ac ChIP, HiChIP alone cannot definitively identify actual enhancer regions. If we repeatedly use the Tasdemir et al., 2016 map, we risk incorrect mapping of enhancers of SASP. The authors should either use other public Hi-C databases to map the enhancer of SASP or temper their conclusions about enhancers. Otherwise, this could set a precedent for the SASP enhancer region that might not be entirely accurate.

      The enhancer mapping for SASP is outdated, as advancements in Hi-C have significantly developed this area. Therefore, the claimed enhancers of SASP may not be accurate.

      Response: We agree with the reviewer that enhancers are not easy to define, or to pair with their target gene(s). Indeed, we would argue that even combined Hi-C and H3K27ac data do not define enhancers or enhancer-gene pairs, and that the gold-standard evidence for an enhancer is genetic: does its deletion/mutation abrogate gene activation? We would also point out that we did not actually use the Tasdemir data to call enhancers. In response to the reviewer’s comment, we will temper our terminology and now refer to our inter- and intragenic ATAC-seq peaks only as “putative enhancers”.

      ii) “Many of these include putative enhancers located close to key SASP genes, such as IL1B and IL8 (Figure 1D).” I have the same concern as mentioned above (i). However, I am interested in knowing the other key SASP genes where DNA is accessible near the genes. A supplementary table listing key SASP genes along with their distances to the TSS and affected by TPR knock-down would be helpful.

      Response: We thank the reviewer for this suggestion. We will provide tables listing the TPR-dependent, senescence-specific ATAC-seq peaks that are close to genes associated with the ‘positive regulation of inflammatory response’, ‘cytokine activity’ and ‘cytokine receptor binding’ gene ontology terms, which were significant in the GREAT analysis and which include many SASP genes. We will also provide the distances of these regions from the associated genes.

      iii) "As we previously reported, knockdown of TPR (siTPR) in RAS cells blocks SAHF formation, but it also results in reduced nuclear localisation (decreased nucleocytoplasmic ratio) of NF-κB, consistent with decreased NF-κB activation (Figure 2A and B, Figure S2A)." TPR is required for CCF, SASP, and SAHF. The relationship between CCF and SASP is well established, but the relationship between SAHF and CCF/SASP remains elusive. Both SAHF and CCF are enriched with heterochromatin markers, suggesting that CCF might originate from SAHF. However, this has not been confirmed. Do the authors think that SAHF is a prerequisite for CCF in the OIS model, or is it an independent event?

      Response: We agree with the reviewer that CCFs likely originate from SAHF. Whilst we cannot definitively prove this in our ER-Ras OIS model, in the revised manuscript we intend to further investigate the relationship between SAHF and CCF by knocking down HMGA1 during RAS-induced senescence. Like TPR depletion, HMGA1 depletion is known to lead to loss of SAHF (Narita et al., Cell, 2006) but, unlike TPR, HMGA1 is a chromatin protein enriched on heterochromatin itself. We will assess whether loss of HMGA1 also abrogates CCF formation.

      iv) The authors suggested that "it is plausible that the decrease in CCFs produced during the early phases of OIS upon TPR knockdown may be caused by an increase in the stability of the nuclear periphery due to the heterochromatin that remains there when SAHF are not formed." I do not completely agree with this explanation because CCF starts forming at day 3-4 but culminates at later time points. According to Figure 5A, only 5-6% of cells are positive for CCFs on day 5. What happens on day 8? By day 8, the percentage of CCF-positive cells could be 20-25%, or the number of CCFs per cell might be 0.2-0.3. If TPR is not required for CCF formation at this stage, then linking CCF to SASP at day 8 becomes critical. This suggests that another mechanism might be driving SASP expression and that TPR could be regulating downstream signaling of CCF. It is possible that changes in nuclear pore density affect the localization of cGAS from the nucleus to the cytoplasm.

      Response: In our hands, using this IMR90 ER-RAS system, CCF formation decreases later in senescence (at day 8, only 2% of cells), hence our focus on early timepoints after oncogenic RAS activation. At later timepoints, cGAS activation is also mediated by retrotransposons (De Cecco et al., Nature, 2019; Liu et al., Cell, 2023) and by leakage of mitochondrial DNA (Victorelli et al., Nature, 2023; Chen et al., Nat. Comms, 2024), so it is difficult to disentangle the net contribution of these three inputs.

      v) Additionally, the authors did not address what happens in the later stages of CCF formation in the absence of TPR. If TPR is not required for CCF formation at later stages, it fails to explain the downstream processes at these time points adequately. This suggests that TPR may also have another mechanism of SASP regulation independent of CCF formation.

      Response: In our cellular system CCFs precede the SASP: CCFs are already present at day 3, but SASP factors are not secreted until day 5. However, CCFs are not necessarily required for maintenance of the SASP; once initiated, the SASP is maintained by cytokine feedback loops.

      …………

      Reviewer #2:

      1. The claim that TPR knockdown does not affect NFkappaB nuclear translocation indeed stands, but it would be nice if the authors also compared data across conditions in Fig. 2F, i.e. siCTRL+Ras CM versus siTPR+Ras CM in RAS cells and provided a p-value as it seems to me that there is some dampening of translocation intensity, which is clearly not the case for STOP cells. The authors focus on this for d3 and d5, but it seems to be also the case for later time points.

      Response: As basal NF-κB translocation is lower in RAS cells on TPR knockdown, we would expect a dampening in NF-κB translocation between siCTRL+RAS CM and siTPR+RAS CM regardless of whether there is a transport defect. Consistent with this, the p-value for this comparison is significant, but we did not show it because it is not relevant to whether NF-κB nuclear translocation is impeded by TPR knockdown, which is the focus here. We will add a table with median nuclear:cytoplasmic NF-κB ratios and 95% confidence intervals to make the changes in basal level (treatment with STOP CM) clearer.
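      The response does not say how the 95% confidence intervals for the medians will be obtained. A common choice for a median is a percentile bootstrap over cells. The Python/NumPy sketch below assumes that approach. The function name, its parameters and the per-cell input list are hypothetical and illustrative only.

```python
import numpy as np


def median_with_bootstrap_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Return (median, ci_low, ci_high) using a percentile bootstrap.

    `values` is assumed to hold one nuclear:cytoplasmic ratio per cell for a
    single condition (hypothetical input format).
    """
    x = np.asarray(values, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one value")
    rng = np.random.default_rng(seed)
    # Resample cells with replacement; each row of `idx` is one bootstrap sample.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_medians = np.median(x[idx], axis=1)
    # Percentile interval: the central `level` fraction of bootstrap medians.
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_medians, [alpha, 1.0 - alpha])
    return float(np.median(x)), float(lo), float(hi)
```

      Resampling builds an `n_boot` × n array. For very large numbers of cells, a smaller `n_boot` or a loop over resamples may be preferable.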

      Also, a comment based on literature or from the authors previous work on TPR, on the extent to which the structural integrity of the nuclear basket is at all affected upon TPR depletion would be helpful for data interpretation.

      Response: In the revised manuscript we will refer to the literature showing that TPR is the final component added to the nuclear pore and that its absence does not affect localisation of NUP153 to the nuclear basket (Hase and Cordes, Mol. Biol. Cell, 2003; Aksenova et al., Nat. Comms, 2020).

      Magnification of representative cells per each condition in Fig. 2E would be welcome.

      Response: We will provide a revised Figure 2E with the magnifications as requested.

      Regarding the data in Figs 3 and S3: I am a bit confused about how the obviously decreased NFkappaB nuclear signal (e.g., in Fig. 3D) does not translate into a skewed N/C ratio (e.g., in Fig. 3C)? The western blots indicate that overall NFkappaB levels remain essentially unchanged? Am I missing something?

      Response: As stated in the Methods section, we used a 50-pixel expansion of the detected nuclear area as our cytoplasmic area in the analysis (see image below). We did this because we found detecting and segmenting the whole cytoplasmic area in the NF-κB channel to be unreliable. At days 3 and 5, the decrease in NF-κB nuclear signal in RAS cells on TPR knockdown was accompanied by a decrease in signal in the portion of the cytoplasm closest to the nucleus, so the nuclear:cytoplasmic ratio did not change. We believe the redistribution of NF-κB closer to the nucleus in the RAS siCTRL sample indicates early activation and will make this clearer in the revised text. We will also quantify the NF-κB western blots (see point 5) to help clarify this issue.
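      A toy calculation shows why a proportional drop in nuclear and perinuclear signal leaves the ratio unchanged. The Python/NumPy sketch below measures a nuclear:cytoplasmic ratio against a ring formed by expanding the nuclear mask by 50 pixels. The authors do not state which software they used. The function names, the 4-connected (diamond-shaped) pixel expansion and the use of mean intensities are assumptions made for illustration; this is not their pipeline.

```python
import numpy as np


def expand_mask(mask: np.ndarray, pixels: int) -> np.ndarray:
    """Grow a boolean mask by `pixels` steps of 4-connected dilation.

    Each step adds the up/down/left/right neighbours of every mask pixel, so the
    result contains all pixels within Manhattan distance `pixels` of the mask.
    """
    out = mask.copy()
    for _ in range(pixels):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # neighbour above
        grown[:-1, :] |= out[1:, :]   # neighbour below
        grown[:, 1:] |= out[:, :-1]   # neighbour to the left
        grown[:, :-1] |= out[:, 1:]   # neighbour to the right
        out = grown
    return out


def perinuclear_ring(nucleus_mask: np.ndarray, pixels: int = 50) -> np.ndarray:
    """Pixels inside the expanded nuclear area that are not part of the nucleus."""
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    return expand_mask(nucleus, pixels) & ~nucleus


def nuc_cyto_ratio(signal: np.ndarray, nucleus_mask: np.ndarray, pixels: int = 50) -> float:
    """Mean nuclear intensity divided by mean intensity in the perinuclear ring."""
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    ring = perinuclear_ring(nucleus, pixels)
    if not nucleus.any() or not ring.any():
        raise ValueError("empty nuclear or ring region")
    return float(signal[nucleus].mean() / signal[ring].mean())


# Toy cell: nucleus at intensity 4, surrounding cytoplasm at intensity 2.
mask = np.zeros((200, 200), dtype=bool)
mask[90:110, 90:110] = True
signal = np.full((200, 200), 2.0)
signal[mask] = 4.0

print(nuc_cyto_ratio(signal, mask))        # 2.0
print(nuc_cyto_ratio(signal * 0.5, mask))  # 2.0: both compartments halved

nuclear_drop_only = signal.copy()
nuclear_drop_only[mask] = 3.0
print(nuc_cyto_ratio(nuclear_drop_only, mask))  # 1.5: only the nucleus dims
```

      Halving the nuclear and ring intensities together leaves the ratio at 2.0. This mirrors the situation described for RAS siTPR cells at days 3 and 5. Lowering only the nuclear signal lowers the ratio.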

      Also, along these lines, d8 western blots seem to portray an overall drop in NFkappaB levels. Is this indeed so? Can the authors maybe quantify their blots' replicates and provide a box plot and statistical testing?

      Response: We will provide quantification for the NF-κB western blots, though box plots would not be appropriate as we have only two replicates.

      Regarding the ATAC-seq data from d3, I think it could be mined a bit more. For example, compare to d8 (which the authors have apparently done, but don't present in detail) and discuss which are these early regions that also become accessible by d3 and what kind of genes and motifs are associated with them. Moreover, the focus in Fig. S3E is on ATAC sites shared with d8; how about d3-specific ones? How many of these are there (if any) and how might they be affected?

      Response: As shown in Table S2, TPR knockdown did not cause any changes in chromatin accessibility at day 3, so there are no day 3-specific TPR-dependent peaks. We will edit the text to make this clearer. We will carry out motif analysis and GREAT analysis on the day 3 peaks that become accessible in RAS cells but are not accessible in STOP cells (RAS-specific peaks).

      I trust that the authors quantified their STING blots for the conclusions they present, but since it is difficult to assess these confidently by eye, again, some quantification plots would be welcome in Figs 4C,D and S4D,E.

      Response: We will provide quantification for the STING western blots.

      As controls for Fig. 5, it would be interesting to see if active histone readouts also mark CCFs in this system.

      Response: Ivanov et al., J. Cell Biol., 2013 showed the absence of H3K9 acetylation from chromatin in CCFs. Further exploration of the types of chromatin/sequences in CCFs is outside the scope of our current manuscript.

      The POM121 channel in Fig. 5C appears to have some small signal foci in the cytoplasm; could these be small CCFs? More generally, the authors focus on these large blobs that only appear in

      Response: The small signal foci the reviewer is highlighting are background from the POM121 antibody staining rather than CCFs: they do not show DAPI staining, and similar foci are evident in non-senescent cells, where CCFs are generally not present. Our unpublished data (see response to Reviewer 1, point iv) from day 8 cells show that only ~2% of senescent cells are positive for CCFs regardless of TPR knockdown, similar to the proportion observed in non-senescent cells at earlier timepoints. Thus, in our hands CCF formation occurs earlier, triggering the SASP, rather than at day 8, when the SASP is already established and reinforced through positive-feedback cytokine signalling.

      I wonder if there is a simple experiment the authors could do to test whether this mechanism is linked only to senescence, specifically oncogene-induced senescence. I don't think this is needed to support the conclusions drawn here, but it could significantly broaden the scope of their discovery if, for example, this were true in other senescence models or during proinflammatory activation in general.

      Response: These are interesting suggestions, but setting up, characterising and quantifying other senescence models would take a substantial amount of time and is outside the scope of our current manuscript.

      ………….

      Reviewer #3

      1. The study uses a single cell strain, IMR90, undergoing a single form of senescence, induced by activated Ras. To show the generalizability of the finding, the authors are advised to inhibit TPR in other forms of senescence and in cells other than IMR90. For example, IR or etoposide induces more CCFs than OIS does in IMR90 cells. BJ, MEF, and ARPE-19 senescence also show prominent CCFs.

      Response: These are interesting suggestions, but, as we responded to Reviewer 2, setting up, characterising and quantifying other senescence models would take a substantial amount of time and is outside the scope of our current manuscript.

      To convincingly show that the CCF pathway is involved, the authors need to measure the activity of the cGAS-STING pathway. Including a cGAMP ELISA would be informative.

      Response: We thank the reviewer for this suggestion, and we will try to include this assay in our revised manuscript.

      The authors used conditioned media to show that TPR KD does not directly affect NFkB nuclear translocation. While this is helpful, conditions other than senescence will be more direct. For example, TNFa treatment or poly I:C transfection induces efficient NFkB nuclear translocation in IMR90 cells.

      Response: This experiment (Fig. 2E, F) was designed simply to show that knocking down TPR does not impair the ability of activated NF-κB to enter the nucleus; it is not about senescence per se. Indeed, this is why we included the addition of SASP (RAS) conditioned media to non-senescent STOP cells in Fig. 2. We do not think investigating other methods of activating NF-κB would add more to the question of whether TPR loss abrogates NF-κB nuclear import.

      Fig. 4C and Fig. S4D are identical.

      Response: Although these STING immunoblots look similar, they are not identical. Below we attach the raw original image, in which both biological replicates for day 3 (Fig. 4C and S4D) were run on the same gel, as proof of this claim.

      Figure legend for Fig. S4F is mislabeled.

      Response: We will correct this.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      DNA damage triggers senescence, inducing chromatin reorganization and SASP activation. The authors previously demonstrated that the TPR nucleoprotein at nuclear pores is crucial for both SAHF formation and SASP activation during senescence. Here they also showed that TPR is required for the formation of cytoplasmic chromatin fragments (CCF), which activate cGAS-STING-TBK1-NF-kB signaling to express SASP. While the mechanistic regulation of CCF formation by TPR remains unclear, their study provides compelling evidence of downstream processes involving CCF. This study offers new insights into CCF formation, suggesting a promising direction for further research. I endorse the manuscript; however, there are several concerns that need addressing before acceptance.

      i) "Enhancers dependent on TPR during senescence are enriched for binding sites of inflammatory transcription factors".

      Proximity to genes does not confirm an enhancer role for that gene, although Tasdemir et al., 2016 suggested this. At that time, Hi-C and HiChIP techniques were not well established. Nowadays, without combining Hi-C and H3K27ac ChIP, HiChIP alone cannot definitively identify actual enhancer regions. If we repeatedly use the Tasdemir et al., 2016 map, we risk incorrect mapping of enhancers of SASP. The authors should either use other public Hi-C databases to map the enhancer of SASP or temper their conclusions about enhancers. Otherwise, this could set a precedent for the SASP enhancer region that might not be entirely accurate.

      ii) Many of these include putative enhancers located close to key SASP genes, such as IL1B and IL8 (Figure 1D).

      I have the same concern as mentioned earlier about enhancers. However, I am interested in knowing the other key SASP genes where DNA is accessible near the genes. A supplementary table listing key SASP genes along with their distances to the TSS and affected by TPR knock-down would be helpful.

      iii) "As we previously reported, knockdown of TPR (siTPR) in RAS cells blocks SAHF formation, but it also results in reduced nuclear localisation (decreased nucleocytoplasmic ratio) of NF-κB, consistent with decreased NF-κB activation (Figure 2A and B, Figure S2A)." TPR is required for CCF, SASP, and SAHF. The relationship between CCF and SASP is well established, but the relationship between SAHF and CCF/SASP remains elusive. Both SAHF and CCF are enriched with heterochromatin markers, suggesting that CCF might originate from SAHF. However, this has not been confirmed. Do the authors think that SAHF is a prerequisite for CCF in the OIS model, or is it an independent event?

      iv) The authors suggested that "it is plausible that the decrease in CCFs produced during the early phases of OIS upon TPR knockdown may be caused by an increase in the stability of the nuclear periphery due to the heterochromatin that remains there when SAHF are not formed." I do not completely agree with this explanation because CCF starts forming at day 3-4 but culminates at later time points. According to Figure 5A, only 5-6% of cells are positive for CCFs on day 5. What happens on day 8? By day 8, the percentage of CCF-positive cells could be 20-25%, or the number of CCFs per cell might be 0.2-0.3. If TPR is not required for CCF formation at this stage, then linking CCF to SASP at day 8 becomes critical. This suggests that another mechanism might be driving SASP expression and that TPR could be regulating downstream signaling of CCF. It is possible that changes in nuclear pore density affect the localization of cGAS from the nucleus to the cytoplasm.

      Significance

      The authors previously demonstrated that the TPR nucleoprotein at nuclear pores is crucial for both SAHF formation and SASP activation during senescence. Here they also showed that TPR is required for the formation of cytoplasmic chromatin fragments (CCF), which activate cGAS-STING-TBK1-NF-kB signaling to express SASP. While the mechanistic regulation of CCF formation by TPR remains unclear, their study provides compelling evidence of downstream processes involving CCF. This study offers new insights into CCF formation, suggesting a promising direction for further research.

      However, there are some limitations to this study. The enhancer mapping for SASP is outdated, as advancements in Hi-C have significantly developed this area. Therefore, the claimed enhancers of SASP may not be accurate. Additionally, the authors did not address what happens in the later stages of CCF formation in the absence of TPR. If TPR is not required for CCF formation at later stages, it fails to explain the downstream processes at these time points adequately. This suggests that TPR may also have another mechanism of SASP regulation independent of CCF formation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors addressed how long-range interactions between boundary elements are established and influence their function in enhancer specificity. Briefly, the authors placed two different reporters separated by a boundary element. They inserted this construct ectopically ~140 kb away from an endogenous locus that contains the same boundary element. The authors used expression patterns driven by nearby enhancers as an output to determine which enhancers the reporters interact with. They complemented this analysis with 3D DNA contact mapping. The authors found that the orientation of the boundary element determined which enhancers each reporter interacted with. They proposed that the 3D interaction topology, whether a circle-loop or a stem-loop configuration, distinguished whether the interaction was cohesin mediated or occurred through an independent mechanism termed pairing.

      Strengths:

      The transgene expression assays are built upon prior knowledge of the enhancer activities. The 3D DNA contacts confirm that transgene expression correlates with the contacts. Using 4 different orientations covers all combinations of the reporter genes and the boundary placement.

      Weaknesses:

      The interpretation of the data as a refutation of a role for loop extrusion in TAD formation is not warranted, as the authors did not deplete the loop extruders to show that what they measure is independent of them.

      (1.1) To begin with, our findings do not exclude the possibility that cohesin loop extrusion has some sort of role in the formation or maintenance of TADs in flies or other aspects of chromosome structure.  On the other hand, it clearly is not determinative in defining the end-points of TADs or in generating the resulting topology (stem-loop or circle-loop).  Our main point, which we feel we have established unequivocally, is that it can’t explain many essential features of TADs or chromosome loops (see below) in Drosophila.  This reviewer agrees with this point in their next paragraph (below).  We also think that the loop extrusion model’s general acceptance as THE driving force behind TAD formation in mammals is unwarranted and not fully consistent with the available data, as explained below.

      As to the reviewer’s specific point regarding depletion of loop extruders, we first note that completely eliminating factors encoding cohesin subunits in fly embryos isn’t readily feasible.  As cohesin is essential starting at the beginning of embryonic development, and is maternally deposited, knockdowns/depletions would likely be incomplete and there would always be some remaining activity.  As long as there is some residual activity—and no disruption in TAD formation is observed—this experimental test would be a failure.  In addition, any defects that are observed might arise not from a failure in TAD formation via loop extrusion but rather because the rapid mitotic cycles would be disrupted.  A far better approach would be to deplete/knockdown cohesin subunits in tissue culture cells, as there is no requirement for the cells to undergo embryonic development.  Moreover, since cell division is relatively slow, the depletion would likely eliminate much if not all of the activity before a checkpoint is reached.

      While a drastic depletion of cohesin is not feasible in our model organism, we would draw the reviewer’s attention to an experiment of this type that has already been done in mammalian tissue culture cells by Goel et al. (Goel et al. 2023).  Unlike most Hi-C studies in mammals, the authors used region capture MicroC (RCMC).  In contrast to published genome-wide mammalian MicroC experiments (cf. Hsieh et al. 2020; Krietenstein et al. 2020), which require large bin sizes to visualize mammalian “TADs,” the resolution of the experiments in Goel et al. (Goel et al. 2023) is similar to the resolution in our MicroC experiments (200-400 bp).  A MicroC contact map from Goel et al. shows the Ppm1g locus on chromosome 5 before and after Rad21 depletion.  The contact map visualizes a 250 kb DNA segment, which is only slightly larger than the ~230 kb DNA segment in Fig. 2C in our paper.

      In this experiment, there was a 97% reduction in the amount of Rad21.  However, as can be seen by comparing the contact profiles above and below the diagonal, there is little or no difference in TAD organization after cohesin depletion when individual TADs are visualized with a bin size of 250 bp.  These results would indicate that mammalian TADs do not require cohesin.

      Note also that the weak 45° stripes connecting different TADs (cf. blue/green arrowheads) are still present after Rad21 depletion.  In the most popular version of the loop extrusion model, cohesin loads at a site(s) somewhere in the TAD-to-be, and then extrudes both strands until it bumps into CTCF roadblocks.  As illustrated in Figure Sup 2, this mechanism generates a vertical stripe originating at the cohesin loading site and extending until cohesin bumps into the left or right roadblock, at which point the stripe transitions into a 45° stripe that ends when cohesin bumps into the other roadblock.  While 45° stripes are visible, there is no hint of a vertical stripe.  This suggests that the mechanism for generating stripes, if it is an active mechanism (rather than passive diffusion), may be quite different.  The 45° stripes must be generated by a factor(s) that is anchored to one (blue arrowhead) or both (green arrowhead) boundaries.  Moreover, this factor, whatever it is, is not cohesin, because the 45° stripes are present both before and after Rad21 depletion.  Thus, if one were to imagine that the stripes represent a process involved in TAD formation, that process does not require cohesin (see Goel et al. 2023).

      It is worth noting another observation that is inconsistent with the cohesin loop extrusion/CTCF roadblock model for TAD formation/maintenance.  CTCF is not found at all of the TAD boundaries in this 250 kb DNA region.  This would suggest that there are other DNA binding proteins that have chromosomal architectural functions besides CTCF.  In flies, many of the chromosomal architectural proteins are, like CTCF, polydactyl zinc finger (PZF) proteins (Bonchuk et al. 2021; Bonchuk et al. 2022; Fedotova et al. 2017).  These include Su(Hw), CTCF, Pita, Zipic and CLAMP.  The PZF family in flies is quite large.  There are ~250 different PZF genes, and since only a handful of these have been characterized, it seems likely that additional members of this family will have architectural functions.  Thus far, only one boundary protein, CTCF, has received attention in studies on mammalian chromosome architecture.  As the mammalian genome is much larger and more complicated than the fly genome, it is difficult to believe that CTCF is the sole chromosomal architectural protein in mammals.  In this respect, it is worth noting that there are ~800 members of the PZF family in mammalian genomes (Fedotova et al. 2017).

      Goel et al. (Goel et al. 2023) did observe alterations in the contact profiles after Rad21 depletion when they visualized the Ppm1g region at much lower resolution (bin sizes of 5 kb and 1 kb). The 5 kb bin size visualizes a region of ~1.2 Mb, while the 1 kb bin size visualizes a region that spans ~800 kb.  These large triangular units do not correspond to the individual TADs seen when Goel et al. visualized the Ppm1g locus at 250 bp resolution. 

      Nor do they correspond to TADs in Fig. 2 of our paper.  Instead, they represent TAD neighborhoods, which likely consist of 20-30 or more individual TADs.  Consequently, the alterations in contact patterns seen after Rad21 depletion are occurring at the level of TAD neighborhoods.  This can be seen by comparing pixel density inside the blue lines before (above the diagonal) and after Rad21 depletion (below the diagonal) (Goel et al. 2023).  The more distant contacts between individual TADs within this neighborhood are preferentially reduced by Rad21 depletion (the region below and to the left of the double arrowhead).  By contrast, the TADs themselves are unaffected, as are contacts between individual TADs and their immediate neighbors (see purple and light green asterisks).  The other interesting feature is the loss of contacts between what appear to be partially overlapping neighborhoods.  This loss of neighborhood-to-neighborhood contacts can be seen in the region located between the green and blue lines.  The neighborhood that appears to partially overlap the Ppm1g neighborhood is outlined in purple.

      It is worth noting that, with the exception of the high-resolution experiments in Goel et al., all of the other studies on cohesin (and CTCF) have examined the effects on contact maps within (and between) large neighborhoods (bin sizes >1 kb).  In most cases, these large neighborhoods are likely to be composed of many individual TADs like those seen in Goel et al. and in Fig. 2 of our paper.  We also observe larger neighborhoods in the fly genome, though they do not appear to be as large as those in mammals.  Our experiments do not address what role cohesin might have in facilitating contacts between more distant TADs located within the same neighborhoods, or between TADs in different neighborhoods, or whether loop extrusion is involved.

      We would also note that the Drosophila DNA segment in Fig. 2C contains 35 different genes, while the mammalian DNA segment shown in Fig. 1 has only 9.  Thus, in this part of the fly genome, Pol II genes are more densely packed than in the mammalian DNA segment.  Much of the fly genome is also densely packed, and the size of individual TADs will likely be smaller, on average, than in mammals.  Nevertheless, the MicroC profiles are not all that different.  As is also common in flies, each TAD in the Ppm1g region only encompasses one or two genes.  Note also that there are no volcano triangles with plumes as would be predicted for TADs that have a stem-loop topology.

      In fact, as shown in Author response image 1, the high-resolution contact profile for the Ppm1g region shows a strong resemblance to that observed for the fly Abd-B regulatory domains.  These regulatory domains are part of a larger neighborhood that encompasses the abd-A and Abd-B genes and their regulatory domains.

      Author response image 1.

      Abd-B regulatory domains

      As the authors show, the single long DNA loop mediated by cohesin loop extrusion connecting the ectopic and endogenous boundary is clearly inconsistent with the results; therefore the main conclusion of the paper, that the 3D topology of the boundary elements is a consequence of pairing, is strong. However, the loop extrusion and pairing are not mutually exclusive models for the formation of TADs. Loop-extruding cohesin complexes need not make a 140 kb loop, multiple smaller loops could bring together the two boundary elements, which are then held together by pairing proteins that can make circular topologies.

      (1.2) In the pairing model, distant boundaries bump into each other (by random walks or partially constrained walks), and if they are “compatible” they pair with each other, typically in an orientation-dependent manner.  As an alternative, the reviewer argues that cohesin need not make one large 140 kb loop.  Instead it could generate a series of smaller loops (presumably corresponding to the intervening TADs).  These smaller loops would bring homie in the transgene in close proximity to the eve locus so that it could interact with the endogenous homie and nhomie elements in the appropriate orientation, and in this way only one of the reporters would be ultimately activated.

      There are two problems with the idea that cohesin-dependent loop extrusion brings transgene homie into contact with homie/nhomie in the eve locus by generating a series of small loops (TADs).  The first is the very large distances over which specific boundary:boundary pairing interactions can occur.  The second is that boundary:boundary pairing interactions can take place not only in cis, but also in trans.

      We illustrate these points with several examples. 

      Fujioka et al. 2016, Fig. 7 shows an experiment in which attP sites located ~2 Mb apart were used to insert two different transgenes, one containing a lacZ reporter and the other containing the eve anal plate enhancer (AP) (Fujioka et al. 2016).  If the lacZ reporter and the AP transgenes both also contain homie, the AP enhancer can activate lacZ expression (panel A).  On the other hand, if one of the transgenes has lambda DNA instead of homie, no regulatory interactions are observed (panel A).  In addition, as is the case in our experiments using the -142 kb platform, orientation matters.  In the combination on the top left, the homie boundary is pointing away from both the lacZ reporter and the AP enhancer.  Since homie pairs with itself head-to-head, pairing brings the AP enhancer into contact with the lacZ reporter.  A different result is obtained for the transgene pair in panel A on the top right.  In this combination, homie is pointing away from the lacZ reporter, while it is pointing towards the AP enhancer.  As a consequence, the reporter and enhancer are located on opposite sides of the paired homie boundaries, and in this configuration they are unable to interact with each other.

      On the top left of panel B, the homie element in the AP enhancer transgene was replaced by a nhomie boundary oriented so that it is pointing towards the enhancer.  Pairing of homie and nhomie head-to-tail brings the AP enhancer in the nhomie transgene into contact with the lacZ reporter in the homie transgene, and it activates reporter expression.  Finally, like homie, nhomie pairs with itself head-to-head, and when the nhomie boundaries are pointing towards both the AP enhancer and the lacZ reporter, reporter expression is turned on.
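      The orientation logic of these homie/nhomie experiments can be condensed into a simple topological rule. The Python sketch below is a toy illustration under stated assumptions, not a model taken from Fujioka et al. 2016: it assumes that head-to-head pairing juxtaposes sequences lying on the same side of both boundaries, while head-to-tail pairing juxtaposes sequences lying on opposite sides. The names `PAIRING_MODE`, `cargo_side` and `cargo_in_contact` are hypothetical, and “cargo” stands for either the AP enhancer or the lacZ reporter.

```python
# Toy model of orientation-dependent boundary pairing (illustrative only).
# Assumption: each boundary "points" in one direction; its cargo (enhancer or
# reporter) lies either on the side it points toward ("head") or away ("tail").

# Pairing modes as described in the text: homie:homie and nhomie:nhomie pair
# head-to-head, homie:nhomie pair head-to-tail. Any other combination
# (e.g. lambda DNA in place of a boundary) is treated as non-pairing.
PAIRING_MODE = {
    frozenset({"homie"}): "head-to-head",
    frozenset({"nhomie"}): "head-to-head",
    frozenset({"homie", "nhomie"}): "head-to-tail",
}


def cargo_side(points_toward_cargo: bool) -> str:
    """Return which side of the boundary the cargo sits on."""
    return "head" if points_toward_cargo else "tail"


def cargo_in_contact(boundary1: str, toward1: bool,
                     boundary2: str, toward2: bool) -> bool:
    """Predict whether cargo in two transgenes is brought together by pairing.

    Assumed rule: head-to-head pairing juxtaposes cargo on the same side of
    both boundaries; head-to-tail pairing juxtaposes cargo on opposite sides.
    """
    mode = PAIRING_MODE.get(frozenset({boundary1, boundary2}))
    if mode is None:
        return False  # incompatible elements: no pairing, so no contact
    same_side = cargo_side(toward1) == cargo_side(toward2)
    return same_side if mode == "head-to-head" else not same_side


# Configurations described above. For the homie:nhomie case we ASSUME homie
# points away from lacZ, as in panel A; the text does not state this directly.
print(cargo_in_contact("homie", False, "homie", False))   # True
print(cargo_in_contact("homie", False, "homie", True))    # False
print(cargo_in_contact("homie", False, "nhomie", True))   # True
print(cargo_in_contact("nhomie", True, "nhomie", True))   # True
```

      The four calls reproduce the activation/no-activation outcomes summarized above; a boundary paired with lambda DNA returns `False` because no pairing mode is defined for it.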

      Long-distance boundary-dependent pairing interactions mediated by the bithorax complex Mcp boundary have also been reported in several papers.  Fig. 6 from Muller et al. (Muller et al. 1999) shows the pattern of regulatory interactions (in this case PRE-dependent “pairing-sensitive silencing”) between transgenes that have a mini-white reporter, the Mcp and scs’ boundaries and a PRE that is located close to Mcp.  In this experiment flies carrying transgenes inserted at the indicated sites on the left and right arms of the 3rd chromosome were mated in pairwise combinations, and their trans-heterozygous progeny examined for pairing-sensitive silencing of the mini-white reporter.

      Two examples of long-distance pairing-sensitive silencing mediated by Mcp/scs’ are shown in Fig. 5b from Muller et al. 1999.  The transgene inserts in panel A are w#12.43 and ff#10.5.  w#12.43 is inserted close to the telomere of 3R at 99B.  ff#10.5 is inserted closer to the middle of 3R at 91A.  The estimated distance between them is 11.3 Mb.  The transgene inserts in panel B are ff#10.5 and ff#11.102.  ff#11.102 is inserted at 84D, and the distance between them is 11 Mb.  Normally, the eye color phenotype of the mini-white reporter is additive: homozygous inserts have eye color twice as dark as hemizygous inserts, while in trans-heterozygous flies the eye color would be the sum of the two different transgenes.  However, when a PRE is present and the transgenes can pair, silencing is observed.  In panel A, the trans-heterozygous combination has a lighter eye color than either of the parents.  In panel B, the trans-heterozygous combination is darker than one of the parents (ff#10.5) but much lighter than the other (ff#11.102).

      All ten of the transgenes tested were able to engage in long-distance (>Mb) trans-regulatory interactions; however, likely because of how the chromosome folds on the Mb scale (e.g., the location of meta-loops: see #2.1 and Author response image 3), not all of the possible pairwise silencing interactions are observed.  The silencing interactions shown in Muller et al. are between transgenes inserted on different homologs.  Mcp/scs'-dependent silencing interactions can also occur in cis.  Moreover, just like the homie and nhomie experiments described above, Muller et al. (Muller et al. 1999) found that Mcp could mediate long-distance activation of mini-white and yellow by their respective enhancers.

      The pairing-sensitive activity of the PRE associated with the Mcp boundary is further enhanced when the mini-white transgene has the scs boundary in addition to Mcp and scs’.  In the experiment shown in Fig. 8 from Muller et al. 1999, the pairing-sensitive silencing interactions of the Mcp/scs’/scs transgene are between transgenes inserted on different chromosomes.  Panel A shows pairing-sensitive silencing between w#15.60, which is on the X chromosome, and w#15.102, which is on the 2nd chromosome.  Panel B shows pairing-sensitive silencing between the 2nd chromosome insert w#15.60 and a transgene, w#15.48, which is inserted on the 3rd chromosome.

      The long-distance trans and cis interactions described here are not unique to homie, nhomie, Mcp, scs’, or scs.  Precisely analogous results have been reported by Sigrist and Pirrotta (Sigrist and Pirrotta 1997) for the gypsy boundary when the bxd PRE was included in the mini-white transgene.  Also like the Mcp-containing transgenes in Muller et al. (Muller et al. 1999), Sigrist and Pirrotta observed pairing-sensitive silencing between gypsy bxd PRE mini-white transgenes inserted on different chromosomes.  Similar long-distance (Mb) interactions have been reported for Fab-7 (Bantignies et al. 2003; Li et al. 2011).  In addition, there are examples of “naturally occurring” long-distance regulatory and/or physical interactions.  One would be the regulatory/physical interactions between the p53 enhancer upstream of reaper and Xrp1 which was described by Link et al. (Link et al. 2013).  Another would be the nearly 60 meta-loops identified by Mohana et al. (Mohana et al. 2023).

      Like homie at -142 kb, the regulatory interactions (pairing-sensitive silencing and enhancer activation of reporters) reported in Muller et al. (Muller et al. 1999) involve direct physical interactions between the transgenes.  Vazquez et al. (Vazquez et al. 2006) used the lacI/lacO system to visualize contacts between distant scs/Mcp/scs’-containing transgenes in imaginal discs.  As indicated in Vazquez et al. 2006, Table 3 lines #4-7, when both transgenes have Mcp and were inserted on the same chromosome, they colocalized in trans-heterozygotes (single dot) in 94% to 97% of the disc nuclei in the four pairwise combinations they tested.  When the transgenes both lacked Mcp (Vazquez et al. 2006, Table 3 #1), co-localization was observed in 4% of the nuclei.  When scs/Mcp/scs’-containing transgenes on the 2nd and 3rd chromosome were combined (Vazquez et al. 2006, Table 3 #8), colocalization was observed in 96% of the nuclei.  They also showed that four different scs/Mcp/scs’ transgenes (two at the same insertion site but on different homologs, and two at different sites on different homologs) co-localized in 94% of the eye imaginal disc nuclei (Vazquez et al. 2006, Table 3 #9).  These pairing interactions were also found to be stable over several hours.  Similar co-localization experiments together with 3C were reported by Li et al. (Li et al. 2011).

      The de novo establishment of trans interactions between compatible boundary elements has been studied by Lim et al. (Lim et al. 2018).  These authors visualized transvection (enhancer activation of a MS2 loop reporter in trans) mediated by the gypsy insulator, homie and Fab-8  in NC14 embryos.  When both transgenes shared the same boundary element, transvection/physical pairing was observed in a small subset of embryos.  The interactions took place after a delay and increased in frequency as the embryo progressed into NC14.  As expected, transvection was specific: it was not observed when the transgenes had different boundaries.  For homie it was also orientation-dependent.  It was observed when homie was orientated in the same direction in both transgenes, but not when homie was orientated in opposite directions in the two transgenes.

      While one could imagine that loop extrusion-dependent compaction of the chromatin located between eve and the transgene at -142 kb into a series of small loops (the intervening TADs) might be able to bring homie in the transgene close to homie/nhomie in the eve locus, there is no cohesin-based loop extrusion scenario that would bring transgenes inserted at sites 6 Mb, 11 Mb, on different sides of the centromere, or at opposite ends of the 3rd chromosome together so that the distant boundaries recognize their partners and physically pair with each other.  Nor is there a plausible cohesin-based loop extrusion mechanism that could account for the fact that most of the documented long-distance interactions involve transgenes inserted on different homologs.  Moreover, long-distance interactions are also observed between boundary-containing transgenes inserted on different chromosomes.

      In fact, given these results, one would logically come to precisely the opposite conclusion.  If boundary elements inserted Mbs apart, on different homologs and on different chromosomes can find each other and physically pair, it would be reasonable to think that the same mechanism (likely random collisions) is entirely sufficient when they are only 142 kb apart.

      Yet another reason to doubt the involvement or need for cohesin-dependent loop extrusion in bringing the transgene homie in contact with the eve locus comes from the studies of Goel et al. (Goel et al. 2023).  They show that cohesin has no role in the formation of TADs in mammalian tissue culture cells.  So if TADs in mammals aren’t dependent on cohesin, there would not be a good reason to think at this point that the loops (TADs) that are located between eve and the transgene are generated by, or even strongly dependent on, cohesin-dependent loop extrusion.

      It is also important to note that even if loop-extrusion were to contribute to chromatin compaction in this context and make the looping interactions that lead to orientation-specific pairing more efficient, the role of loop extrusion in this model is not determinative of the outcome, it is merely a general compaction mechanism.  This is a far cry from the popular concept of loop extrusion as being THE driving force determining chromosome topology at the TAD level.

      Reviewer #2 (Public Review):

      In Bing et al, the authors analyze micro-C data from NC14 fly embryos, focusing on the eve locus, to assess different models of chromatin looping. They conclude that fly TADs are less consistent with conventional cohesin-based loop extrusion models and instead rely more heavily on boundary-boundary pairings in an orientation-dependent manner.

      Overall, I found the manuscript to be interesting and thought-provoking. However, this paper reads much more like a perspective than a research article. Considering eLIFE is aimed at the general audience, I strongly suggest the authors spend some time editing their introduction to the most salient points as well as organizing their results section in a more conventional way with conclusion-based titles. It was very difficult to follow the authors' logic throughout the manuscript as written. It was also not clear as written which experiments were performed as part of this study and which were reanalyzed but published elsewhere. This should be made clearer throughout.

      It has been shown several times that Drosophila Hi-C maps do not contain all of the features (frequent corner peaks, stripes, etc.) observed when compared to mammalian cells. Considering these features are thought to be products of extrusion events, it is not an entirely new concept that Drosophila domains form via mechanisms other than extrusion.

      (2.1) While there are differences between the Hi-C contact profiles in flies and mammals, these differences likely reflect in large part the bin sizes used to visualize contact profiles.  With the exception of Goel et al. (Goel et al. 2023), most of the mammalian Hi-C studies have been low-resolution restriction enzyme-based experiments, and required bin sizes of 1 kb or greater to visualize what are labeled as “TADs.”  In fact, as shown by experiments in Goel et al., these are not actually TADs, but rather a conglomeration of multiple TADs into a series of TAD neighborhoods.  The same is true for the MicroC experiments of Krietenstein et al. and Hsieh et al. on human and mouse tissue culture cells (Hsieh et al. 2020; Krietenstein et al. 2020).  This is shown in Author response image 2.  In this image, we have compared the MicroC profiles generated from human and mouse tissue culture cells with fly MicroC profiles at different levels of resolution.

      For panels A-D, the genomic DNA segments shown are approximately 2.8 Mb, 760 kb, 340 kb, and 190 kb.  For panels E-H, the genomic DNA segments shown are approximately 4.7 Mb, 870 kb, 340 kb and 225 kb.  For panels I-L, the genomic DNA segments shown are approximately 3 Mb, 550 kb, 290 kb and 175 kb.

      As reported for restriction enzyme-based Hi-C experiments, a series of stripes and dots are evident in mammalian MicroC profiles.  In the data from Krietenstein et al., two large TAD “neighborhoods” are evident with a bin size of 5 kb, and these are bracketed by 45° stripes (A: black arrows).  At 1 kb (panel B), the 45° stripe bordering the neighborhood on the left no longer defines the edge of the neighborhood (blue arrow: panel B), and both stripes become discontinuous (fuzzy dots).  At 500 bp (panel C) and 200 bp (panel D) bin sizes, the stripes largely disappear (black arrows) even though they were the most prominent feature in the TAD landscape with large bin sizes.  At 200 bp, the actual TADs (the trees, as opposed to the forest) are visible, but weakly populated.  There are no stripes, and only one of the TADs has an obvious “dot” (green asterisk: panel C).

      Author response image 2.

      Mammalian MicroC profiles at different bin sizes.

      Large TAD neighborhoods bordered by stripes are also evident in the Hsieh et al. data set in Author response image 2 panels E and F (black arrows in E and F and green arrow in F).  At 400 bp resolution (panel G), the narrow stripe in panel F (black arrows) becomes much broader, indicating that it is likely generated by interactions across one or two small TADs that can be discerned at 200 bp resolution.  The same is true for the broad stripe indicated by the green arrows in panels F, G and H.  This stripe arises from contacts between the TADs indicated by the red bar in panels G and H and the TADs to the other side of the volcano triangle with a plume (blue arrow in panel H).  As in flies, we would expect that this volcano triangle topped by a plume corresponds to a stem-loop.  However, the resolution is poor at 200 bp, and the profiles of the neighboring TADs are not very distinct.

      For the fly data set, stripes can be discerned when analyzed at 800 bp resolution (see arrows in Author response image 3);  however, these stripes are flanked by regions of lower contact, and represent TAD-TAD interactions.  At 400 bp, smaller neighborhoods can be discerned, and these neighborhoods exhibit a complex pattern of interaction with adjacent neighborhoods.  With bin sizes of 200 bp, individual TADs are observed, as are TAD-TAD interactions like those seen near eve.  Some of the TADs have dots at their apex, while others do not—much like what is seen in the mammalian MicroC studies.

      Author response image 3.

      Mammalian MicroC profiles at different bin sizes.

      Stripes: As illustrated in Author response image 2 A-D and E-H, the continuous stripes seen in low-resolution mammalian studies (>1 kb bins) would appear to arise from binning artefacts.  At high resolution, where single TADs are visible, the stripes seem to be generated by TAD-TAD interactions, and not by some type of “extrusion” mechanism.  This is most clearly seen for the volcano with plume TAD in Author response image 2 G and H.  While the stripes in Author response image 2 disappear at high resolution, this is not always true.  There are stripes that appear to be “real” in Goel et al. 2023 for the TADs in the Ppm1g region, and in Author response image 1 for the Abd-B regulatory domain TADs.  Since the stripes in the Ppm1g region are unaffected by Rad21 depletion, some other mechanism must be involved (cf. Shidlovskii et al. 2021).

      Dots: The high resolution images of mammalian MicroC experiments in Author response image 2D and H show that, like Drosophila (Author response image 3L), mammalian TADs don’t always have a “dot” at the apex of the triangle.  This is not surprising.  In the MicroC procedure, fixed chromatin is digested to mononucleosomes with MNase.  Since most TAD boundaries in flies, and presumably also in mammals, are relatively large (150-400 bp) nuclease hypersensitive regions, extensive MNase digestion will typically reduce the boundary element sequences to oligonucleotides.

      In flies, the only known sequences (at least to date) that end up giving dots (like those seen in Author response image 1) are bound by a large (>1,000 kDa) GAF-containing multiprotein complex called LBC.  In the Abd-B region of BX-C, LBC binds to two ~180 bp sequences in Fab-7 (dHS1 and HS3; Kyrchanova et al. 2018; Wolle et al. 2015), and to the centromere proximal (CP) side of Fab-8.  The LBC elements in Fab-7 (dHS1) and Fab-8 (CP) have both blocking and boundary bypass activity (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018).  Elsewhere, LBC binds to the bx and bxd PREs in the Ubx regulatory domains, to two PREs upstream of engrailed, to the hsp70 promoter, the histone H3-H4 promoters, and the eve promoter (unpublished data).  Based on ChIP signatures, it likely binds to most PREs/tethering elements in the fly genome (Batut et al. 2022; Li et al. 2023).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that LBC protects an ~150-180 bp DNA segment from MNase digestion, which would explain why LBC-bound sequences are able to generate dots in MicroC experiments.  Also unlike typical boundary elements, the pairing interactions of the LBC elements we’ve tested appear to be orientation-independent (unpublished data).

      The difference in MNase sensitivity between typical TAD boundaries and LBC-bound elements is illustrated in the MicroC of the Leukocyte-antigen-related-like (Lar) meta-loop in Author response image 4 panels A and B.  Direct physical pairing of two TAD boundaries (blue and purple) brings two TADs encompassing the 125 kb Lar gene into contact with two TADs in a gene-poor region 620 kb away.  This interaction generates two regions of greatly enhanced contact: the two boxes on either side of the paired boundaries (panel A).  Note that, like transgene homie pairing with the eve boundaries, the boundary pairing interaction that forms the Lar meta-loop is orientation-dependent.  In this case the TAD boundary in the Lar locus pairs with the TAD boundary in the gene-poor region head-to-head (arrow tip to arrow tip), generating a circle-loop.  This circle-loop configuration brings the TAD upstream of the blue boundary into contact with the TAD upstream of the purple boundary.  Likewise, the TAD downstream of the blue boundary is brought into contact with the TAD downstream of the purple boundary.

      In the MicroC procedure, the sequences that correspond to the paired boundaries are not recovered (red arrow in Author response image 4 panel B).  This is why there are vertical and horizontal blank stripes (red arrowheads) emanating from the missing point of contact.  Using a different HiC procedure (dHS-C) that allows us to recover sequences from typical boundary elements (Author response image 4 panels C and D), there is a strong “dot” at the point of contact which corresponds to the pairing of the blue and purple boundaries.

      There is a second dot (green arrow) within the box that represents physical contacts between sequences in the TADs downstream of the blue and purple boundaries.  This dot is resistant to MNase digestion and is visible both in the MicroC and dHS-C profiles.  Based on the ChIP signature of the corresponding elements in the two TADs downstream of the blue and purple boundaries, this dot represents paired LBC elements.

      Author response image 4.

      Lar meta-loop. Panels A & B: MicroC. Panels C & D: dHS-C

      That being said, the authors' analyses do not distinguish between the formation and the maintenance of domains. It is not clear to this reviewer why a single mechanism should explain the formation of the complex structures observed in static Hi-C heatmaps from a population of cells at a single developmental time point. For example, how can the authors rule out that extrusion initially provides the necessary proximity and possibly the cis preference of contacts required for boundary-boundary pairing whereas the latter may more reflect the structures observed at maintenance?

      (2.2) The MicroC profiles shown in Fig. 2 of our paper were generated from nuclear cycle (NC) 14 embryos.  NC14 is the last nuclear cycle before cellularization (Foe 1989).  After the nuclei exit mitosis, S phase begins, and because satellite sequences are late replicating in this nuclear cycle, S phase lasts 50 min instead of only 4-6 min during earlier cycles (Shermoen et al. 2010).  So, unlike MicroC studies in mammals, our analysis of chromatin architecture in NC14 embryos likely offers the best opportunity to detect any intermediates that are generated during TAD formation.  In particular, we should be able to observe evidence of cohesin linking the sequences from the two extruding strands together (the stripes) as it generates TADs de novo.  However, there are no vertical stripes in the eve TAD as would be expected if cohesin entered at a few specific sites somewhere within the TAD and extruded loops in opposite directions synchronously, nor are there stripes at 45° as would be expected if it started at nhomie or homie (see Figure Supplemental 1).  We also do not detect cohesin-generated stripes in any of the TADs in between eve and the attP site at -142 kb.  Note that in some models, cohesin is thought to be continuously extruding loops.  After hitting the CTCF roadblocks, cohesin either falls off after a short period and starts again, or it breaks through one or more TAD boundaries, generating the LDC domains.  In this dynamic model, stripes of crosslinked DNA generated by the passing cohesin complex should be observed throughout the cell cycle.  They are not.

      As for formation versus maintenance, and the possible involvement of cohesin loop extrusion in the former, but not the latter:  This question was indirectly addressed in point #1.2 above.  In this point we described multiple examples of specific boundary:boundary pairing interactions that take place over Mbs, in cis and in trans and even between different chromosomes.  These long-distance interactions do not preexist; instead they must be established de novo and then maintained.  This process was actually visualized in the studies of Lim et al. (Lim et al. 2018) on the establishment of trans boundary pairing interactions in NC14 embryos.  There is no conceivable mechanism by which cohesin-based loop extrusion could establish the long- or short-distance trans interactions that have been documented in many studies on fly boundary elements.  Also, as noted above, it seems unlikely that loop extrusion is necessary for long-range interactions in cis.

      A more plausible scenario is that cohesin entrapment helps to stabilize these long-distance interactions after they are formed.  If this were true, then one could argue that cohesin might also function to maintain TADs after boundaries have physically paired with their neighbors in cis.  However, the Rad21 depletion experiments of Goel et al. (Goel et al. 2023) would rule out an essential role for cohesin in maintaining TADs after boundary:boundary pairing.  In short, while we cannot formally rule out that loop extrusion might help bring sequences closer together to increase their chance of pairing, neither the specificity of that pairing, nor its orientation can be explained by loop extrusion.  Furthermore, since pairing in trans cannot be facilitated by loop extrusion, invoking it as potentially important for boundary-boundary pairing in cis can only be described as a potential mechanism in search of a function, without clear evidence in its favor.

      On the other hand, the apparent loss of contacts between TADs within large multi-TAD neighborhoods (Goel et al. 2023) would suggest that there is some sort of decompaction of neighborhoods after Rad21 depletion.  It is possible that this might stress interactions that span multiple TADs, as is the case for homie at -142 kb, or for the other examples described in #1.2 above.  This kind of involvement of cohesin might or might not be associated with a loop extrusion mechanism.

      Future work aimed at analyzing micro-C data in cohesin-depleted cells might shed additional light on this.

      (2.3) This experiment has been done by Goel et al. (Goel et al. 2023) in mammalian tissue culture cells.  They found that TADs, as well as local TAD neighborhoods, are not disrupted/altered by Rad21 depletion (see Goel et al. 2023 and our response to point #1.1 of reviewer #1).

      Additional mechanisms at play include compartment-level interactions driven by chromatin states. Indeed, in mammalian cells, these interactions often manifest as a "plume" on Hi-C maps similar to what the authors attribute to boundary interactions in this manuscript. How do the chromatin states in the neighboring domains of the eve locus impact the model if at all?

      (2.4) Chromatin states have been implicated in driving compartment level interactions. 

      Compartments as initially described were large, often Mb sized, chromosomal segments that “share” similar chromatin marks/states, and are thought to merge via co-polymer segregation.  They were visualized using large multi-kb bin sizes.  In the studies reported here, we use bin sizes of 200 bp to examine a DNA segment of less than 200 kb which is subdivided into a dozen or so small TADs.  Several of the TADs contain more than one transcription unit, and they are expressed in quite different patterns, and thus might be expected to have different “chromatin states” at different points in development and in different cells in the organism. However, as can be seen by comparing the MicroC patterns in our paper that are shown in Fig. 2 with Fig. 7, Figure Supplemental 5 and Figure Supplemental 6, the TAD organization in NC14 and 12-16 hr embryos is for the most part quite similar.  There is no indication that these small TADs are participating in liquid phase compartmentalization that depends upon shared chromatin/transcriptional states in NC14 and then again in 12-16 hr embryos. 

      In NC14 embryos, eve is expressed in 7 stripes, while it is potentially active throughout much of the embryo.  In fact, the initial pattern in early cycles is quite broad and is then refined during NC14.  In 12-16 hr embryos, the eve gene is silenced by the PcG system in all but a few cells in the embryo.  However, here again the basic structure of the TAD, including the volcano plume, looks quite similar at these different developmental stages.  

      As for the suggestion that the plume topping the eve volcano triangle is generated because the TADs flanking the eve TAD share chromatin states and coalesce via some sort of phase separation:

      This model has been tested directly in Ke et al. (Ke et al. 2024).  In Ke et al., we deleted the nhomie boundary and replaced it with either nhomie in the reverse orientation or homie in the forward orientation.  According to the compartment model, changing the orientation of the boundaries so that the topology of the eve TAD changes from a stem-loop to a circle-loop should have absolutely no effect on the plume topping the eve volcano triangle.  The TADs flanking the eve TAD would still be expected to share the same chromatin states and would still be able to coalesce via phase transition.  However, this is not what is observed.  The plume disappears and is replaced by “clouds” on both sides of the eve TAD. The clouds arise because the eve TAD bumps into the neighboring TADs when the topology is a circle-loop.  

      We would also note that “compartment-level” interactions would not explain the findings presented in Muller et al. 1999, in Table 1, or in Author response image 4.  It is clear that the long-distance (Mb) interactions observed for Mcp, gypsy, Fab-7, homie, nhomie and the blue and purple boundaries in Author response image 4 arise by the physical pairing of TAD boundary elements.  This fact is demonstrated directly by the MicroC experiments in Fig. 7 and Fig Supplemental 4 and 5, and by the MicroC and dHS-C experiments in Author response image 4.  There is no evidence for any type of “compartment/phase separation” driving these specific boundary pairing interactions.

      In fact, given the involvement of TAD boundaries in meta-loop formation, one might begin to wonder whether some of the “compartment-level interactions” are generated by the specific pairing of TAD boundary elements rather than by “shared chromatin” states.  For example, the head-to-head pairing of the blue and purple boundaries generates a Lar meta-loop that has a circle-loop topology.  As a consequence, sequences upstream of the blue and purple boundaries come into contact, generating the small dark rectangular box on the upper left side of the contact map.  Sequences downstream of the blue and purple boundaries also come into contact, and this generates the larger rectangular box in the lower right side of the contact map.  A new figure, Fig. 9, shows that the interaction pattern flips (lower left and top right) when the meta-loop has a stem-loop topology.  If these meta-loops are visualized using larger bin sizes, the classic “compartment” patchwork pattern of interactions emerges.  Would the precise patchwork pattern of “compartmental” interactions involving the four distant TADs that are linked in the two meta-loops shown in Fig. 9 persist unchanged if we deleted one of the TAD boundaries that forms the meta-loop?  Would the precise patchwork pattern persist if we inverted one of the meta-loop boundaries so that we converted the topology of the loop from a circle-loop to a stem-loop or vice versa?  We haven’t used MicroC to compare the compartment organization after deleting or inverting a meta-loop TAD boundary; however, a comparison of the MicroC pattern in WT in Fig. 1C with that for the homie transgenes in Fig. 7 and Figs. Supplemental 5, 6 and 7 indicates a) that novel patterns of TAD:TAD interactions are generated by this homie-dependent mini-meta-loop and b) that the patterns of TAD:TAD interactions depend upon loop topology.  Were these novel TAD:TAD interactions generated instead by compartment-level interactions/shared chromatin states, they should be evident in WT as well (Fig. 1).  They are not.

      How does intrachromosomal homolog pairing impact the models proposed in this manuscript (Abed et al. 2019; Erceg et al., 2019)? Several papers recently have shown that somatic homolog pairing is not uniform and shows significant variation across the genome with evidence for both tight pairing regions and loose pairing regions. Might loose pairing interactions have the capacity to alter the cis configuration of the eve locus?

      (2.5) At this point it is not entirely clear how homolog pairing impacts the cis configuration/MicroC contact maps.  We expect that homolog pairing is incomplete in the NC14 embryos we analyzed;  however, since replication of eve and the local neighborhood is likely complete, sister chromosomes should be paired.  So we are likely visualizing the 3D organization of paired TADs.

      In summary, the transgenic experiments are extensive and elegant and fully support the authors' models. However, in my opinion, they do not completely rule out additional models at play, including extrusion-based mechanisms. Indeed, my major issue is the limited conceptual advance in this manuscript. The authors essentially repeat many of their previous work and analyses.

      (2.6) In our view, the current paper makes a number of significant contributions that go well beyond those described in our 2016 publication.  These are summarized below.

      A) While our 2016 paper used transgenes inserted in the -142 kb attP site to study pairing interactions of homie and nhomie, we neither considered nor discussed how our findings might bear on the loop extrusion model.  However, since the loop extrusion model is currently accepted as established fact by many labs working on chromosome structure, it is critically important to devise experimental approaches which test the predictions of this particular model.  One approach would be to deplete cohesin components; however, as discussed in #1.1, our experimental system is not ideal for this type of approach.  On the other hand, there are other ways to test the extrusion model.  Given the mechanism proposed for TAD formation (extruding a loop until cohesin bumps into CTCF/boundary roadblocks), it follows that only two types of loop topologies are possible: stem-loop and unanchored loop.  The loop extrusion model, as currently conceived, can’t account for the two cases in this study in which the reporter on the wrong side of the homie boundary from the eve locus is activated by the eve enhancers.  In contrast, our findings are completely consistent with orientation-specific boundary:boundary pairing.

      B) In the loop extrusion model, cohesin embraces both of the extruded chromatin fibers, transiently bringing them into close proximity.  As far as we know, there have been no (high resolution) experiments that have actually detected these extruding cohesin complexes during TAD formation.  In order to have a chance of observing the expected signatures of extruding cohesin complexes, one would need a system in which TADs are being formed.  As described in the text, this is why we used MicroC to analyze TADs in NC14 embryos.  We do not detect the signature stripes that would be predicted (see Figure Supp 2) by the current version of the loop extrusion model.

      C) Reporter expression in the different -142 kb transgenes provides only an indirect test of the loop extrusion and boundary:boundary pairing models for TAD formation.  The reporter expression results need to be confirmed by directly analyzing the pattern of physical interactions in each instance.  While we were able to detect contacts between the transgenes and eve in our 2016 paper, the 3C experiments provided no information beyond that.  By contrast, the MicroC experiments in the current paper give high resolution maps of the physical contacts between the transgene and the eve TAD.  The physical contacts track completely with reporter activity.  Moreover, just as is the case for reporter activity, the observed physical interactions are inconsistent with the loop extrusion model.

      D) Genetic studies in Muller et al. (Muller et al. 1999) and imaging in Vazquez et al. (Vazquez et al. 2006) suggested that more than two boundaries can participate in pairing interactions.  Consistent with these earlier observations, viewpoint analysis indicates that transgene homie interacts with both eve boundaries.  While this could be explained by transgene homie alternating between nhomie and homie in the eve locus, this would require the remodeling of the eve TAD each time the pairing interaction switched between the three boundary elements.  Moreover, two out of the three possible pairing combinations would disrupt the eve TAD, generating an unanchored loop (cf. the lambda DNA TAD in Ke et al. (Ke et al. 2024)).  However, the MicroC profile of the eve TAD is unaffected by transgenes carrying the homie boundary.  This would suggest that, like Mcp, the pairing interactions of homie and nhomie might not be exclusively pairwise.  In this context it is interesting to compare the contact profiles of the lar meta-loop shown in Author response image 4 with the different -142 kb homie inserts.  Unlike the homie element at -142 kb, there is clearly only a single point of contact between the blue and purple boundaries.

      E) Chen et al. (Chen et al. 2018) used live imaging to link physical interactions between a homie-containing transgene inserted at -142 kb and the eve locus to reporter activation by the eve enhancers.  They found that the reporter was activated by the eve enhancers only when it was in “close proximity” to the eve gene.  “Close proximity” in this case was 331 nm.  This distance is equivalent to ~1 kb of linear duplex B-form DNA, or ~30 nucleosome core particles lined up in a row.  It would not be possible to ligate two DNAs wrapped around nucleosome core particles that are located 330 nm apart in a fixed matrix.  Since our MicroC experiments were done on embryos in which the gene is silent in the vast majority of cells, it is possible that the homie transgene only comes into close enough proximity for transgene nucleosome:eve nucleosome ligation events when the eve gene is off.  Alternatively, and clearly more likely, distance measurements using imaging procedures that require dozens of fluorescent probes may artificially inflate the distance between sequences that are actually close enough for enzymatic ligation.
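      For reference, the conversions in the preceding paragraph can be reproduced from two standard textbook values that we assume here rather than take from Chen et al.: a rise of about 0.34 nm per base pair for B-form DNA, and a diameter of about 11 nm for a nucleosome core particle.  These are rough estimates intended only to convey scale.

```latex
\[
\frac{331\ \mathrm{nm}}{0.34\ \mathrm{nm/bp}} \approx 970\ \mathrm{bp} \approx 1\ \mathrm{kb},
\qquad
\frac{331\ \mathrm{nm}}{\approx 11\ \mathrm{nm\ per\ core\ particle}} \approx 30\ \text{core particles}
\]
```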

      F) The findings reported in Goel et al. (Goel et al. 2023) indicate that mammalian TADs don’t require cohesin activity; however, the authors do not provide an alternative mechanism for TAD formation/stability.  Here we have suggested a plausible mechanism.

      The authors make no attempt to dissect the mechanism of this process by modifying extrusion components directly.

      (2.7) See point #1.1

      Some discussion of Rollins et al. on the discovery of Nipped-B and its role in enhancer-promoter communication should also be made to reconcile their conclusions in the proposed absence of extrusion events.

      (2.8) The reason why reducing nipped-B activity enhances the phenotypic effects of gypsy-induced mutations is not known at this point; however, the findings reported in Rollins et al. (Rollins et al. 1999) would appear to argue against an extrusion mechanism for TAD formation.

      Given what we know about enhancer blocking and TADs, there are two plausible mechanisms for how the Su(Hw) element in the gypsy transposon blocks enhancer-promoter interactions in the gypsy-induced mutants studied by Rollins et al.  First, the Su(Hw) element could generate two new TADs through pairing interactions with boundaries in the immediate neighborhood.  This would place the enhancers in one TAD and the target gene in another TAD.  Alternatively, the studies of Sigrist and Pirrotta (Sigrist and Pirrotta 1997), as well as several publications from Victor Corces’ lab, raise the possibility that the Su(Hw) element in gypsy-induced mutations is pairing with gypsy transposons inserted elsewhere in the genome.  This would also isolate enhancers from their target genes.  In either case, the loss of nipped-B activity increases the mutagenic effects of the Su(Hw) element, presumably by strengthening its boundary function.  If this is due to a failure to load cohesin onto chromatin, this would suggest that cohesin normally functions to weaken the boundary activity of the Su(Hw) element, i.e., to disrupt the ability of Su(Hw) elements to interact either with other boundaries in the neighborhood or with themselves.  Were this a general activity of cohesin (to weaken boundary activity), one would imagine that cohesin normally functions to disrupt TADs rather than to generate/stabilize TADs.

      An alternative model is that Nipped-B (and thus cohesin) functions to stabilize enhancer-promoter interactions within TADs.  In this case, loss of Nipped-B would result in a destabilization of the weak enhancer:promoter interactions that can still be formed when gypsy is located between the enhancer and promoter.  In this model the loss of these weak interactions in nipped-b mutants would appear to increase the “blocking” activity of the gypsy element.  However, this alternative model would also provide no support for the notion that Nipped-B and cohesin function to promote TAD formation.

      Reviewer #3 (Public Review):

      Bing et al. attempt to address fundamental mechanisms of TAD formation in Drosophila by analyzing gene expression and 3D conformation within the vicinity of the eve TAD after insertion of a transgene harboring a Homie insulator sequence 142 kb away in different orientations. These transgenes along with spatial gene expression analysis were previously published in Fujioka et al. 2016, and the underlying interpretations regarding resulting DNA configuration in this genomic region were also previously published. This manuscript repeats the expression analysis using smFISH probes in order to achieve more quantitative analysis, but the main results are the same as previously published. The only new data are the Micro-C and an additional modeling/analysis of what they refer to as the 'Z3' orientation of the transgenes. The rest of the manuscript merely synthesizes further interpretation with the goal of addressing whether loop extrusion may be occurring or if boundary:boundary pairing without loop extrusion is responsible for TAD formation. The authors conclude that their results are more consistent with boundary:boundary pairing and not loop extrusion; however, most of this imaging data seems to support both loop extrusion and the boundary:boundary models. This manuscript lacks support, especially new data, for its conclusions.

      (3.1) The new results/contributions of our paper are described in #2.6 above. 

      Although there are (two) homie transgene configurations that give expression patterns that would be consistent with the loop extrusion model, that is not quite the same as strong evidence supporting loop extrusion.  On the contrary, key aspects of the expression data are entirely inconsistent with loop extrusion, and they thus rule out the possibility that loop extrusion is sufficient to explain the results.  Moreover, the conclusions drawn from the expression patterns of the four transgenes are backed up by the MicroC contact profiles, which are also not consistent with the loop extrusion model.  Further, as documented above, loop extrusion is unable to explain not only the findings reported in this manuscript, but also the results from a large collection of published studies on fly boundaries.  Since all of these boundaries function in TAD formation, there is little reason to think that loop extrusion makes a significant contribution at the TAD level in flies.  Given the results reported by Goel et al. (Goel et al. 2023), one might also have doubts about the role of loop extrusion in the formation/maintenance of mammalian TADs. 

      To further document these points, we’ve included a new figure (Fig. 9) that shows two meta-loops.  Like the loops seen for homie-containing transgenes inserted at -142 kb, meta-loops are formed by the pairing of distant fly boundaries.  As only two boundaries are involved, the resulting loop topologies are simpler than those generated when transgene homie pairs with nhomie and homie in the eve locus.  The meta-loop in panel B is a stem-loop.  While a loop with this topology could be formed by loop extrusion, cohesin would have to break through dozens of intervening TAD boundaries and then somehow know to come to a halt at the blue boundary on the left and the purple boundary on the right.  However, none of the mechanistic studies on either cohesin or the mammalian CTCF roadblocks have uncovered activities of either the cohesin complex or the CTCF roadblocks that could explain how cohesin would be able to extrude hundreds of kb, ignore dozens of intervening roadblocks, and then stop only when it encounters the two boundaries that form the beat-IV meta-loop.  The meta-loop in panel A is even more problematic in that it is a circle-loop, a topology that can’t be generated by cohesin extruding a loop until it comes into contact with CTCF roadblocks on the extruded strands.

      Furthermore, there are many parts of the manuscript that are difficult to follow. There are some minor errors in the labelling of the figures that if fixed would help elevate understanding. Lastly, there are several major points that if elaborated on, would potentially be helpful for the clarity of the manuscript.

      Major Points:

      (1) The authors suggest and attempt to visualize in the supplemental figures, that loop extrusion mechanisms would appear during crosslinking and show as vertical stripes in the micro-C data. In order to see stripes, a majority of the nuclei would need to undergo loop extrusion at the same rate, starting from exactly the same spots, and the loops would also have to be released and restarted at the same rate. If these patterns truly result from loop extrusion, the authors should provide experimental evidence from another organism undergoing loop extrusion.

      (3.2) We don’t know of any reports that actually document cohesin extrusion events that are forming TADs (TADs as defined in our paper, in the RCMC experiments of Goel et al. (Goel et al. 2023), in response #1.1, or in the high-resolution images from the MicroC data of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020)).  However, an extruding cohesin complex would be expected to generate stripes because it transiently brings together the two chromatin strands, as illustrated by the broken zipper in Figure Supplemental 2 of our paper.  While stripes generated by cohesin forming a TAD have not, to our knowledge, ever been observed, Fig. 4 in Goel et al. (Goel et al. 2023) shows 45° stripes outlining TADs and connecting neighboring TADs.  These stripes are visible with or without Rad21.

      In some versions of the loop extrusion model, cohesin extrudes a loop until it comes to a halt at both boundaries, where it then remains, holding the loop together.  In this model, the extrusion event would occur only once per cell cycle.  This is the reason we selected NC14 embryos, as this point in development should provide by far the best opportunity to visualize cohesin-dependent TAD formation.  However, the expected stripes generated by cohesin embracing both strands of the extruding loop were not evident.  Other, newer versions of the loop extrusion model are much more dynamic: cohesin extrudes the loop, coming to a halt at the two boundaries, but either doesn’t remain stably bound or breaks through one or both boundaries.  In the former case, the TAD needs to be reestablished by another extrusion event, while in the latter case LDC domains are generated.  In this dynamic model, we should also be able to observe vertical and 45° stripes (or stripes leaning to one side or the other of the loading site if the extrusion rates aren’t equal on both fibers) in NC14 embryos, corresponding to the formation of TADs and LDC domains.  However, we don’t.
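      The stripe geometry expected from an extruding complex can be made concrete with a short, idealized calculation.  The Python sketch below assumes a single complex loaded at one site that reels in the left and right fibers at constant speeds, and a contact map rotated 45° so that the diagonal runs horizontally (the usual triangle view).  Under these assumptions, symmetric extrusion traces a vertical stripe, extrusion anchored at one boundary traces a 45° stripe, and unequal speeds give a stripe leaning toward the faster side.  The function names are hypothetical; this illustrates the geometry only and is not a polymer simulation or the analysis used in the paper.

```python
import math


def stripe_contacts(load_site, v_left, v_right, steps):
    """Pairs of genomic bins (i, j) bridged by a hypothetical extruding complex
    loaded at `load_site`, reeling in v_left bins per step on the left fiber
    and v_right bins per step on the right fiber."""
    return [(load_site - v_left * t, load_site + v_right * t) for t in range(steps + 1)]


def stripe_angle_deg(v_left, v_right):
    """Angle (degrees from the horizontal) of the stripe those contacts trace in
    a contact map rotated 45 degrees so that the diagonal is horizontal."""
    if v_left < 0 or v_right < 0 or v_left == v_right == 0:
        raise ValueError("speeds must be non-negative and not both zero")
    # Each step moves the bridged contact by (di, dj) = (-v_left, v_right).
    # Rotated coordinates: x = (i + j) / sqrt(2) runs along the diagonal,
    # y = (j - i) / sqrt(2) is the distance from the diagonal.
    dx = (v_right - v_left) / math.sqrt(2)
    dy = (v_right + v_left) / math.sqrt(2)
    return math.degrees(math.atan2(dy, dx))


# Symmetric extrusion from an internal loading site.
print(stripe_angle_deg(1, 1))  # 90.0: a vertical stripe
# One-sided extrusion anchored at a boundary: roughly 45 degrees (edge of the triangle).
print(stripe_angle_deg(0, 1))
# Faster left fiber: between 90 and 135 degrees, leaning toward the left.
print(stripe_angle_deg(2, 1))
```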

      (2) On lines 311-314, the authors discuss that stem-loops generated by cohesin extrusion would possibly be expected to have more next-next-door neighbor contacts than next-door neighbor contacts and cite their models in Figure 1. Based on the boundary:boundary pairing models in the same figure, would the stem-loops created by head-to-tail pairing also have the same phenotype? Making possible enrichment of next-next-door neighbor contacts possible in both situations? The concepts in the text are not clear, and the diagrams are not well-labeled relative to the two models.

      (3.3) Yes, we expect that stem-loops formed by cohesin extrusion or head-to-tail pairing would behave in a similar manner.  They could be stem-loops separated by unanchored loops as shown in Fig. 1B and E.  Alternatively, adjacent loops could be anchored to each other (by cohesin/CTCF road blocks or by pairing interactions) as indicated in Fig. 1C and F.  In stem-loops generated either by cohesin extrusion or by head-to-tail pairing, next-next door neighbors should interact with each other, generating a plume above the volcano triangle.  In the case of circle-loops, the volcano triangle should be flanked by clouds that are generated when the TAD bumps into both next-door neighbors.  In the accompanying paper, we test this idea by deleting the nhomie boundary and then a) inserting nhomie back in the reverse orientation, or b) by inserting homie in the forward orientation.  The MicroC patterns fit with the predictions that were made in this paper.

      (3) The authors appear to cite Chen et al., 2018 as a reference for the location of these transgenes being 700 nm away in a majority of the nuclei. However, the exact transgenes in this manuscript do not appear to have been measured for distance. The authors could do this experiment and include expression measurements.

      (3.4) The transgenes used in Chen et al. are modified versions of a transgene used in Fujioka et al. (2016) inserted into the same attP site.  When we visualize reporter transcription in NC14 embryos driven by the eve enhancers using smFISH, HCR-FISH or DIG, only a subset of the nuclei at this stage are active.  The number of active nuclei we detect is similar to that observed in the live imaging experiments of Chen et al.  The reason we cited Chen et al. (Chen et al. 2018) was that they found that proximity was a critical factor in determining whether the reporter was activated or not in a given nucleus.  The actual distance they measured wasn’t important.  Moreover, as we discussed in response #2.6 above, there are good reasons to think that the “precise” distances measured in live imaging experiments like those used in Chen et al. are incorrect.  However, their statements are certainly correct if one considers that a distance of ~700 nm or so is “more distant” relative to a distance of ~300 nm or so, which is “closer.”

      (4) The authors discuss the possible importance of CTCF orientation in forming the roadblock to cohesin extrusion and discuss that Homie orientation in the transgene may impact Homie function as an effective roadblock. However, the Homie region inserted in the transgene does not contain the CTCF motif. Can the authors elaborate on why they feel the orientation of Homie is important in its ability to function as a roadblock if the CTCF motif is not present? Trans-acting factors responsible for Homie function have not been identified and this point is not discussed in the manuscript.

      We discussed the “importance” of CTCF orientation in forming roadblocks because one popular version of the cohesin loop extrusion/CTCF roadblock model postulates that CTCF must be oriented so that the N-terminus of the protein is facing towards the oncoming cohesin complex, otherwise it won’t be able to halt extrusion on that strand.  When homie in the transgene is pointing towards the eve locus, the reporter on the other side (farther from eve) is activated by the eve enhancers.  One possible way to explain this finding (if one believes the loop extrusion model) is that when homie is inverted, it can’t stop the oncoming cohesin complex, and it runs past the homie boundary until it comes to a stop at a properly oriented boundary farther away.  In this case, the newly formed loop would extend from the boundary that stopped cohesin to the homie boundary in the eve locus, and would include not only the distal reporter, but also the proximal reporter.  If both reporters are in the same loop with the eve enhancers (which they would have to be given the mechanism of TAD formation by loop extrusion), both reporters should be activated.  They are not.

      For the boundary pairing model, the reporter that will be activated will depend upon the orientation of the pairing interaction, which can be either head-to-head or head-to-tail (or both: see discussion of LBC elements in #2.1).  For an easy visualization of how the orientation of pairing interactions is connected to the patterns of interactions between sequences neighboring the boundary, please look at Fig. 9.  This figure shows two different meta-loops.  In panel A, head-to-head pairing of the blue and purple boundaries brings together, on the one hand, sequences upstream of the blue and purple boundaries, and on the other hand, sequences downstream of the blue and purple boundaries.  In the circle-loop configuration, the resulting rectangular boxes of enhanced contact are located in the upper left and lower right of the contact map.  In panel B, the head-to-tail pairing of the blue and purple boundaries changes how sequences upstream and downstream of the blue and purple boundaries interact with each other.  Sequences upstream of the blue boundary interact with sequences downstream of the purple boundary, and this gives the rectangular box of enhanced interactions on the top right.  Sequences downstream of the blue boundary interact with sequences upstream of the purple boundary, and this gives the rectangular box of enhanced contact on the lower left.
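      The correspondence between pairing orientation, loop topology and the sequences brought into contact can be summarized as a small lookup, sketched here in Python.  It simply encodes what is stated in the text for the Fig. 9 meta-loops (head-to-head pairing giving a circle-loop, head-to-tail pairing giving a stem-loop) together with the plume and cloud signatures described in #3.3.  A and B stand for the left and right boundaries (e.g., blue and purple); the function and field names are hypothetical, and this is a qualitative summary rather than a tool for analyzing contact maps.

```python
from typing import NamedTuple, Tuple


class LoopPrediction(NamedTuple):
    topology: str
    flank_contacts: Tuple[Tuple[str, str], ...]  # sequences predicted to come into contact
    map_feature: str  # expected signature around the TAD in a MicroC map


def predict_loop(pairing: str) -> LoopPrediction:
    """Qualitative prediction for two paired boundaries, A (left) and B (right),
    following the orientation-to-topology mapping described for Fig. 9."""
    if pairing == "head-to-tail":
        # Stem-loop: sequences outside the loop form the stem.
        return LoopPrediction(
            "stem-loop",
            (("upstream of A", "downstream of B"), ("downstream of A", "upstream of B")),
            "plume above the volcano triangle (next-next-door neighbors interact)",
        )
    if pairing == "head-to-head":
        # Circle-loop: upstream meets upstream, downstream meets downstream.
        return LoopPrediction(
            "circle-loop",
            (("upstream of A", "upstream of B"), ("downstream of A", "downstream of B")),
            "clouds on both sides of the volcano triangle (TAD bumps into next-door neighbors)",
        )
    raise ValueError(f"unknown pairing orientation: {pairing!r}")


for mode in ("head-to-head", "head-to-tail"):
    prediction = predict_loop(mode)
    print(mode, "->", prediction.topology, "|", prediction.map_feature)
```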

      CTCF: Our analysis of the homie boundary suggests that CTCF contributes little to its activity.  It has an Su(Hw) recognition sequence and a CP190 “associated” sequence.  Mutations in both compromise boundary activity (blocking and -142 kb pairing).  Gel shift experiments and ChIP data indicate there are half a dozen or more additional proteins that associate with the 300 bp homie fragment used in our experiments.

      Orientation of CTCF or other protein binding sites:  The available evidence suggests that orientation of the individual binding sites is not important (Kyrchanova et al. 2016; Lim et al. 2018).  Instead, it is likely that the order of binding sites affects function.

      (5) The imaging results seem to be consistent with both boundary:boundary interaction and loop extrusion stem looping.

      It is not clear whether the reviewer is referring to the different patterns of reporter expression— which clearly don’t fit with the loop extrusion model in the key cases that distinguish the two models—or the live imaging experiments in Chen et al. (Chen et al. 2018).

      (6) The authors suggest that the eveMa TAD could only be formed by extrusion after the breakthrough of Nhomie and several other roadblocks. Additionally, the overall long-range interactions with Nhomie appear to be less than the interactions with endogenous Homie (Figures 7, 8, and supplemental 5). Is it possible that in some cases boundary:boundary pairing is occurring between only the transgenic Homie and endogenous Homie and not including Nhomie?

      Yes, it is possible.  On the other hand, the data that are currently available support the idea that transgene homie usually interacts with endogenous homie and nhomie at the same time.  This is discussed in #2.6D above.  The viewpoints indicate that crosslinking occurs more frequently to homie than to nhomie.  This could indicate that when there are only pairwise interactions, these tend to be between homie and homie.  Alternatively, this could also be explained by a difference in relative crosslinking efficiency.

      (7) In Figure 4E, the GFP hebe expression shown in the LhomieG Z5 transgenic embryo does not appear in the same locations as the LlambdaG Z5 control. Is this actually hebe expression or just a background signal?

      The late-stage embryos shown in E are oriented differently.  For GlambdaL, the embryo is oriented so that hebe-like reporter expression on the ventral midline is readily evident.  However, this orientation is not suitable for visualizing eve enhancer-dependent expression of the reporters in muscle progenitor cells.  For this reason, the 12-16 hr GeimohL embryo in E is turned so that the ventral midline isn’t readily visible in most of the embryo.  As is the case in NC14 embryos, the eve enhancers drive lacZ but not gfp expression in the muscle progenitor cells.

      (8) Figure 6- The LhomieG Z3 (LeimohG) late-stage embryo appears to be showing the ventral orientation of the embryo rather than the lateral side of the embryo as was shown in the previous figure. Is this for a reason? Additionally, there are no statistics shown for the Z3 transgenic images.

      Were these images analyzed in the same way as the Z5 line images?

      The LeimohG embryo was turned so that the hebe enhancer-dependent expression of lacZ is visible.  While the eve enhancer-dependent expression of lacZ in the muscle progenitor cells isn’t visible with this orientation, eve enhancer-dependent expression in the anal plate is.

      (9) Do the Micro-C data align with the developmental time points used in the smFISH probe assays?

      The Micro-C data align with the smFISH images of older embryos: 12-14 hour embryos, or stages 14-16.

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      This was a difficult paper to review. It took me several hours to understand the terminology and back and forth between different figures to put it together. It might be useful to put the loop models next to the MicroC results and have a cartoon way of incorporating which enhancers are turning on which reporters.

      I also found the supercoiled TAD models in Figure 1 not useful. These plectoneme-type of structures likely do not exist, based on the single-cell chromosome tracing studies, and the HiC structures not showing perpendicular to diagonal interactions between the arms of the plectonemes.

      We wanted to represent the TAD as a coiled 30 nm fiber, as TADs are not likely to resemble the large loops like those shown in Fig. 1 A, D, and G.

      There are no stripes emerging from homies, which is consistent with the pairing model, but there seem to be stripes from the eve promoter. I think these structures may be a result of both the underlying loop extruders + pairing elements.

      There are internal structures in the eve TAD that link the upstream region of the eve promoter to the eve PRE and sequences in nhomie.  All three of these sequences are bound by LBC.  Each of the regulatory domains in BX-C also has LBC elements, and, as shown in Author response image 1, there are stripes connecting some of these LBC elements to each other.  Since the stripes that Goel et al. (2023) observed in their RCMC analysis of Ppm1g didn’t require cohesin, it isn’t clear how these stripes are generated (actively, e.g., by a chromatin remodeler, or passively, e.g., because the LBC complex has non-specific DNA binding activity that can be readily crosslinked as the chromatin fiber slides past).

      The authors say there are no TADs that have "volcano plumes" but the leftmost TAD TA appears to have one. What are the criteria for calling the plumes? I am also not clear why there is a stripe off the eve volcano. It looks like homie is making a "stripe" loop extrusion type of interaction with the next TAD up. Is this maybe cohesin sliding off the left boundary?

      The reviewer is correct; the left-most TAD TA appears to have a plume.  We noted in the original text that TA seems to have a plume, but this was inadvertently edited out.

      Two different types of TAD:TAD interactions are observed.  In the case of eve, the TADs to either side of eve interact more frequently with each other than they do with eve.  This generates a “plume” above the eve volcano triangle.  The TADs that comprise the Abd-B regulatory domains (see Author response image 1) are surrounded by clouds of diminishing intensity.  Clouds at the first level represent interactions with both next-door neighbors; clouds at the second level represent interactions with both next-next-door neighbors; clouds at the third level represent interactions with next-next-next-door neighbors.  The Abd-B TADs are close to the same size, so interactions with neighbors are relatively simple.  However, this is not always the case.  When there are smaller TADs near larger TADs, the pattern of interaction can be quite complicated.  An example is indicated by the red bar in Author response image 2.

      The authors state "In the loop-extrusion model, a cohesin complex initiating loop extrusion in the eve TAD must break through the nhomie roadblock at the upstream end of the eve TAD. It must then make its way past the boundaries that separate eve from the attP site in the hebe gene, and come to a halt at the homie boundary associated with the lacZ reporter." Having multiple loops formed by cohesin would also bring in the 142kb apart reporter and homie. Does cohesin make 140 kb long loops in flies?

      A mechanism in which cohesin brings the reporter close to the eve TAD by generating many smaller loops (which would be the intervening TADs) was discussed in #1.2.

      Figure 5 title mistakes the transgene used?

      Fixed.

      In figure 6, the orientation of the embryos does not look the same for the late-stage panels. So it was difficult to tell if the eve enhancer was turning the reporter on.

      Here we were focusing mainly on the AP enhancer activation of the reporter, as this is most easily visualized.  It should be clear from the images that the appropriate reporter is activated by the AP enhancer for each of the transgene inserts.

      It is not clear to me why the GFP makes upstream interactions (from the 4C viewpoint) in GhomieLZ5 but not in LhomieGZ5? Corresponding interactions for Fig Supp 5 & 6 are not the same. That is, LacZ in the same place and with the same homie orientation does not show a similar upstream enrichment as the GFP reporter does.

      We are uncertain as to whether we understand this question/comment.  In GhomieLZ5 (now GhomieL), the lacZ reporter is on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary.  Since homie is pointing away from gfp, pairing interactions with homie and nhomie in the eve locus bring the eve enhancers into close proximity with the gfp reporter.  This is what is seen in Fig. 7 panel D (lower trace).  In LhomieGZ5 (now GeimohL), the lacZ reporter is again on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary.  However, in this case homie is inverted so that it points away from lacZ (towards gfp).  In this orientation, pairing brings the lacZ reporter into contact with the eve enhancers.  This is what is seen in the upper trace in Fig. 7 panel D.

      The orientation of the transgene is switched in Fig. Supp 5 and 6.  For these “Z3” transgenes (now called LeimohG and LhomieG), the gfp reporter is on the eve side of homie while the lacZ reporter is on the hebe enhancer side of homie.  The interactions between the reporters and eve are determined by the orientation of homie in the transgene.  When homie is pointing away from gfp (as in LeimohG), gfp is activated, and that is reflected in the trace in Supp Fig. 5.  When homie is pointing away from lacZ, lacZ is activated, and this is reflected (though not as cleanly as in other cases) in the trace in Supp Fig. 6.

      I did not see a data availability statement. Is the data publicly available? The authors also should consider providing the sequences of the insertions, or provide the edited genomes, in case other researchers would like to analyze the data.

      Data have been deposited.

      Reviewer #3 (Recommendations For The Authors):

      Minor Points:

      (1) There is an inconsistency in the way that some of the citations are formatted. Some citations have 'et al' italicized while others do not. It seems to be the same ones throughout the manuscript. Some examples: Chetverina et al 2017, Chetverina et al 2014, Cavalheiro et al 2021, Kyrchanova et al 2008a, Muravyova et al 2001.

      Fixed

      (2) Pita is listed twice in line 48.

      Fixed

      (3) Line 49, mod(mdg4)67.2 is written just as mod(mdg4). The isoform should be indicated.

      This refers to all Mod isoforms.

      (4) Homie and Nhomie are italicized throughout the manuscript and do not need to be.

      This is the convention used previously.  

      (5) The supplemental figure captions 1 and 2 in the main document are ordered differently than in the supplemental figures file. This caused it to look like the figures are being incorrectly cited in lines 212-214 and 231-232.

      Fixed

      (6) Is the correct figure being cited in line 388-389? The line cites Figure 6E when mentioning LlambdaG Z5; however, LlambdaG Z5 is not shown in Figure 6.

      Fixed

      (7) Section heading 'LhomieG Z5 and GhomieL Z5' could be renamed for clarity. GhomieL Z5 results are not mentioned until the next section, named 'GhomieL Z5'.

      Fixed

      (8) Can the authors provide better labeling for control hebe expression? This would help to determine what is hebe expression and what is background noise in some of the embryos in Figures 4-6.

      Author response image 5 shows expression of the lacZ reporter in GeimohL and GlambdaL.  For the GlambdaL transgene, the hebe enhancers drive lacZ expression in 12-16 hr embryos.  Note that lacZ expression is restricted to a small set of quite distinctive cells along the ventral midline.  lacZ is also expressed on the ventral side of the GeimohL embryo (top panel).  However, the locations of these cells are quite different from those of the lacZ-positive cells in the GlambdaL transgene embryo.  These cells are displaced from the midline and are arranged as pairs of cells in each hemisegment, locations that correspond to eve-expressing cells in the ventral nerve cord.  The eve enhancers also drive lacZ expression elsewhere in the GeimohL embryo, including the anal plate and dorsal muscle progenitor cells (seen most clearly in the lower left panel).

      Author response image 5.

      lacZ expression in GeimohL and GlambdaL embryos

      (9) The Figure 5 title is labeled with the wrong transgene.

      Fixed

      (10) Heat map scales are missing for Figures 7, supplemental 5, and supplemental 6.

      Fixed

      (11) Did the authors check if there was a significant difference in the expression of GFP and lacZ from lambda control lines to the Homie transgenic lines?

      Yes.  Statistical analysis added in Table Supplemental #1

      (12) The Figure 7 title references that these are Z3 orientations, however, it is Z5 orientations being shown.

      Fixed

      (13) The virtual 4C data should include an axis along the bottom of the graphs for better clarity. An axis is missing in all 4C figures.

      References:

      Bantignies F, Grimaud C, Lavrov S, Gabut M, Cavalli G. 2003. Inheritance of polycomb-dependent chromosomal interactions in drosophila. Genes Dev. 17(19):2406-2420.

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc-finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk AN, Boyko KM, Nikolaeva AY, Burtseva AD, Popov VO, Georgiev PG. 2022. Structural insights into highly similar spatial organization of zinc-finger associated domains with a very low sequence similarity. Structure. 30(7):1004-1015.e1004.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Foe VE. 1989. Mitotic domains reveal early commitment of cells in drosophila embryos. Development. 107(1):1-22.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis-regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Ke W, Fujioka M, Schedl P, Jaynes JB. 2024. Chromosome structure ii: Stem-loops and circle-loops. eLife.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Mogila V, Wolle D, Deshpande G, Parshikov A, Cleard F, Karch F, Schedl P, Georgiev P. 2016. Functional dissection of the blocking and bypass activities of the fab-8 boundary in the drosophila bithorax complex. PLoS Genet. 12(7):e1006188.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab-7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences. 201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Lim B, Heist T, Levine M, Fukaya T. 2018. Visualization of transvection in living drosophila embryos. Mol Cell. 70(2):287-296.e286.

      Link N, Kurtz P, O'Neal M, Garcia-Hughes G, Abrams JM. 2013. A p53 enhancer region regulates target genes through chromatin conformations in cis and in trans. Genes Dev. 27(22):2433-2438.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Rollins RA, Morcillo P, Dorsett D. 1999. Nipped-b, a drosophila homologue of chromosomal adherins, participates in activation by remote enhancers in the cut and ultrabithorax genes. Genetics. 152(2):577-593.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of D. melanogaster. Cell. 23(2):401-409.

      Shermoen AW, McCleland ML, O'Farrell PH. 2010. Developmental control of late replication and s phase length. Curr Biol. 20(23):2067-2077.

      Shidlovskii YV, Bylino OV, Shaposhnikov AV, Kachaev ZM, Lebedeva LA, Kolesnik VV, Amendola D, De Simone G, Formicola N, Schedl P et al. 2021. Subunits of the pbap chromatin remodeler are capable of mediating enhancer-driven transcription in drosophila. Int J Mol Sci. 22(6).

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Wolle D, Cleard F, Aoki T, Deshpande G, Schedl P, Karch F. 2015. Functional requirements for fab-7 boundary activity in the bithorax complex. Mol Cell Biol. 35(21):3739-3752.

    1. we used our words we used what words we had to weld, what words we had we wielded, kneeled, we knelt.

      I think the opening lines of the poem are the first of many examples of Choi employing parallelism in this poem. I think the repetitive nature of this parallelism may be a commentary on how society pushes us to fit into a mold, both in our daily routine and in our identities.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The manuscript by Lu et al aims to study the effects of tubulin post-translational modification in C. elegans touch receptor neurons. Authors use gene editing to engineer various predicted PTM mutations in α-tubulin MEC-12 and β-tubulin MEC-7. Authors generate and analyze an impressive battery of mutants in predicted phosphorylation site and acetylation site of β-tubulin MEC-7, K40 acetylation site in α-tubulin MEC-12, enzymatic site of the α-tubulin acetyltransferase MEC-17, and PTM sites in the MEC-12 and MEC-7 C-tails (glutamylation, detyrosination, delta-tubulin). This represents a lot of work, and will appeal to a readership interested in C. elegans touch receptor neurons. The major concern/criticism of this manuscript is whether the introduced mutation(s) directly affects a specific PTM or whether the mutation affects gene expression, protein expression/stability/localization, etc. As such, this work does convincingly demonstrate, as stated in the title, that "Editing of endogenous tubulins reveals varying effects of tubulin posttranslational modifications on axonal growth and regeneration."

      We thank the reviewer for the constructive comments. With regards to the major concern or criticism, we would like to point out that we have previously characterized ~100 missense mutations in mec-7 and mec-12 (Zheng et al., 2017, PMID: 28835377; Lee et al., 2021, PMID: 33378215). So, we are familiar with the phenotypes associated with mutations that affect gene expression or protein stability, which mostly result in a null phenotype. When analyzing the PTM site mutants, we compared their phenotypes with the previously categorized phenotypes of null alleles, neomorphic mutations that increase microtubule stability, and antimorphic mutations that prevent polymerization or disrupt microtubule stability. For example, in the case of mec-7 S172 mutations, we found that S172P mutants had the same phenotype as the mec-7 knockout (mild neurite growth defects), suggesting that S172P likely affects protein folding or stability, resulting in the loss of MEC-7. In contrast, S172A and S172E mutations showed phenotypes similar to neomorphic alleles (the emergence of ectopic ALM posterior neurite) and antimorphic alleles (the severe shortening of all neurites in the TRNs), respectively. These phenotypic differences suggested to us that the effects of S172A and S172E mutations cannot be simply attributed to the loss of protein expression and stability. Similar logic was applied to the studies of other PTM-inactivating or -mimicking mutations.

      (2) For example, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, to test the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamylation polyE recognizing antibodies stain cultured embryonic touch receptor neurons (TRNs), but did not examine staining in C. elegans TRNs in situ. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers, raising the question of how these "glutamylation" mutations are affecting mec-12 and -7. The rationale for using cultured embryonic TRNs and the relevance of the data and its interpretation are not clear.

      The GT335 and polyE antibodies were used by previous studies (O’Hagan et al., 2011, PMID: 21982591; and O’Hagan et al., 2017, PMID: 29129530) to detect the polyglutamylation signals in the sensory cilia of C. elegans. We initially tried to stain whole animals using these antibodies but could not get clear and distinct signals in the TRNs. We reason that the tubulin polyglutamylation signals in the TRNs may be weak, and the in situ staining method, which requires the antibodies to penetrate multiple layers of tissue (e.g., cuticle and epidermis) to reach the TRN axons, may not be sensitive enough to detect the signal. In fact, the TRN axons are located deeper in the worm body than the sensory cilia, which are mostly exposed to the environment. Another reason could be that the tissues (mostly epidermis) surrounding the TRN axons also have polyglutamylation staining, which makes it difficult to recognize TRN axons. This situation differs from the anti-K40 acetylation staining, which only occurs in the TRNs because MEC-12 is the only α-tubulin isotype that carries K40. Due to these technical difficulties, we decided to use in vitro cultured TRNs for the staining experiment, which allows both easy access of the antibodies (thus higher sensitivity) and the dissociation of the TRNs from other tissues. The fact that we were able to observe reduced staining in the ttll mutants and the tubulin mutants that lost the glutamate residues suggests that these antibodies indeed detected glutamylation signals in the cells.

      (3) The final paragraph of the discussion is factually incorrect. The C. elegans homologs of the CCP carboxypeptidases are called CCPP-1 and CCPP-6. There are several publications on their functions in C. elegans.

      We thank the reviewer for pointing out the mistake in the text. We intended to say that “there is no C. elegans homolog of the known tubulin carboxypeptidases that catalyze detyrosination”, which is true given that homologs of the detyrosinases vasohibin-1 and vasohibin-2 (VASH1/VASH2) cannot be found in C. elegans. We are aware of the publications on CCPP-1 and CCPP-6; CCPP-1 is known to regulate tubulin deglutamylation in the cilia of C. elegans (O’Hagan et al., 2011 and 2017), while CCPP-6 may function in the PLM to regulate axonal regeneration (Ghosh-Roy et al., 2012). In the revised manuscript, we have corrected the error.

      Reviewer #2 (Public Review):

      Summary:

      The tubulin subunits that make up microtubules can be posttranslationally modified, and these PTMs are proposed to regulate microtubule dynamics and the proteins that can interact with microtubules in many contexts. However, most studies investigating the roles of tubulin PTMs have been conducted in vitro, either with purified components or in cultured cells. Lu et al. use CRISPR/Cas9 genome editing to mutate tubulin genes in C. elegans, testing the role of specific tubulin residues in neuronal development. This study is a real tour de force, tackling multiple proposed tubulin modifications and following the resulting phenotypes with respect to neurite outgrowth in vivo. There is a ton of data that experts in the field will likely reference for years to come, as this is one of the most comprehensive in vivo analyses of tubulin PTMs.

      This paper will be very important to the field, but it would be strengthened if: 1) the authors demonstrated that the mutations they introduced had the intended consequences on microtubule PTMs, 2) the authors explored how the various tubulin mutations directly affect microtubules, and 3) the findings were made generally more accessible to non-C. elegans neurobiologists.

      (1) The authors introduce several mutations to perturb tubulin PTMs. However, it is unclear to what extent the engineered mutations affect tubulin in the intended way, i.e., are the authors sure that the PTMs they want to perturb are actually present in C. elegans? Many of the antibodies used did not appear to be specific, and antibody staining was not always impacted in the mutant cases as expected. For example, is there any evidence that S172 is phosphorylated in C. elegans, e.g., from available phosphoproteomic data? Given the significant amount of staining left in the S172A mutant, the antibody seems non-specific in this context and therefore not a reliable readout of whether MTs are actually phosphorylated at this residue. As another example, there is no evidence presented that K252 is acetylated in C. elegans. At the very least, the authors should consider demonstrating the conservation of these residues and the surrounding residues with other organisms where studies have demonstrated PTMs exist.

      We thank the reviewer for the comments. To our knowledge, there are very few phosphoproteome data available for C. elegans. We searched a previously published dataset (Zielinska et al., 2009; PMID: 19530675) and did not find the S172 phosphorylation signal in MEC-7. This is not surprising, given that only six touch receptor neurons express MEC-7, and the abundance of MEC-7 in whole-animal lysate may be below the detection limit. However, the phosphorylation site S172 is highly conserved across species and tubulin isotypes (Figure 1-figure supplement 1 in the revised manuscript), suggesting that this site is likely phosphorylated in MEC-7.

      In the case of K252, the potential acetylation site and the flanking sequences are highly conserved across species and isotypes. In fact, the 20 amino acids spanning residues 241-260 are identical among the tubulin genes of C. elegans, fruit flies, Xenopus, and humans (Figure 4-figure supplement 1B). Thus, although K252 acetylation was identified in HeLa cells, this conservation suggests that the site could also be acetylated in C. elegans.

      In the case of K40, we observed sequence divergence at the PTM site and adjacent sequences among the tubulin isotypes in C. elegans. MEC-12 is the only C. elegans α-tubulin isotype that has the K40 residue, and the 40-50 a.a. region of MEC-12 appears to be more conserved than other isotypes when compared to Drosophila, frog, and human α-tubulins (Figure 4-figure supplement 1A).

      (2) Given that the authors have the mutants in hand, it would be incredibly valuable to assess the impact of these mutations on microtubules directly in all cases. MT phenotypes are inferred from neurite outgrowth phenotypes in several cases, the authors should look directly at microtubules and/or microtubule dynamics via EBP-2 when possible OR show evidence that the only way to derive the neurite phenotypes shown is through the inferred microtubule phenotypes. For example, the effect of the acetylation or detyrosination mutants on MTs was not assessed. 

      We thank the reviewer for the suggestions. In this study, we created >20 tubulin mutants. Due to limited time and resources, we were not able to examine microtubule dynamics in every mutant strain using EBP-2 kymographs. We assessed the effects of the tubulin mutations mostly based on changes in neurite growth patterns. From our previous experience of analyzing ~100 mec-7 and mec-12 missense mutations (Zheng et al., 2017, MBoC; Lee et al., 2021, MBoC), we found that changes in microtubule dynamics are correlated with changes in neuronal morphology. For example, the growth of ectopic ALM-PN is correlated with fewer EBP-2 comets and potentially reduced microtubule dynamics; this correlation holds true for several mec-7 neomorphic missense alleles we examined before (Lee et al., 2021, MBoC) and for the PTM site mutants [e.g., mec-7(S172A) and mec-12(4Es-A)] analyzed in this study. Similarly, the shortening of TRN neurites is correlated with more EBP-2 comets and increased microtubule dynamics. For mutants that don’t show neurite growth defects, our previous experience suggests that they are unlikely to show altered microtubule dynamics in EBP-2 tracking experiments. So, we did not analyze the acetylation mutants (which had no defects in neurite growth) and the detyrosination mutants (which had a weak ALM-PN phenotype). Nevertheless, we agree with the reviewer that we cannot rule out the possibility that there may be slight changes to microtubule dynamics in these mutants.

      Using tannic acid staining and electron microscopy (EM), we previously examined the microtubule structure in several tubulin missense mutants (Zheng et al., 2017, MBoC) and found that the loss-of-function and antimorphic mutations significantly reduced the number of microtubules and altered microtubule organization by reducing protofilament numbers. These structural changes are consistent with highly unstable microtubules and defects in neurite growth. On the other hand, neomorphic mutants had only a slight decrease in microtubule abundance, maintained the 15-protofilament structure, and had more tightly packed microtubule bundles that filled up most of the space in the TRN neurite (Zheng et al., 2017, MBoC). These structural features are consistent with increased microtubule stability and ectopic neurite growth. Although we did not directly examine microtubule abundance and structure using EM in this study, we would expect similar changes that correlate with the neurite growth phenotypes in the PTM mutants. We agree with the reviewer that it would be informative to conduct a more comprehensive analysis of these mutants using EM and other structural biology methods.

      (3) There is a ton of data here that will be important for experts working in this field to dig into, however, for the more general cell biologist, some of the data are quite inaccessible. More cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment.

      We thank the reviewer for the comment. In the revised manuscript, we added cartoons to Figure 2G to show the location of the synaptic vesicles. The neurite growth phenotypes should be quite straightforward. Nevertheless, we added one more figure (Figure 8) that summarizes all the results in the study with cartoons depicting the changes in neuronal morphology.

      (4) In addition, I am left unconvinced of the negative data demonstrating that MBK does not phosphorylate tubulin. First, the data described in lines 207-211 does not appear to be presented anywhere. Second, RNAi is notoriously finicky in neurons, thus necessitating tissue-specific degradation using either the ZF/ZIF-1 or AID/TIR1 systems which both work extremely well in C. elegans. Third, there appears to be increasing S172 phosphorylation in Figure 3 Supplement 2 with added MBK-2, but there is no anti-tubulin blot to show equal loading, so this experiment is hard to interpret.

      We added the results of mbk-1, mbk-2, and hpk-1 mutants and of cell-specific knockdown of MBK-2 to Figure 3-figure supplement 1D. Following the reviewer's suggestion, we attempted to use a ZIF-1 system to remove MBK-2 specifically in the TRNs using a previously published method (PMID: 28619826). We fused endogenous MBK-2 with GFP by gene editing and then expressed an anti-GFP nanobody fused with ZIF-1 in the TRNs to induce the degradation of MBK-2::GFP. To our surprise, unlike the mbk-2p::GFP transcriptional reporter, MBK-2::GFP did not show detectable expression in the TRNs, although expression was seen in early embryos, which is consistent with the embryonic lethal phenotype of mbk-2(-) mutants (Figure 3-figure supplement 2A-B in the revised manuscript). We reason that endogenous MBK-2 is either not expressed in the TRNs or expressed at a very low level. We then crossed mbk-2::GFP with ItSi953 [mec-18p::vhhGFP4::Zif-1] to trigger the degradation of any MBK-2 protein that might be present and did not observe ectopic growth of ALM-PN (Figure 3-figure supplement 2C). These results suggest that MBK-2 is unlikely to regulate tubulin phosphorylation in the TRNs, which is consistent with the results of the other genetic mutants and the RNAi experiments.

      For Figure 3 Supplement 2 (Figure 3-figure supplement 3 in the revised manuscript), because we added the same amount of purified MEC-12/MEC-7 to all reactions and had established equal loading in Figure 3E, we did not perform anti-tubulin staining in this experiment. Since a higher concentration (1742 nM) of MBK-2 did not produce a stronger signal than 1268 nM, we do not think the 1268 nM band represents true phosphorylation. Moreover, that signal is not significantly stronger than the control without MBK-2 and is much lower than the signal generated by CDK1 in Figure 3E. Based on these results, we concluded that MBK-2 is unlikely to phosphorylate MEC-7.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General:

      A summary table would help the reader digest the vast amount of phenotypic data.

      Cartoons to help a non-C. elegans reader understand the figures. 

      We added Figure 8 to summarize and illustrate the effects of the various mutants analyzed in this study.

      Specific:

      The authors engineered mutations into the predicted phosphorylation site of b-tubulin mec-7. These CRISPR-alleles mutations phenocopied previously identified loss-of-function, gain-of-function, and neomorphic mec-7 alleles identified in genetic screens by the Chalfie lab. Next, the authors sought to identify the responsible kinase, taking a candidate gene approach. The most likely family - minibrain - had no effect when knocked down/out. The authors showed that cdk-1 mutants displayed ectopic ALM-PN outgrowth. Whether cdk-1 specifically acts in the TRNs was not demonstrated, calling into question whether CDK-1 phosphorylates S172 in vivo. In their introduction (lines 45-59), the authors built a case for engineering PTM mutations directly into tubulins, because the PTM enzymes may have multiple substrates. This logic applies to the cdk-1 experiment and its interpretation. 

      The reviewer is right. Since CDK1 and minibrain kinase are the only known kinases that catalyze S172 phosphorylation, our results suggest that CDK-1 is more likely to catalyze S172 phosphorylation in the TRNs compared to MBK-1/2. Genetic studies found that cdk-1(-); mec-7(S172A) double mutants did not show stronger phenotype than the two single mutants, suggesting that they function in the same pathway. Nevertheless, we could not rule out the possibility that other kinases may also control S172 phosphorylation, and the effect of CDK-1 is indirect. We mentioned this possibility in the revised manuscript.

      For a-tubulin MEC-12, acetyl-mimicking K40Q and unmodifiable K40R mutants failed to stain with the anti-acetyl-a-tubulin (K40) antibody and displayed subtle TRN phenotypes. The enzymatically dead MEC-17 had phenotypes similar to those described by Topalidou (2012), confirming the Chalfie lab finding that MEC-17 has functions in addition and independent of its acetyltransferase activity. The authors moved onto a predicted acetylation site in MEC-7 and observed TRN developmental defects, and acknowledged that this may be due to tubulin instability and not a PTM. This is a concern for all mutants, as there is no way to measure whether the protein is expressed, stable, or localized properly. 

      We acknowledge that this is a caveat of mutational studies. An amino acid substitution at a PTM site may have multiple effects, including a change in the PTM state and a potential alteration of protein conformation. Without direct evidence for enzymatic modification of the PTM site in the neurons, we cannot rule out the possibility that the phenotype we observed is unrelated to the PTM and instead results from abnormal protein conformation and function caused by the mutation.

      Nevertheless, as stated in our above response to the first point in the public review, we can phenotypically differentiate loss-of-function and gain-of-function mutants. If the mutation reduces expression or general protein stability, it is more likely to cause a loss-of-function phenotype. For most PTM site mutants, this is not the case. We observed mostly gain-of-function phenotype, suggesting that the missense mutations did not simply inactivate the tubulin protein and instead affected the functional properties of the protein.

      From here, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, testing the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamyation polyE recognizing antibodies stain cultured embryonic TRNs, but did not examine staining in TRNs. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers (see next point). The rationale for using cultured embryonic TRNs is not clear. 

      See our response to the second point in the public review.

      Lines 548-553 There are several publications on CCPP-1 and CCPP-6 functions in TRNs and ciliated sensory neurons. See

      PMID: 20519502

      PMID: 21982591

      PMID: 21943602

      PMID: 23000142

      PMID: 29129530

      PMID: 33064774

      PMID: 36285326

      PMID: 37287505 

      We thank the reviewer for pointing out these references, some of which are now cited in the revised manuscript. We made a mistake in the Discussion by stating that there are no C. elegans homologs of tubulin carboxypeptidases, when we intended to state that there is no homolog of tubulin detyrosinase in C. elegans. We are aware of the studies of CCPP-1 and CCPP-6 and have corrected the mistake in the revised manuscript (also see our response to the third point in the public review).

      Reviewer #2 (Recommendations For The Authors):

      Figures: 

      As stated in the public review, more cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment. A good example of this issue is demonstrated in Figure 2 and Figure 4: 

      (1) Figure 2: Please label images with what is being probed in each panel. 

      We added labels to the panels.

      (2) Figure 2G is very hard to interpret - cartoon diagramming what is being observed would be helpful. 

      We added cartoons to help illustrate the images.

      (3) Line 182-185: is this referring to your data or to Wu et al? It is not clear in this paragraph when the authors are describing published work versus their own data presented here. 

      It is from our data. We have made it clear in the revised manuscript.

      (4) Figure 2 - 2K is not well described. What experiment is being done here? What is dlk-1 and why did you look at this mutant? 

      Figure 2K showed that both wild-type animals and S172A mutants could reconnect the severed axons after laser axotomy. Previous studies have found that dlk-1(-) mutants were not able to regenerate axons due to altered microtubule dynamics (PMID: 19737525; PMID: 23000142). We used dlk-1(-) mutants as a negative control, because DLK-1 promotes microtubule growth following axotomy, and the DLK-1 pathway is essential for regeneration (PMID: 23000142). We want to highlight the phenotypic difference between dlk-1(-) mutants and the S172E mutants. Although both mutants showed similar regrowth length, dlk-1(-) mutants showed unbranched regrowth probably due to the lack of microtubule polymerization, whereas the S172E mutants showed a mesh-like regrowth pattern likely due to highly dynamic and unstable microtubules. We explained the different phenotypes in the revised manuscript.

      (5) Figure 4C: this phenotype is hard to interpret. Where is the wt control? Where is the quantification? 

      In the Figure legend, we have referred the readers to Figure 1G for the wild-type image. Quantification is provided in the text (~20% of the animals showed the branching defects).

      (6) There are no WT comparison images in Figure 4I, making the quantification difficult to interpret 

      In the Figure legend, we have referred the readers to Figure 1A for the wild-type control. Moreover, we included a new Figure 8 to summarize the phenotypes of all mutants.

      Experimental:

      (1) Is it clear that only MEC-7/MEC-12 are the only a- and b-tubulin present in the TRNs? The presence of other tubulins not mutated would complicate the interpretation of the results. 

      According to mRNA levels, the expression of MEC-7 and MEC-12 is >100-fold higher than that of other tubulin isotypes. For example, single-cell transcriptomic data (Taylor et al., 2021) showed that mec-7 mRNA is at 135,940 TPM in ALM neurons, whereas two other tubulin isotypes, tbb-1 and tbb-2, have expression values of 54 and 554 TPM, respectively, in the ALM. So, even if other tubulin isotypes are present, their abundance is much lower than that of mec-7 and mec-12, and they are unlikely to interfere with the effects of the mec-7 and mec-12 mutations.
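      As a quick check of the >100-fold statement, the ratios below use only the mec-7, tbb-1, and tbb-2 TPM values quoted above for ALM neurons (the corresponding mec-12 values are not given here):

```latex
\frac{135{,}940\ \text{TPM}\ (\textit{mec-7})}{554\ \text{TPM}\ (\textit{tbb-2})} \approx 245,
\qquad
\frac{135{,}940\ \text{TPM}\ (\textit{mec-7})}{54\ \text{TPM}\ (\textit{tbb-1})} \approx 2{,}517
```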

      (2) The in vitro kinase assays should be quantified. 

      We have added the quantification.

      (3) The idea that Cdk1 phosphorylates tubulin in interphase is surprising and I am left wondering how the authors propose that Cdk1 is activated in interphase. Is cyclin B (or another cyclin) present in interphase in this cell type? Expression but not activation of Cdk1 is not discussed. 

      CDK1 can work with cyclin A and cyclin B. C. elegans has one cyclin A gene (cya-1) and four cyclin B genes (cyb-1, cyb-2.1, cyb-2.2, and cyb-3). According to single-cell transcriptomic data from L4 animals, cya-1 and cyb-1 showed weak expression in many postmitotic neurons (including the ALM neurons), whereas cyb-2.1, cyb-2.2, and cyb-3 were not expressed in neurons. So, it is possible that cya-1/cyclin A and cyb-1/cyclin B are expressed at low levels in the TRNs. A previous study also found expression of cell cycle regulators (including cyclins) in postmitotic neurons in the mouse brain (Akagawa et al., 2021; PMID: 34746147).

      (4) What is the significance of neurite swelling and looping in Figure 4H? The underlying cause of this phenotype is not described. 

      The neurite swelling and looping phenotype of mec-17(-) mutants was described by Topalidou et al. (2012; PMID: 22658602) and is caused by the bending of microtubules. It appears that the loss of the α-tubulin acetyltransferase alters the organization of microtubules in the TRNs. These defects were partially rescued by enzymatically dead MEC-17, suggesting that MEC-17 may play a non-enzymatic (and likely structural) role in regulating microtubule organization. We added more explanation in the revised manuscript.

      (5) It is quite surprising that polyglutamylation is not affected in the quintuple ttll mutant. Since the authors made the sextuple ttll mutant, could they demonstrate whether polyglutamylation is further reduced in this mutant via GT335 staining? 

      We did not compare the quintuple and sextuple ttll mutants because, for technical reasons, they were crossed with TRN markers of different colors. The quintuple mutants CGZ1475 carried uIs115 [mec-17p::TagRFP] IV, whereas the sextuple mutants CGZ1474 carried zdIs5 [mec-4p::GFP] I. As a result, we had to use different secondary antibodies for the antibody staining, which makes the results not directly comparable.

      The polyglutamylation signal in the cell body was strongly affected by the ttll mutations. In fact, in the ttll-4(-); ttll-5(-); ttll-12(-) triple mutants, the signal is significantly reduced in the cell bodies of the TRNs as well as of other cells. What is surprising is that the signal in the axons persisted in the ttll triple and quintuple mutants. As the reviewer suggested, we also stained the sextuple mutants and found a pattern similar to that of the triple and quintuple mutants (new Figure 6-figure supplement 1C in the revised manuscript), although the results are not quantitatively comparable due to the use of secondary antibodies with different fluorophores.

      Writing:

      (1) The beginning of the results section is quite jarring. The information in lines 96-104 should be in the Introduction. 

      Due to the nature of this paper, each section deals with a particular PTM. We think it is helpful to discuss some background information before describing our results on each PTM rather than giving all in the introduction. Nevertheless, we modified the beginning of the results to make it more coherent and more connected with the preceding paragraphs.

      (2) Line 122-126: conclusions are not supported by the data: it is suggested from previous experiments, but authors do not look at MTs directly. 

      We have rephrased the statement to acknowledge that we drew this conclusion based on phenotypic similarity with mutants we previously examined.

      (3) I am confused by the usage of both mec-12(4EtoA) and mec-12(4Es-A). Are these the same mutations? If so, there needs to be consistency. If not, each case needs to be defined. 

      They are the same. We have corrected the mistake and are now using mec-12(4Es-A) to refer to the mutants.

      Line 105: phosphor --> phospho 

      Line 187: were --> was 

      Line 298: is --> are

      The above typos are corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendations For The Authors):

      I still find it really impressive that the Purkinje cell stimulation so closely mimics the pathogenic phenotypes - in my opinion, the strongest part of the paper. I would like just a little clarification on some of my previous questions.

      Major points:

      (1) Can the authors clarify where the new units came from? Are these units that were recorded before the initial submission and excluded, but are now included? If so, why were they excluded before? Or are these units that were recorded since the original submission?

      The number of units increased in Figure 1 for three reasons: 1) We now plot the classifier results in Figure 1 instead of the validation results, which have been moved to Figure 1 Supplement 3. 2) In response to reviewer comments, we no longer use the same units (i.e., units with >60 s of recording) for both model creation and validation. We had previously used 30 s of a recording to create the model and a different 30 s of the same recording to validate it, if an additional 30 s were available. 3) We changed our model creation and validation strategy based on previous reviewer comments. The new units in Figures 2-4 were taken from our pool of previously collected but unanalyzed data (we collect neural data on a rolling basis, so these data were not initially available). We were fortunate to have these data to analyze in order to address the concerns about the number of cells included in the manuscript. The number of units increased in Figure 5 because new units were recorded in response to reviewer comments.

      (2) Why did some of the neuron counts go down? For example, in Pdx1Cre;Vglut2fl/fl mice, the fraction of units with the control signature went from 11/21 to 7/23. Is this because the classifier changed between the original submission and the revision?

      Yes, the proportion of cells matching each classification changed due to the different parameters and thresholds used in the updated classifier model.

      Minor points:

      In the Discussion: "We find some overlap and shared spike features between the different disease phenotypes and show that healthy cerebellar neurons can adapt multiple disease-associated spike train signatures." I think "adapt" should be "adopt"

      In the Discussion: "compare" is misspelled as "compared"

      Thank you for bringing these typos to our attention. We will upload a new version of the text with the typos corrected.


      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for providing excellent and constructive suggestions that have enabled us to strengthen our overall presentation of our data. We have addressed each of the comments by altering the text, providing additional data, and revising the figures, as requested.

      Below are our explanations for how we have altered the manuscript in this revised version.

      Recommendations for the authors:

      I think you will have seen from the comments that there was great enthusiasm for the importance of this study. There were also shared concerns about how the classifier may be inadequate in its current format, as well as specific suggestions to consider to improve. I hope that you will consider a revision to really amplify the impact of the importance of this study.

      Reviewer #1 (Recommendations For The Authors):

      Distinct motor phenotypes are reflected in different neuronal firing patterns at different loci in motor circuits. However, it is difficult to determine if these altered firing patterns: 1) reflect the underlying neuropathology or phenotype, 2) whether these changes are intrinsic to the local cell population or caused by larger network changes, and 3) whether abnormal firing patterns cause or reflect abnormal movement patterns. This manuscript attempts to address these questions by recording neural firing patterns in deep cerebellar nucleus neurons in several models of cerebellar dysfunction with distinct phenotypes. They develop a classifier based on parameters of single unit spike trains that seems to do an inconsistent job of predicting phenotype (though it does fairly well for tremor). The major limitation of the recording/classifier experiments is the low number of single units recorded in each model, greatly limiting statistical power. However, the authors go on to show that specific patterns of Purkinje cell stimulation cause consistent changes in interposed nucleus activity that map remarkably well onto behavioral phenotypes. Overall, I did not find the recording/classifier results to be very convincing, while the stimulation results strongly indicate that interposed nucleus firing patterns are sufficient to drive distinct behavioral phenotypes.

      We thank the reviewer for their comments. We describe below how we have addressed the major concerns.

      Major concerns:

      (1) I don't think it's legitimate to use two 30-second samples from the same recording to train and validate the classifier. I would expect recordings from the same mouse, let alone the same unit, to be highly correlated with each other and therefore overestimate the accuracy of the classifier. How many of the recordings in the training and validation sets were the same unit recorded at two different times?

      We previously published a paper wherein we measured the correlation (or variability) between units recorded from the same mouse versus units recorded from different mice (see: Van der Heijden et al., 2022 – iScience, PMID: 36388953). In this paper we did not find that nuclei neuron recordings from the same mouse were more correlated or similar to each other than recordings from different mice. 

      Following this reviewer comment, however, we did observe strong correlations between the two 30-second samples from the same recorded units. We therefore no longer validate our classifier using training and validation sets with overlapping units. Instead, we generated 12 training sets and 12 non-overlapping validation sets from our entire database. We then trained 12 classifier models and ranked them by their classification ability on the validation sets (Figure 1 – supplemental Figure 3). We found that the top two performing classifier models were the same, and we used this model for the remainder of the paper.
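      A minimal sketch of a unit-level split that prevents the same unit from appearing in both a training set and its validation set. The sample layout `(unit_id, features, label)`, the function name, the 30% validation fraction, and the seeded shuffling are illustrative assumptions; the response does not state how units were allocated to each of the 12 splits.

```python
import random
from collections import defaultdict


def unit_level_splits(samples, n_splits=12, val_fraction=0.3, seed=0):
    """Yield (train, validation) lists, splitting by unit ID so that no unit
    contributes samples (e.g. two 30 s segments) to both sides.

    `samples` is a list of (unit_id, features, label) tuples; this layout is
    hypothetical and chosen only for illustration.
    """
    by_unit = defaultdict(list)
    for sample in samples:
        by_unit[sample[0]].append(sample)
    unit_ids = sorted(by_unit)
    rng = random.Random(seed)
    for _ in range(n_splits):
        shuffled = unit_ids[:]
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * val_fraction))
        # All segments of a unit go to the same side of the split.
        val = [s for u in shuffled[:n_val] for s in by_unit[u]]
        train = [s for u in shuffled[n_val:] for s in by_unit[u]]
        yield train, val


# Hypothetical example: 10 units, each with two 30 s segments.
samples = [(unit, None, "control") for unit in range(10) for _ in range(2)]
splits = list(unit_level_splits(samples))
```

      Splitting on unit IDs rather than on individual 30 s segments is what removes the within-unit correlation the reviewer was concerned about; scikit-learn's group-aware splitters implement the same idea.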

      (2) The n's are not convincing for the spike signature analyses in different phenotypic models. For example, the claim is that Pdx1Cre;Vglut2fl/fl mice have more "control" neurons than ouabain infusion mice (more severe phenotype). However, the numbers are 11/21 and 7/20, respectively. The next claim is that 9/21 dystonic neurons are less than 11/20 dystonic neurons. A z-test for proportions gives a p-value of 0.26 for the first comparison and a p-value of 0.44 for the second. I do not think any conclusions can be drawn based on these data.

      We included more cells in our analyses and found that the z-test for the proportions of cells with the “control” and “dystonia” signatures is indeed statistically significant.
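      The reviewer's p-values for the original counts can be reproduced with a standard two-proportion z-test. This sketch assumes the pooled-variance form with a two-sided alternative, since the reviewer does not state which variant was used; the updated counts from the revision are not given here, so only the original comparisons are shown.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions,
    using the pooled-variance standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Original counts quoted by the reviewer
_, p_control = two_proportion_z_test(11, 21, 7, 20)   # "control" signature
_, p_dystonia = two_proportion_z_test(9, 21, 11, 20)  # "dystonia" signature
print(round(p_control, 2), round(p_dystonia, 2))  # prints 0.26 0.44
```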

      (3) Since the spiking pattern does not appear to predict an ataxic phenotype and the n's are too small to draw a conclusion for the dystonic mice, I think the title is very misleading - it does not appear to be true that "Neural spiking patterns predict behavioral phenotypes...", at least in these models.

      We have changed the title to: “Cerebellar nuclei cells produce distinct pathogenic spike signatures in mouse models of ataxia, dystonia, and tremor.” We feel that this new title captures the idea that we find differences between spike signatures associated with ataxia, dystonia, and tremor and that these signatures induce pathological movements.

      (4) I don't think it can be concluded from the optogenetic experiments that the spike train signatures do not depend on "developmental changes, ...the effect of transgene expression, ... or drug effects outside the cerebellum." The optogenetic experiments demonstrate that modulating Purkinje cell activity is sufficient to cause changes in DCN firing patterns and phenotypes (i.e., proof-of-principle). However, they do not prove that this is why DCN firing is abnormal in each model individually.

      Thank you for highlighting this section of the text. We agree that the optogenetic experiments cannot explain why the DCN is firing abnormally in each model. We have edited this section of the text to prevent this conclusion from being drawn by the readers.

      Minor points:

      (1) It would be nice to see neural recordings in the interposed nucleus during Purkinje terminal stimulation to verify that the firing patterns observed during direct Purkinje neuron illumination are reproduced with terminal activation. This should be the case, but I'm not 100% certain it is.

      We have edited the text to clarify that representative traces and analysis of interposed nucleus neurons in response to Purkinje terminal stimulation are the data in Figure 5.

      (2) How does the classifier validation (Fig. 1E) compare to chance? If I understand correctly, 24/30 neurons recorded in control mice are predicted to have come from control mice (for example). This seems fairly high, but it is hard to know how impressive this is. One approach would be to repeat the analysis many (1000s) of times with each recording randomly assigned to one of the four groups and see what the distribution of "correct" predictions is for each category, which can be compared against the actual outcome.

      We have now also included the proportion of spike signatures in the entire population of neurons and show that the spike signatures are enriched in each of the four groups (control, ataxia, dystonia, tremor) relative to the presence of these signatures in the population (Figure 1E). 
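      The reviewer's chance-level comparison can be approximated with a label-permutation test. This is a simplified sketch: it shuffles the group labels against fixed classifier outputs instead of retraining the classifier on each random assignment, which the reviewer's full proposal would require. The function name, the 2,000 permutations, and the example labels are hypothetical.

```python
import random


def label_permutation_test(true_labels, predicted_labels, n_permutations=2000, seed=0):
    """Compare the observed number of correct predictions with a null
    distribution obtained by randomly reassigning the true group labels.

    Returns (observed_correct, p_value); p_value is the fraction of shuffles
    (with a +1 correction) whose correct count matches or exceeds the observed one.
    """
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels))
    rng = random.Random(seed)
    shuffled = list(true_labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        correct = sum(t == p for t, p in zip(shuffled, predicted_labels))
        if correct >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical labels: 10 recordings per group, all classified correctly.
groups = ["control", "ataxia", "dystonia", "tremor"]
true_labels = groups * 10
predicted_labels = list(true_labels)
observed, p_value = label_permutation_test(true_labels, predicted_labels)
```

      Per-group counts can be compared the same way by restricting the correct-prediction count to one group at a time.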

      (3) I don't think this is absolutely necessary, but do the authors have ideas about how their identified firing patterns might lead to each of these phenotypes? Are there testable hypotheses for how different phenotypes caused by their stimulation paradigms arise at a network level?

      We have added some ideas about how these spike signatures might lead to their associated phenotypes to the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) As mentioned earlier, my main concern pertains to the overall architecture and training of the classifier. Based on my reading of the methods and the documentation for the classifier model, I believe that the classifier boundaries may be biased by the unequal distribution of neurons across cerebellar disease groups (e.g., n=29 neurons in control versus n=19 in ataxics). As the classifier is trained to minimize the classification error across the entire sample, the actual thresholds on the parameters of interest may be influenced by the overrepresentation of neurons from control mice. To address this issue, one possible solution would be to reweight each group so that the overall weight across classes is equal. However, I suggest a better strategy might be to revise the classifier architecture altogether (as detailed below).

      We have retrained the classifier model based on equal numbers of ataxic, dystonic, and tremor cells (n=20) but we intentionally included more control cells (n=25). We included more control cells because we assume this is the baseline status for all cerebellar neurons and wanted to avoid assigning disease signatures to healthy neurons too easily. 
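      For comparison with the balanced subsampling described in the response, the reweighting the reviewer suggested can be sketched as per-sample weights that give every class the same total weight. The formula n_total / (n_classes × n_class) mirrors scikit-learn's "balanced" class-weight heuristic; the two-class example uses only the counts the reviewer quoted (29 control, 19 ataxic), because the other group sizes are not stated there.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Per-sample weights n_total / (n_classes * n_class), so that every
    class contributes the same total weight to the training loss."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return [n_total / (n_classes * counts[label]) for label in labels]


labels = ["control"] * 29 + ["ataxia"] * 19
weights = balanced_sample_weights(labels)
control_total = sum(w for w, l in zip(weights, labels) if l == "control")
ataxia_total = sum(w for w, l in zip(weights, labels) if l == "ataxia")
```

      Each class then sums to 24 (half of the 48 samples), so the overrepresented control group no longer dominates the fitted thresholds.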

      (2) As the authors make abundantly clear, one mouse model of disease could potentially exhibit multiple phenotypes (e.g., a mouse with both ataxia and tremor). To address this complexity, it might be more valuable to predict the probability of a certain CN recording producing specific behavioral phenotypes. In this revised approach, the output of the classifier wouldn't be a single classification (e.g., "this is an ataxic mouse") but rather the probability of a certain neural recording corresponding to ataxia-like symptoms (e.g., "the classifier suggests that this mouse has a 76% likelihood of exhibiting ataxic symptoms given this CN recording"). This modification wouldn't require additional data collection, and the exemplar disease models could still be used to train such a revised network/classifier, with each mouse model corresponding to 0% probability of observing all other behavioral phenotypes except for the specific output corresponding to the disease state (e.g., L7CreVgat-fl/fl would be 0% for all categories except ataxia, which would be trained to produce a score of 100%). This approach could enhance the validation results across other mouse models by allowing flexibility in a particular spike train parameter to produce a diverse set of phenotypes.

      This is a great comment. Unfortunately, our current dataset is too limited to fully address it, for the following reasons:

      - We have a limited number of neurons on which to train our classifier. Further dividing the groups of neurons or complicating the model reduced the power of our analyses and resulted in overfitting of the model on too few neurons.

      - The recording durations (30 seconds) used to train our model are likely too short to find multiple disease signatures within a single recording. We feel that the complex phenotypes are likely resulting from cells within one mouse exhibiting a mix of disease signatures (as in the Car8wdl/wdl mice).

      We think this question would be great for a follow-up study that uses a large number of recordings from single mice to fully predict the mouse phenotype based on the population spike signatures. 

      To limit confusion about our classifier model, we have also altered the language of our manuscript and refer to the cells exhibiting a spike signature instead of predicting a phenotype. 

      However, the paper falls short in terms of the classifier model itself. The current implementation of this classifier appears to be rather weak. For instance, the crossvalidated performance on the same disease line mouse model for tremor is only 56%. While I understand that the classifier aims to simplify a high-dimensional dataset into a more manageable decision tree, its rather poor performance undermines the authors' main objectives. In a similar vein, although focusing on three primary features of spiking statistics identified by the decision tree model (CV, CV2, and median ISI) is useful for understanding the primary differences between the firing statistics of different mouse models, it results in an overly simplistic view of this complex data. The classifier and its reliance on the reduced feature set are the weakest points of the paper and could benefit from further analysis and a different classification architecture. Nevertheless, it is commendable that the authors have collected high-quality data to validate their classifier. Particularly impressive is their inclusion of data from multiple mouse models of ataxia, dystonia, and tremor, enabling a true test of the classifier's generalizability.

      We intentionally simplified our parameter space from a high-dimensional dataset into a more manageable decision tree. We did this for the following reasons:

      - The parameters, even though all measuring different features, are highly correlated (see Figure 1 – supplemental Figure 2). Further, we were training our dataset on a limited number of recordings. We found that including all parameters (for example using a linear model) caused overfitting of the data and poor model performance.

      - Describing the spike signatures using a lower number of parameters allowed us to design optogenetic parameters that would mimic this parameter space. This would be infinitely more complex with a bigger parameter space. 

      We agree with the reviewer that the inclusion of multiple mouse models, in addition to the optogenetic experiments, demonstrates the classifier’s generalizability.

      Minor Comments:

      (1) The blown-up CN voltage traces in Figures 5C and Supplementary Figure 2B appear more like bar plots than voltage traces on my machine.

      Thank you for bringing this to our attention. We have improved the rendering of the traces.

      (2) The logic in lines 224-228 is somewhat confusing. The spike train signatures are undoubtedly affected by all the factors mentioned by the authors. What, I believe, the authors intend to convey is that because changes in CN firing rates can be driven by multiple factors, it is the CN firing properties themselves that likely drive disease-specific phenotypes.

      We agree that our discussion of the CN firing needs clarification. We have made the appropriate edits in the text.

      Reviewer #3 (Recommendations For The Authors):

      It's quite astounding that this can be done from single spike trains from what are almost certainly mixed populations of neurons. Could you add something to the discussion about this? Some questions that could be addressed: would multiple simultaneous recordings additionally help classify these diseases, or would non-simultaneous recordings from the same animal be useful? More discussion about which cells you are likely recording from would also be useful.

      Thank you for this suggestion. We have added discussion about multiple recordings, simultaneous vs non-simultaneous recordings, and our thoughts on the cell population recorded in this work.

      The data in Figure 2 are difficult to understand: it appears that the majority of dysregulated cells in two ataxic models are classified as dystonia cells, not ataxic cells. This is surprising, as it seems to be at odds with earlier data from Fig. 1. In my opinion, it is not discussed adequately in the Results or Discussion section.

      We have added further discussion of the ataxia models represented in Figures 1 and 2.

      Minor comment:

      The colours of the subdivisions of the bars in 2C and 3C, and the rest of the paper appear to be related to the groups in the middle (under "predicted"), but the colours are much paler in the figure than in the legend, although the colours in the bars and the legends match in the first figure (1E). Does this signify something?

      We have remade these figures using the same colors throughout.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Prieto et al. faces the increasingly serious problem of bacterial resistance to antimicrobial agents. This work has an important element of novelty, proposing a new approach to control the spread of antibiotic resistance by plasmids. Instead of targeting the resistance determinant, plasmid-borne proteins are used as antigens to be bound by specific nanobodies (Nbs). Once bound, plasmid transfer was inhibited and Salmonella infection blocked. This in-depth study is quite detailed and complex, with many experiments (9 figures with multiple panels), rigorously carried out. Results fully support the authors' conclusions. Specifically, the authors investigated the role of two large molecular weight proteins (RSP and RSP2) encoded by the IncHI1-derivative plasmid R27 of Salmonella. These proteins have bacterial Ig-like (Big) domains and are expressed on the cell surface, creating the opportunity for them to serve as immunostimulatory antigens. Using a mouse infection model, the authors showed that RSP proteins can properly function as antigens in Salmonella strains harboring the IncHI1 plasmid. The authors clearly showed increased levels of specific IgG and IgA antibodies against these RSP proteins in different tissues of immunized animals. In addition, non-immunized mice exhibited Salmonella colonization in the spleen and much more severe disease than immunized ones.

      However, the key strength of this work is the selection and production of nanobodies (Nbs) that specifically interact with the extracellular domain of RSP proteins. The procedure to obtain Nbs is lengthy and complicated and includes the immunization of dromedaries with purified RSP and the construction of a VHH (H-chain antibody variable region) library in E. coli. As RSP is expressed on the surface of E. coli, specific Nbs were able to agglutinate Salmonella strains harboring the R27 plasmid encoding the RSP proteins.

      The authors demonstrated that Nbs-RSP reduced the conjugation frequency of R27, thus limiting the spread of the ampicillin resistance carried by the plasmid. This represents an innovative and promising strategy to fight antibiotic resistance, as it does not depend on the mechanism that determines, in this specific case, the ampicillin resistance of R27, but instead targets an antigen associated with IncHI-derivative plasmids. Thus, RSP vaccination could be effective not only against Salmonella but also against other enteric bacteria. A possible criticism could be that Nbs against RSP proteins reduce the severity of the disease but do not completely prevent infection by Salmonella.

      It is true that vaccination of mice with purified RSP protein did not provide complete protection against infection with a Salmonella strain harboring an IncHI plasmid. As this finding is based on an animal model, further investigation is required to evaluate its clinical efficacy. In any case, even partial protection provided by nanobodies or by a vaccine could potentially improve survival rates among critically ill patients infected with a pathogenic bacterium harboring an IncHI plasmid. An additional beneficial aspect of our approach is that it will reduce dissemination of IncHI plasmids among pathogenic bacteria, which would reduce the presence of antibiotic resistance plasmids in the environment and in the bacteria infecting patients.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript aims to tackle antimicrobial resistance through the development of vaccines. Specifically, the authors test the potential of the RSP protein as a vaccine candidate. The RSP protein contains bacterial Ig-like domains and is typically encoded on IncHI1 plasmids such as R27. The extracellular location of the RSP protein and its role in the conjugation process make it a good candidate for a vaccine. The authors then use Salmonella carrying an IncHI plasmid to test the efficacy of the RSP protein as a vaccine antigen in providing protection against infection by antibiotic-resistant bacteria carrying the IncHI plasmid. The authors found no differences in total IgG or IgA levels, nor in pro-inflammatory cytokines, between immunized and non-immunized mice. However, they found differences in specific IgG and IgA, attenuated disease symptoms, and restricted systemic infection.

      The manuscript also evaluates the potential use of nanobodies specifically targeting the RSP protein by expressing them in E. coli and evaluating their interference with the conjugation of IncHI plasmids. The authors found that E. coli strains expressing RSP-specific nanobodies bind to Salmonella cells carrying the R27 plasmid, thereby reducing the conjugation efficiency of Salmonella.

      Strengths:

      The main strength of this manuscript is that it targets the mechanism of transmission of resistance genes carried by any bacterial species, thus making it broad.

      The experimental setup is sound and with proper replication.

      Weaknesses:

      The two main experiments, evaluating the potential of the RSP protein and the effects of nanobodies on conjugation, seem to be parts of two different and unrelated strategies.

      In preparing our manuscript, we were aware that we included two different strategies to combat antimicrobial resistance. However, we deemed it valuable to include both in the paper. The development of new vaccines and the inhibition of the transfer of antibiotic resistance determinants are currently considered relevant approaches to combat antimicrobial resistance. Our intention in the article is to integrate these two strategies.

      The survival rates shown in Figure 1A and Figure 3A for Salmonella pHCM1 and non-immunized mice challenged with Salmonella, respectively, are substantially different. In the same figures, the challenge of immunized mice and Salmonella pHCM1 and mice challenged with Salmonella pHCM1 with and without ampicillin are virtually the same. While this is not the only measure of the effect of immunization, the inconsistencies in the resulting survival curves should be addressed by the authors more thoroughly as they can confound the effects found in other parameters, including total and specific IgG and IgA, and pro-inflammatory cytokines.

      Overall the results are inconsistent and provide only partial evidence of the effectiveness of the RSP protein as a vaccine target.

      To address the concerns regarding the disparities in survival rates depicted in Figures 1A and 3A, several factors that contribute to these variations should be considered. Firstly, the data depicted in these figures stem from distinct experimental sets conducted at different times with different batches of mice. Despite the use of the same strain and supplier, individual animals and their batches can exhibit variability in susceptibility to infection due to inherent biological differences.

      Unlike in vitro cell culture experiments, which can achieve high replicability due to the homogeneity of cell lines, in vivo animal studies often exhibit greater variability. This variability is influenced not only by genetic variation within animal populations, even those originating from the same supplier, but also by environmental factors within the animal facility. These factors include temperature variations, the concentration of non-pathogenic microorganisms in the facility, which can modify immune responses, and the density of animals in the environment, which affects human traffic and can generate disturbances.

      When designing experiments with animals, it is desirable for the results to be consistent across different animal batches. If one bacterial strain exhibits higher mortality rates than another across multiple experimental series, this pattern should be reproducible despite the inherent variability in in vivo studies. It is more important to demonstrate consistency in trends than to focus on absolute figures when validating experimental results. 

      It is also important to clarify that when we refer to survival rates, it doesn’t necessarily mean that the animals were found dead. The animal procedures were approved by the Ethics Committee of Animal Experimentation of the Universitat de Barcelona and include an animal monitoring protocol. Our protocol requires close daily monitoring of several health and behavioral parameters, each evaluated according to specific criteria. When an animal reaches a predetermined score threshold indicating severe distress or suffering, euthanasia is administered to alleviate further suffering. At this point, biological samples are collected for subsequent analysis.

      The conjugative experiments use very long conjugation times, making it harder to assess if the resulting transconjugants are the direct result of conjugation or just the growth of transconjugants obtained at earlier points in time. While this could be assessed from the obtained results, it is not a direct or precise measure.

      In the conjugation experiments we used a reduced number of donor cells expressing the RSP protein and of recipient cells, as well as long conjugation times, to reflect more accurately a situation that may occur naturally in the environment. Short conjugation times are efficient in controlled laboratory conditions using high densities of donor and recipient cells, but these conditions are not commonly found in the environment. To interfere with the conjugative transfer of the IncHI plasmid, we used an E. coli strain displaying the nanobody binding RSP to simulate a process that could also be scaled up in a natural environment (e.g., a probiotic strain in a livestock farm) and that could be cost-effective. See Discussion section, lines 326-328.

      While the potential outcomes of these experiments could be applied to any bacterial species carrying this type of plasmids, it is unclear why the authors use Salmonella strains to evaluate it. The introduction does a great job of explaining the importance of these plasmids but falls short in introducing their relevance in Salmonella.

      The prevalence of IncHI plasmids in Salmonella was indicated in the Introduction section, lines 65-67. Nevertheless, we understand the reviewer’s criticism and have modified these sentences in the Introduction and also added comments in the Results section (lines 118-128).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I understand working with mice can be challenging in terms of repeating experiments to further support the study's claims. For this reason, I think the authors need to discuss more thoroughly the following things:

      Can the authors comment on why the presence of Ampicillin leads to a lower upregulation of proinflammatory cytokines in the spleen despite harboring resistance against ampicillin?

      At the intestinal level, physiological inflammatory responses play a crucial role in enabling the host to identify foreign and commensal bacterial antigens and initiate a highly regulated and "controlled" immune response (Fiocchi, 2008. Inflamm Bowel Dis. 2008, 14 Suppl 2:S77-8). The administration of antibiotics such as ampicillin reduces the load of resident intestinal microbiota, thereby lowering the extent of intestinal immune activation. This decline in immune activation extends to systemic levels, potentially accounting for the reduced expression of proinflammatory cytokines observed in the spleen.

      There are inconsistent results in the survival rates in Figures 1A and 3A, please discuss how this could alter the observed differences in total and specific IgG and IgA, and pro-inflammatory cytokines.

      To address the reviewer's concerns regarding the discrepancies in survival rates shown in Figures 1A and 3A, and how these differences might influence the observed variations in total and specific IgG and IgA, as well as pro-inflammatory cytokines, it is important to clarify the terminology used in our study. In our context, "survival" does not solely refer to mortality per se, but encompasses the endpoints defined by our animal welfare protocols, which are rigorously supervised by the Animal Experimentation Ethics Committee of the University of Barcelona. Our protocol mandates close daily monitoring of several health and behavioral parameters, each scored according to specific criteria. When an animal reaches a predefined score threshold indicating severe distress or suffering, euthanasia is conducted to prevent further distress, at which point we collect biological samples for analysis.

      In contrast to in vitro cell culture experiments, which often achieve high replicability thanks to the homogeneity of cell lines, in vivo animal studies frequently display greater variability. This variability stems not only from genetic differences within animal populations, even if originating from the same supplier, but also from environmental factors within the animal facility. These factors encompass variations in temperature, the presence of non-pathogenic microorganisms in the facility (capable of altering immune responses) and the density of animals, which can impact human traffic and potentially lead to disturbances. 

      The experiments depicted in Figs. 1A and 3A were separated in time, and hence may be influenced by environmental factors within the animal facility. Nevertheless, in the comparative analysis performed between immunized and non-immunized animals, experiments were performed simultaneously and hence under similar environmental conditions in the animal facility. For several parameters (i.e., immunoglobulins and proinflammatory cytokines) statistically significant differences were observed. 

      Regarding the conjugation assays, it is not entirely clear to me why the conjugation times are so long. It would be beneficial to have more data on the conjugation efficiency between the donor and recipient without any E. coli expressing the nanobodies at different time intervals. This would help to differentiate between new transconjugants and the descendants of transconjugants obtained from early conjugation events.

      This comment is partially answered in a previous response, regarding the numbers of donor and recipient cells and duration of conjugation. We note here that in fig. 9, the requested experiment with donor and recipient cells without E. coli interferent cells is already present, corresponding to the label “none”. To avoid confusion, we have modified the legend in fig. 9.
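      As a hedged illustration of how the “none” control can be used to quantify interference, the sketch below expresses conjugation frequency as transconjugants per donor cell and reports the log10 reduction relative to the control. Per-donor normalization is one common convention (per-recipient normalization is also used); the manuscript's exact normalization is not stated here, and the function names and CFU values are hypothetical.

```python
import math


def conjugation_frequency(transconjugants, donors):
    """Transconjugants per donor cell (e.g. CFU/mL counted on selective plates)."""
    if donors <= 0:
        raise ValueError("donor count must be positive")
    return transconjugants / donors


def log10_reduction(freq_with_nb, freq_none):
    """Log10 fold reduction relative to the 'none' (no interferent E. coli) control."""
    return math.log10(freq_none / freq_with_nb)


# Hypothetical counts, for illustration only
control = conjugation_frequency(transconjugants=2e4, donors=1e8)  # 'none'
treated = conjugation_frequency(transconjugants=2e2, donors=1e8)  # with Nb-expressing E. coli
print(round(log10_reduction(treated, control), 2))  # 2.0
```

      Computing the same ratio at several time points, as the reviewer suggests, would show whether the frequency keeps rising because of new transfer events or plateaus while transconjugants simply grow.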

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study uses a novel experimental design to elegantly demonstrate how we exploit stimulus structure to overcome working memory capacity limits. While the behavioural evidence is convincing, the neural evidence is incomplete, as it only provides partial support for the proposed information compression mechanism. This study will be of interest to cognitive neuroscientists studying structure learning and memory.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Huang and Luo investigated whether regularities between stimulus features can be exploited to facilitate the encoding of each set of stimuli in visual working memory, improving performance. They recorded both behavioural and neural (EEG) data from human participants during a sequential delayed response task involving three items with two properties: location and colour. In the key condition ('aligned trajectory'), the distance between locations of successively presented stimuli was identical to their 'distance' in colour space, permitting a compression strategy of encoding only the location and colour of the first stimulus and the relative distance of the second and third stimulus (as opposed to remembering 3 locations and 3 colours, this would only require remembering 1 location, 1 colour, and 2 distances). Participants recalled the location and colour of each item after a delay.
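      The storage saving described above can be made concrete with a toy sketch. It assumes that locations and colors are both coded as angles (in degrees) on a 360° ring and that features are noise-free, so aligned trajectories can be compared exactly; the function names are hypothetical, and this illustrates the logic of the design rather than the authors' analysis.

```python
def wrap(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180) % 360 - 180


def steps(seq):
    """Signed circular distances (trajectories) between successive items."""
    return [wrap(b - a) for a, b in zip(seq, seq[1:])]


def compress(locations, colours):
    """Return a compact code if both sequences share trajectories, else None."""
    loc_steps, col_steps = steps(locations), steps(colours)
    if loc_steps != col_steps:
        return None  # misaligned: all 3 locations and 3 colours must be stored
    # aligned: 1 location + 1 colour + 2 shared distances
    return (locations[0], colours[0], *loc_steps)


def decompress(code):
    """Rebuild both sequences from the first items and the shared distances."""
    loc0, col0, *shared = code
    locs, cols = [loc0], [col0]
    for step in shared:
        locs.append((locs[-1] + step) % 360)
        cols.append((cols[-1] + step) % 360)
    return locs, cols


code = compress([10, 50, 20], [200, 240, 210])
print(code)              # (10, 200, 40, -30)
print(decompress(code))  # ([10, 50, 20], [200, 240, 210])
```

      In the misaligned condition `compress` returns `None`, mirroring the claim that no such shortcut is available and all six feature values must be held.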

      Consistent with the compression account, participants' location and colour recall errors were correlated and were overall lower compared to a non-compressible condition ('misaligned trajectory'). Multivariate analysis of the neural data permitted decoding of the locations and colours during encoding. Crucially, the relative distance could also be decoded - a necessary ingredient for the compression strategy.
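      One way to read the trajectory error correlation is sketched below: for each trial, the error in each reported trajectory (reported step minus true step, wrapped on the circle) is computed for location and color, and the two error series are correlated across trials. The helper names are hypothetical, and the use of a plain Pearson correlation on wrapped errors assumes errors are small enough that circular statistics are unnecessary; the authors' exact procedure may differ.

```python
import math


def wrap(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180) % 360 - 180


def trajectory_errors(true_seq, reported_seq):
    """Per-step error of the reported trajectories relative to the true ones."""
    true_steps = [wrap(b - a) for a, b in zip(true_seq, true_seq[1:])]
    rep_steps = [wrap(b - a) for a, b in zip(reported_seq, reported_seq[1:])]
    return [wrap(r - t) for r, t in zip(rep_steps, true_steps)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Location trajectory errors on one hypothetical trial: [5, -5]
print(trajectory_errors([0, 40, 80], [0, 45, 80]))
```

      Under the compression account, location and color trajectory errors pooled across trials would be positively correlated in the aligned condition, because both are read out from the same stored distances.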

      Strengths:

      The main strength of this study is a novel experimental design that elegantly demonstrates how we exploit stimulus structure to overcome working memory capacity limits. The behavioural results are robust and support the main hypothesis of compressed encoding across a number of analyses. The simple and well-controlled design is suited to neuroimaging studies and paves the way for investigating the neural basis of how environmental structure is detected and represented in memory. Prior studies on this topic have primarily studied behaviour only (e.g., Brady & Tenenbaum, 2013).

      Thanks for the positive comments and excellent summary.

      Weaknesses:

      The main weakness of the study is that the EEG results do not make a clear case for compression or demonstrate its neural basis. If the main aim of this strategy is to improve memory maintenance, it seems that it should be employed during the encoding phase. From then on, the neural representation in memory should be in the compressed format. The only positive evidence for this occurs in the late encoding phase (the re-activation of decoding of the distance between items 1 and 2, Fig. 5A), but the link to behaviour seems fairly weak (p=0.068).

      Thanks for raising this important concern. The reviewer is correct that in principle subjects should employ the compression strategy during the encoding phase when sequence stimuli are presented, yet our results show that the 1-2 trajectory could only be decoded during the late encoding phase.

      However, subjects could not obtain enough information to form the compression strategy for the location and color sequences until the appearance of the 3rd item. Specifically, based on the first two items, they only learn whether the 1st-2nd trajectories are congruent between the location and color features; they cannot predict whether this congruence will also apply to the upcoming 2nd-3rd trajectory. This is exactly what we found in the neural decoding results. The 1st-2nd trajectory could be decoded after the 2nd item presentation, and the 2nd-3rd trajectory appears after the 3rd item onset. Most critically, the 1st-2nd trajectory is reactivated after the 3rd item, but only in the aligned condition, implicating formation of the full-sequence compression strategy, wherein the previously formed 1st-2nd trajectory is reactivated to be connected to the 2nd-3rd trajectory.

      Regarding the difference between the higher- and lower-correlation groups, we previously used a time window based on the overall 2nd-3rd neural reactivations, which might not be sensitive to reactivation strength. We have now re-selected the time window based on the higher-correlation group (bootstrap test, p = 0.037, two-sided).
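      The response does not spell out the bootstrap procedure, so the sketch below shows one common construction of a two-sided bootstrap test for a difference in group means (for example, reactivation strength in the higher- versus lower-correlation group). It is an assumption-laden illustration, not the authors' code: each group is resampled with replacement, and the p-value is twice the smaller tail proportion of bootstrap differences on either side of zero.

```python
import random
import statistics


def bootstrap_diff_test(group_a, group_b, n_boot=10000, seed=0):
    """Two-sided bootstrap test for a difference in group means.

    Returns the observed mean difference (a - b) and a p-value equal to twice
    the smaller proportion of bootstrap differences lying at or beyond zero.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    boot = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        boot.append(statistics.fmean(a) - statistics.fmean(b))
    p_low = sum(d <= 0 for d in boot) / n_boot
    p_high = sum(d >= 0 for d in boot) / n_boot
    # A value of 0 means p < 1 / n_boot, not that p is exactly zero
    return observed, min(1.0, 2 * min(p_low, p_high))
```

      Fixing `seed` makes the resampling reproducible; increasing `n_boot` reduces Monte Carlo error in the p-value.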

      Results have been updated (Figure 5; Results, Page 16). Interpretations about the formation of the compression strategy during the encoding phase have been added to Results (Pages 15-16) and Discussion (Page 18).

      Stronger evidence would be showing decoding of the compressed code during memory maintenance or recall, but this is not presented. On the contrary, during location recall (after the majority of memory maintenance is already over), colour decoding re-emerges, but in the un-compressed item-by-item code (Fig. 4B). The authors suggest that compression is consolidated at this point, but its utility at this late stage is not obvious.

      Thank you for raising this important question, which we apologize for not addressing previously: neural evidence for the compressive account.

      The reason we did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Rose, Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown below (AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We further used alpha-band (8-11 Hz) neural activity, which has been shown to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019), to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than those obtained from raw activity. Importantly, as shown below (CD), the aligned condition showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during the encoding and recall periods, support the reviewer’s hypothesis that the compressive strategy, if exploited, would be demonstrated during both the encoding and maintenance periods. New results and related discussion have been added (Page 16, Supplementary Figure 4).
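      To make the alpha-band feature concrete, the sketch below estimates mean spectral power between 8 and 11 Hz for a single epoch using a direct discrete Fourier transform. It is deliberately dependency-free and simplified: a typical EEG pipeline would instead band-pass filter and take the Hilbert envelope, or use wavelets, to obtain time-resolved power, and the resulting values (per channel and time point) would then be fed to the decoder. The function name and defaults are assumptions for illustration.

```python
import cmath
import math


def band_power(signal, fs, f_lo=8.0, f_hi=11.0):
    """Mean spectral power of `signal` within [f_lo, f_hi] Hz (direct DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k
        if f_lo <= freq <= f_hi:
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            powers.append(abs(coef) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0


fs = 250  # samples per second; 1-s epoch gives 1-Hz frequency resolution
alpha_wave = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]
gamma_wave = [math.sin(2 * math.pi * 40 * i / fs) for i in range(fs)]
print(band_power(alpha_wave, fs) > band_power(gamma_wave, fs))  # True
```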

      With regards to the observed item-by-item color replay during location recall, the reviewer was concerned that this was not consistent with the compressive account, given the lack of trajectory decoding.

      First, item sequences stored in a compressed format need to be converted back into sequences during serial recall. In other words, even though color and location sequences are retained in a compressed format (i.e., common 1st-2nd and 2nd-3rd trajectories) throughout the encoding and retention phases, they must be converted into two separate sequences for output. This is exactly why we performed decoding analysis on individual color and location items rather than trajectories.

      Second and most importantly, we observed serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for a common structure, since the spontaneous color replay during location recall in the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous color sequence replay during location sequence recall supports their shared underlying cognitive map.

      Finally, spontaneous serial replay is also correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recalling is associated with memory reorganization formed during encoding.

      Taken together, we posit that memories need to be converted to sequences as outputs, which leads to serial reactivations during recalling. Importantly, the observed spontaneous replay of color sequences for the aligned condition provides strong evidence supporting the associations between color and location sequences in WM.

      We have now added relevant interpretations and discussions (Pages 11 and 13).

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors wanted to test if using a shared relational structure by a sequence of colors in locations can be leveraged to reorganize and compress information.

      Strength:

      They applied machine learning to EEG data to decode the neural mechanism of reinstatement of visual stimuli at recall. They were able to show that when the location of colors is congruent with the semantically expected location (for example, green is closer to blue-green than purple) the related color information is reinstated at the probed location. This reinstatement was not present when the location and color were not semantically congruent (meaning that x displacement in color ring location did not displace colors in the color space to the same extent) and semantic knowledge of color relationship could not be used for reducing the working memory load or to benefit encoding and retrieval in short term memory.

      Weakness:

      The experiment and results did not address any reorganization of information or neural mechanism of working memory (that would be during the gap between encoding and retrieval).

      We apologize for not presenting clear neural evidence for memory reorganization, particularly neural decoding during WM maintenance and retrieval, in the previous version. As below, we explain why the findings provide converging neural evidence for WM reorganization based on a shared cognitive map.

      First, during the encoding phase when location and color sequences are serially presented, our results reveal reactivation of the 1st-2nd trajectories upon the onset of the 3rd item when location and color sequences are aligned with each other. The reactivation of 1st-2nd trajectory right after the emergence of 2nd-3rd trajectory for aligned but not for misaligned sequences strongly supports WM reorganization, since only stimulus sequences that could be compressed based on shared trajectories (aligned condition) show the co-occurrence of 1st-2nd and 2nd-3rd trajectories. Moreover, the relevance of 1st-2nd reactivation to behavioral measurements of color-location reorganization (i.e., behavioral trajectory correlation, Figure 5D) further indicates its link to WM reorganization.

      Second, the reason we originally did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Wolff et al., Nat. Neurosci, 2017; Rose et al., Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown in Supplementary Figure 4(AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We then used alpha-band (8-11 Hz) neural activity, which has been found to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019), to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than those obtained from raw activity. Importantly, as shown in Supplementary Figure 4(CD), the aligned condition showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during the encoding and recall periods, thus also support WM reorganization.

      Finally, during the recall period, we observed automatic serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for a common structure, since the spontaneous color replay during location recall in the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous replay of the color sequence during location recall supports their shared underlying cognitive map. Moreover, the spontaneous serial replay is correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recall is associated with memory reorganization formed during encoding.

      Taken together, we have added updated results about the maintenance period (Page 16, Supplementary Figure 4) and included clarifications and interpretations about why the findings during the encoding and retrieval periods support the WM reorganization view (Page 15-16).

      There was also a lack of evidence to rule out that the current observation can be addressed by schematic abstraction instead of the utilization of a cognitive map.

      The likely impact of the initial submission of the study would be in the utility of the methods that would be helpful for studying a sequence of stimuli at recall. The paper was discussed in a narrow and focused context, referring to limited studies on cognitive maps and replay. The bigger picture and long history of studying encoding and retrieval of schema-congruent and schema-incongruent events is not discussed.

      We agree with the reviewer that the cognitive map referred to here could be understood as a schematic abstraction. A cognitive map refers to the internal representation of spatial relations in a specific environment (Tolman 1948). Schematic abstraction denotes a broader range of circumstances, whereby the gist or structure of multiple environments or episodes can be integrated (Bartlett, 1932; Farzanfar et al., Nat. Rev. Neurosci, 2023).

      In other words, a schema is a highly abstract framework of prior knowledge that captures common patterns across related experiences, and it does not necessarily operate within a spatial framework as cognitive maps do. In the current design, by contrast, we specifically manipulated the consistency of spatial trajectory distance between the color and location sequences. Therefore, we would argue that "cognitive map" is a more conservative and appropriate term to frame our findings.

      Relevant discussions have been added (Pages 3 and 19).

      We apologize for not discussing this broader context and have now added the relevant schema literature. Thanks for the suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Do time-frequency-domain data (e.g., alpha-band power) in the delay provide evidence for delay-period decoding of trajectory lengths? This might strengthen the case for compression.

      Thanks for the suggestion. We have now performed a decoding analysis of the delay period based on alpha-band power. As shown in Supplementary Figure 4, both the 1st-2nd and 2nd-3rd trajectories could be decoded in the aligned condition.

      Added in Supplementary Figure 4 and on Page 16.

      (2) Do participants erroneously apply the compression strategy in the misaligned condition? This would not show up in the trajectory error correlation analysis, but might be visible when examining correlations between raw trajectory lengths.

      Thanks for raising this interesting suggestion. To test the hypothesis, we chose a typical misaligned condition in which the 1st-2nd trajectory distances are the same for the location and color sequences, while the 2nd-3rd trajectory distances differ between the two features.

      In this case, participants might exploit the compression strategy for the first two items and erroneously apply it to the 3rd item. If so, we would expect better memory performance for the first two items but worse memory for the 3rd item, compared with the remaining misaligned trials. As shown below, the 1st-2nd aligned trials showed marginally higher performance than the other misaligned trials for the first two items (t(32) = 1.907, p = 0.066, Cohen’s d = 0.332). However, we did not find significantly worse performance for the 3rd item between the two conditions (t(32) = -0.4847, p = 0.631, Cohen’s d = -0.084). We did observe a significant interaction between the last two items and alignment (t(32) = 2.082, p = 0.045, Cohen’s d = 0.362), suggesting a tendency to erroneously apply the compression strategy to the 3rd item.

      Author response image 1.

      (3a) Some more detail on some of the methods might help readers. For instance, did trajectories always move in a clockwise direction? Could the direction reverse on the third item? If not, did this induce a response bias? Could such a bias possibly account for the trajectory error correlations?

      Sorry for the unclear statement. In each trial, both the color and location features of the three items are randomly selected from nine possible values without any constraint on direction. That is, the trajectories can move clockwise or anticlockwise, and in some trials the direction reverses on the third item. We therefore think the current design actually helps reduce the influence of response bias. Moreover, if the trajectory error correlations were due to response bias, we would expect significant correlations in all conditions, rather than only for the 1st-2nd and 2nd-3rd trajectories (and not the 1st-3rd trajectory) and only in the aligned condition (and not the misaligned condition). Therefore, we think the trajectory error correlations cannot be simply explained by response bias.

      Details have been added (Page 23).
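      To make the direction point concrete, the sketch below computes the signed step between two of the nine ring positions, so that clockwise and anticlockwise moves (including a reversal on the third item) carry opposite signs. It assumes positions are indexed 0-8 and that positive means clockwise; the function name `signed_step` is hypothetical and not taken from the study's analysis code.

```python
N_POSITIONS = 9  # nine possible values per feature, as in the task


def signed_step(a: int, b: int, n: int = N_POSITIONS) -> int:
    """Shortest signed step from position a to position b on an n-position ring.

    Positive = clockwise, negative = anticlockwise (an assumed convention).
    With n odd the shortest path is unique, so the result lies in [-(n // 2), n // 2].
    """
    half = n // 2
    return ((b - a + half) % n) - half


# A hypothetical three-item sequence whose direction reverses on the third item
seq = [1, 3, 2]
print(signed_step(seq[0], seq[1]), signed_step(seq[1], seq[2]))  # → 2 -1
```

      Under this convention a trial's signed trajectory error would be the remembered step minus the presented step, which is the kind of quantity that a direction-specific response bias would shift.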

      (3b) Is the colour wheel always oriented the same way for a participant? If so, given there are only nine colors, it seems possible that colors are mapped to locations and remembered in a location code instead. This does not seem to be a problem in principle for the behavioural findings, but might change the interpretation of what is being decoded from the EEG. If this is a possibility then this might be acknowledged.

      The color wheel is always oriented the same way for each participant. We agree with the reviewer that participants may map colors onto locations and remember them in a location code. We do not have sufficient evidence to rule out this possibility; one way to test it would be an additional experiment in which the orientation of the color wheel varies during the response period. Meanwhile, we would like to point out that the underlying logic of the current design is based on the facts that thinking spatially is intuitive and that spatial metaphors such as “location” and “distance” are commonly used to describe the world, e.g., the well-known mental number line (Dehaene et al., JEP: General, 1993). Therefore, we expected participants to associate or integrate the location and color maps based on trajectory distance.

      The reviewer is correct that the color decoding would reflect spatial location rather than the genuine color feature. This is actually the point of the experimental design, whereby two otherwise unrelated features could be combined within a common cognitive map. Without realigning the two feature maps in space, subjects could not form the strategy to compress the two sequences at all. In other words, decoding of color sequences could be understood as the neural representation of a series of corresponding locations along the ring that are independent of the physical locations of the items.

      Interpretations and clarifications have been added (Pages 23 and 26).

      (4) Does the discretisation of the stimulus distribution (to only 9 possible locations) make the compression strategy easier to use? If the features had been continuously distributed across the location/colour circle, would participants still pick up on and use the shared trajectory structure?

      Thanks for the question. Without further data, it is hard to say whether discretizing the stimulus distribution makes the compression strategy easier to use than a continuous distribution would. Both outcomes seem possible. On the one hand, a discrete stimulus distribution results in a discrete trajectory distribution, which helps participants realize the common trajectory strategy. On the other hand, a discrete stimulus distribution may result in a categorical or label-based representation, which may weaken the effectiveness of the structure compression strategy. We postulate that our findings would generalize to continuous trajectories in a cognitive map down to a certain resolution.

      (5a) Minor point: I disagree that avoiding the same points for location and colour for a given item allows them to be independently decoded. I would argue the contrary - this kind of constraint should create a small anti-correlation that in principle could lead to spurious decoding of one variable (although this seems unlikely here).

      We appreciate the concern. As mentioned above, with a discrete stimulus distribution (9 possible values in both the color and location domains), a fraction of trials would likely share the same values in location and color, so the neural decoding for one domain might be confounded by the other. To dissociate their neural representations, we imposed the constraint that color and location could not occupy the same value for a given item.

      We agree that this kind of constraint might create a small anti-correlation, although we did not observe one here. Future studies using a continuous stimulus distribution would reduce any correlation or anti-correlation between stimuli.

      (5b) Very minor point: 1,000 permutations for significance testing seems on the low side. Since some of the p-values are close to 0.05 it may be worth running more permutations.

      Thanks for this suggestion. We obtained similar results with 1,000 and 10,000 permutations.

      (6) Missing reference: H. H. Li et al., 2021 (line 213) seems not to be on the list of references.

      Sorry for the mistake. Added.

      Reviewer #2 (Recommendations For The Authors):

      The study aimed to discuss the working memory mechanism; instead, it seems to focus on encoding and recall strategies after a short delay. I recommend updating the manuscript to refer to the relevant cognitive mechanism.

      There was a strong voice on the effect of using the cognitive map in working memory, without any tests on if indeed a cognitive map was used (for example the novel link between stimuli and how a cognitive map can be used to infer shortcuts). Was the participant required to have any mental map beyond the schema of the shown color ring?

      In the current experiment, to discuss if the effect is driven by utilizing a cognitive map or schematic abstraction of color-relatedness, further analysis is required to possibly assess the effects of schema on neural activity and behavior. Namely,

      (1) Was there any reinstatement of schematically congruent (expected) colors that were probed by location 1, at locations 2 and 3 in the MAT condition?

      Thanks for pointing out this possibility. However, we do not think there would be stable color expectations given location information in the MAT condition. First, as the trajectory distance varied from trial to trial, no prior common trajectory knowledge could be used to make inferences about the stimuli in an individual trial. Second, the starting points for color and location (1st item) were randomly and independently selected, such that the color sequence could not be predicted from the location sequence in either the aligned or the misaligned condition.

      (2) Given that response time can be a behavioral marker of schematic conflict, was the response time faster for congruent than incongruent conditions?

      Thanks for this question. Unfortunately, due to the experimental design, the response time could not be used as a behavioral marker to infer mental conflicts, since participants were not required to respond as fast as possible. Instead, they took their own pace to reproduce sequences without time limit. They could even take a short break before submitting their response to initiate the next trial.

      (3) In case you cannot rule out that utilizing schema is the cognitive mechanism that supports working memory performance (the behavior), please add the classical literature (on the memory of schematically congruent and incongruent events) to the discussion.

      Thanks for this suggestion. We have now added the relevant literature (Pages 3 and 19).

      (4) On page 6, 'common structure in the cognitive map' is the schema, isn't it?

      Correct. Based on our understanding, ‘common structure in the cognitive map’ is a spatial schema.

      (5) In Figure 2 EFG, would you please use a mixed effect model or show evidence that all participants demonstrated a correlation between the location trajectory error and color trajectory error?

      Thanks for the suggestion. We have added the mixed effect model results, which are consistent with Figure 2EFG (AT: 1st-2nd trajectory, β = 0.071, t = 4.215, p < 0.001; 2nd-3rd trajectory, β = 0.077, t = 3.570, p < 0.001; 1st-3rd trajectory, β = 0.019, t = 1.118, p = 0.264; MAT: 1st-2nd trajectory, β = 0.031, t = 1.572, p = 0.116; 2nd-3rd trajectory, β = 0.002, t = 0.128, p = 0.898; 1st-3rd trajectory, β = -0.017, t = -1.024, p = 0.306).

      In general, doesn't such correlation just show that good participants/trials were good (some did well in the study and some did poorly throughout?)

      We do not think the trajectory error correlation results merely reveal that some participants did well and others did poorly. If that were the case, we would not observe a significant correlation in Figure 2D, where we first computed the correlation for each participant and then tested its significance at the group level. Rather, the trajectory error correlation between the color and location domains characterizes consistent trial-by-trial changes between the two domains.

      It is worth noting that the correlation was estimated with signed trajectory errors in the color and location domains. The analysis therefore tested whether the errors in the two domains varied consistently in the same direction, i.e., whether remembering a location trajectory as longer than the actual one predicted remembering the corresponding color trajectory as longer as well.

      Moreover, as shown in Figure 2EFG, when we divided trials into 4 bins according to the location trajectory error for each participant and pooled the data across participants, we observed 4 clusters along the x-axis (location trajectory error). This suggests that participants’ memory performance is rather consistent instead of being extremely good or bad. Besides, if the trajectory error correlation were due to differences in overall memory performance between participants, we would observe significant trajectory error correlations in both the AT and MAT conditions, instead of only in the AT condition, and only for the 1st-2nd and 2nd-3rd trajectories but not for the 1st-3rd trajectory.

      In Figure 2 G, is the marginal error just too big to be sensitive? I am not sure what we are learning here, please clarify.

      Sorry for the confusion. To examine this possibility, we excluded errors beyond 2.5σ and still observed a non-significant 1st-3rd trajectory error correlation between the color and location domains (r = 0.119, p = 0.167).

      The 1st-3rd trajectory showed neither a significant behavioral correlation nor a significant neural representation, which suggests that the current sequential memory task encourages participants to organize information by relying mainly on adjacent items and their distances. Thus, we think the 1st-3rd trajectory serves as a control trajectory, which helps us not only exclude other possible explanations (e.g., systematic response bias) but also validate the current findings at both the behavioral and neural levels.

      These results and statements have now been added (Pages 10-11).

      Author response image 2.
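      The 2.5σ exclusion described in this response could be sketched as below. The details are assumptions: σ is taken as the sample standard deviation of each error distribution, the cut is applied around its mean, and a trial is dropped if either its location or its color error falls outside the cut. The helper name `keep_within` is hypothetical.

```python
from statistics import mean, stdev


def keep_within(loc_err, col_err, k=2.5):
    """Return the paired trajectory errors with trials beyond k SD (in either feature) removed."""
    lm, ls = mean(loc_err), stdev(loc_err)
    cm, cs = mean(col_err), stdev(col_err)
    kept = [
        (l, c)
        for l, c in zip(loc_err, col_err)
        if abs(l - lm) <= k * ls and abs(c - cm) <= k * cs
    ]
    return [l for l, _ in kept], [c for _, c in kept]
```

      The 1st-3rd correlation would then be recomputed on the two filtered lists.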

      (6) Regarding the first lines on page 11, did you do qualitative research to know if less information was encoded in congruent conditions?

      The current experimental design is inspired by studies of mental compression of spatial sequences from Dehaene’s lab (Amalric et al., 2017; Roumi et al., 2021), which propose that the human brain compresses spatial sequences using an abstract language and formalize the minimal description length of a sequence as its “language-of-thought complexity.” Based on this evidence, we think less information is required to describe the congruent condition than the incongruent condition. This idea is supported by the better memory performance in the congruent condition. Unfortunately, we could not quantify how much less information was encoded in the congruent condition.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      1. The authors conclude that RFP-Ac expression is restricted to emerging SOPs and surroundings cells at 18h APF, indicating that Ac is activated later than Sc. Can the authors provide images for RFP-Ac expression at 10h and 16h APF similar to GFP-Sc as shown in their figures? Do the SOPs that contain high levels of both Ac and Sc (as some SOPs have Sc expression but not Ac) undergo fate divergence and SB faster than the SOPs containing higher levels of only Sc?

      We are now showing the expression pattern of GFP-SC and RFP-Ac/GFP-Ac in fixed samples stained also for E-cad at 13h, 16h and 18h APF (Fig 1I-K' and Fig S1E-G'). Ac and Sc were found to be activated around the same time. However, Ac appeared to accumulate at lower levels than Sc prior to SOP selection in the central domain of the ADHN (Fig 1J-K'). We also confirmed that Ac was more strongly expressed in SOPs. Additionally, SOPs appeared to accumulate both Ac and Sc, i.e. SOPs with high levels of GFP-Sc also showed a strong RFP-Ac signal (Fig S1H-H'). Finally, since RFP-Ac was not detectable in living pupae, possibly due to the rapid turn-over of Ac and the slow maturation of RFP, we could not study more precisely the relative dynamics of Ac and Sc. For the same reason, we could not address whether the rate of fate divergence (measured using GFP-Sc) varied with the level of Ac.

      2. It would be interesting to see the spatial and temporal dynamics of Ac and Sc in Notch mutants or even Notch dynamics in Sc and Ac mutants to better understand the progression fate divergence and its effect on lateral inhibition in real time.

      Following the reviewer's suggestion, we examined the expression pattern of NRE-deGFP, a Notch activity reporter, in ac sc double mutant pupae at 16h and 24h APF (Fig S3A-D). This showed that the initial pattern of NRE-deGFP at 16h APF (signal detected in posterior ADHN cells as well as in the ADHN) did not depend on Ac and Sc. By contrast, the second phase of NRE-deGFP expression (in cells of the proneural ADHN domain, around emerging SOPs) was found to depend on the activity of Ac and Sc. Thus, the strong Notch activation observed in cells surrounding emerging SOPs depends on Ac-Sc activity, presumably because Ac and Sc are required for SOP specification and SOPs produce Delta, serving as the local source that activates Notch (see also our response to reviewer 3, point #6). Because NRE-deGFP was not up-regulated in the proneural ADHN domain of sc10-1 ac3 mutant pupae, a quantitative analysis of NRE-deGFP dynamics may not be informative.

      The reviewer also suggested that we study the dynamics of GFP-Sc in Notch mutants. One can easily predict that most Notch mutant cells would accumulate GFP-Sc, as observed in the notum (PMID: 28386027). Therefore, analysis of fate symmetry breaking is unlikely to be useful in that context. Likewise, an FDI analysis would not be relevant. From a technical point of view, live imaging of GFP-Sc would have to be performed in Notch mutant clones. This is because RNAi against Notch (strong 10xUAS-Notch hp2 construct, PMID: 19487563) driven by escargot-Gal4 to knock down Notch in larval histoblasts only led to a partial loss of Notch function (our unpublished data). Generating Notch mutant clones in the abdomen would require constructing an appropriate GFP-Sc Notch FRT recombinant chromosome as well as a new FRT GFP-Sc chromosome with an infrared marker (not currently available) to compare the relative dynamics of GFP-Sc in wild-type and mutant cells. In sum, this proposed experiment would take a significant amount of time and is unlikely to shed new light. Given that this experiment is not essential to support the claims of the paper and that it is not clear to us what would be learnt from it, we opted not to perform it.

      Minor comments

      1. In figure 1F and F', the authors mention GFP-Sc is not expressed prior to 14h, however, there is still GFP signal detected in their imaging. Can the authors comment what would be the cause of this GFP signal or was it due to non-specific background signal during their imaging analysis?

      We thank the referee for raising this issue. Yes, a strong autofluorescence signal was detected prior to the onset of GFP-Sc expression. We provide below the results of our analysis of the autofluorescence signal (Fig R1) relative to the nuclear signal (Fig R2), and how normalization of the signal was used to measure the specific GFP-Sc signal.

      Analysis of the autofluorescence signal over time

      To estimate the autofluorescence signal, we measured the average intensity of the signal acquired in the GFP channel for each frame and plotted these values over time. The results are shown in Fig R1 below:

      Fig R1: temporal profile of the autofluorescence signal

      Each measurement corresponds to the average intensity measured in the GFP channel over the entire field at each z-section and for each time point. Mean and SD values are shown over time in black and grey, respectively. Time is in frame number (dt is 2.5 min). The data shown above correspond to movie 1 (see also Fig 2).

      This plot indicates that the autofluorescence signal was progressively bleached. We therefore excluded from our analysis the first 50 time points when the autofluorescence signal was initially strong. No nuclear GFP-Sc signal was detectable in these first 50 frames in the cells of the central area of the ADHN which are studied here (see Fig 2A', t=1:12, time frame #29).

      While revising the manuscript, we realized that t=0 corresponded to two distinct time points in the first version of our manuscript: it corresponded to the onset of imaging in Fig 2A-D', and to t=2:08 (time frame #51) in all other figures showing data following removal of the first 50 time points. We have now fixed this issue and are presenting all data with t=0 corresponding to the onset of imaging.

      Analysis of the nuclear fluorescence signal over time

      To detect the nuclear GFP-Sc signal, we measured the average intensity of the signal acquired in the GFP channel (raw intensity values corresponding to the sum of the GFP-Sc and autofluorescence signals) in segmented nuclei (in 3D, within the entire z-stack). These values were plotted over time (pink curve in Fig R2 below; the autofluorescence is plotted in black, as in Fig R1, for the sake of comparison). This showed that the intensity of the signal measured in nuclei was initially identical to the mean intensity measured across the entire field of view, indicative of autofluorescence only. A specific increase in signal intensity in nuclei (relative to the entire field of view) was detectable after 2h of imaging (time frame 48 in Fig R1; dt is 2.5 min). Importantly, mean intensity values of the autofluorescence signal appeared to be approximately 10-fold stronger than the mean intensity associated with the nuclear GFP-Sc signal.

      Fig R2: temporal profile of the GFP-Sc signal

      The plot in pink corresponds to the average intensity in the GFP channel (raw intensity values corresponding to the GFP-Sc and/or autofluorescence signals) per nucleus (within the entire z-stack) for each time point. Mean and SD values measured in each nucleus are shown over time (in pink; these data correspond to movie 1; shown also in Fig 3). This plot (pink) should be compared with the plot shown in Fig R1 (also in black in Fig R2). The intensity difference between the pink and black curves was attributed to the specific GFP-Sc signal.

      Signal normalization and analysis of the GFP-Sc signal

      In our study, we normalized the GFP-Sc signal by dividing the averaged value measured in each single nucleus (data corresponding to the pink curve in Fig R2) by the mean value of the signal measured at the same time point in the same channel in the entire image stack (data corresponding to the black curve in Fig R1/R2). Given the low intensity of the GFP-Sc signal, and the small number of pixels corresponding to Scute-expressing nuclei over the entire field of view, this denominator should closely reflect the autofluorescence noise. Thus, the normalized signal of nuclei showing only background autofluorescence should be close to 1. This was experimentally verified by measuring the normalized intensity values of the PDHN nuclei that did not express Scute: a mean intensity value of 0.96 +/- 0.10 was measured (at time frame #51; see Fig R1 below). In contrast, the normalized GFP-Sc values measured several hours before SB were found to be close to 1.1 (see Fig 3D). Whether these values reflect very low levels of nuclear GFP-Sc that cannot be detected visually or result from imperfect normalization of the signal remains unclear. Given the intensity and non-uniformity of the autofluorescence signal, we cannot exclude the latter. For this reason, we chose not to over-interpret the initial low intensity values of GFP-Sc.
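      The normalization described in this response can be sketched as follows: the mean GFP-channel intensity of a segmented nucleus is divided by the mean intensity of the whole stack at the same time point, and the first 50 frames (strong autofluorescence) are skipped. The data layout (per-frame lists of voxel intensities) and the function name are hypothetical; this is an illustration, not the analysis pipeline used in the study.

```python
from statistics import fmean


def normalized_signal(nucleus_voxels, stack_voxels, first_frame=50):
    """Normalized GFP-Sc signal of one nucleus over time.

    nucleus_voxels[t]: GFP-channel intensities of the segmented nucleus (3D, whole z-stack) at frame t
    stack_voxels[t]:   GFP-channel intensities of the entire image stack at frame t
    Indices are 0-based, so skipping the first 50 frames starts at index 50 (frame #51).
    A value near 1 means the nucleus is indistinguishable from the field-wide,
    mostly autofluorescent, mean intensity.
    """
    return [
        fmean(nucleus_voxels[t]) / fmean(stack_voxels[t])
        for t in range(first_frame, len(nucleus_voxels))
    ]
```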

      In the materials and methods, the authors mention that prior to imaging the larvae and pupae are grown at 18, 21 or 25°C. Is there a reason why the larvae and pupae are grown at different temperatures for different experiments? Can the authors specify (i.e. in the figure legends) in which experiments different temperatures were used?

      Larvae and pupae were grown at different temperatures for convenience, i.e. to adapt the time interval between staging at 0h APF and mounting for live imaging. Indeed, it is much easier to obtain 10-14h APF pupae by collecting staged pupae at 0h APF the day before and incubating them overnight at lower temperature to slow-down development. However, all live imaging experiments were performed at 23-25°C, and we have no reason to think that this prior incubation would affect the process studied here.

      The citations need to have a better format as they show up as each citation within a single bracket which makes it a little hard to read when multiple references are cited in a single sentence.

      fixed

      In the abstract, the sentence 'Unexpectedly, we observed at low frequency (10%) pairs of cells that are in direct contact at the time of SB'. SB should be replaced with "Symmetry breaking", as it appeared for the first time in the manuscript and should be written out in full.

      fixed

      Throughout the manuscript there are instances where the abbreviations are written in full with the abbreviation in brackets after they have already been introduced in the introduction which can be changed to just the abbreviation itself.

      fixed

      In the discussion on page 11, 'our observation...', our needs to be changed to Our.

      fixed

      7. It would be nice to have arrow heads or dotted lines around the cells or areas on interest in both, all the figures and movies, so that it will be easier to follow the results. The videos have a lot of background due to fragmented apoptotic nuclei, etc. as mentioned by the authors, hence arrow heads or dotted lines would bring viewers focus on the areas of interest.

      fixed (see for instance Fig 1D, Fig 2A, Fig 5B, Fig 7A, Fig S3D, etc...)

      8. It would be helpful to have anterior - posterior axis (i.e. with an arrow) shown on top of all the figures.

      In our earlier version, we indicated that 'In this and all other figures, dorsal is up and anterior is left' in the legend of Fig 1B. We have now moved this sentence to the end of the Fig 1 legend to make it more apparent. Additionally, the AP axis is now clearly indicated in Fig 1C. We believe that it is not necessary to repeat this orientation in all figures.

      Scale bars are missing in all figures, videos, and figure legends.

      Added

      Only movies 1 and 3 are referenced in the text.

      All movies are now referenced in the text

      Keeping the colors in the movies and figures consistent and same would be helpful. For example, Movie 2 Histone3.3-mIFP marker is in blue but in figure 3 it is in magenta.

      fixed (H3.3-mIFP in magenta in this movie, now numbered 3)

      As mentioned above, it would be helpful if the authors have arrow heads or dotted lines around the cells or areas of interest in both the figures and movies for better representation of their data. For example, movie 1 shows a larger area of imaging than shown in figure 2A, which makes it hard to follow the cells of interest in the movie.

      An additional movie corresponding to the SOP shown in Fig 2A is now provided (new movie 2).

      --

      Reviewer #2

      1. Despite "symmetry breaking" being the main focus of the paper, in the Introduction, the authors do not explain what this term means and do not provide any description of this process. This is a critical point that makes understanding of the goals of the paper difficult. Therefore, the authors are encouraged to provide more information and a clear description of this term/phenomenon.

      We thank the reviewer for this suggestion. We now state in the introduction what symmetry breaking means in the context of lateral inhibition: 'To describe and study the process of SOP selection, we studied fate SB. The latter refers to the transition point when one cell, the future SOP, starts to stably accumulate a higher level of GFP-Sc relative to its immediate neighbors.'

      The role of Achaete in the story is not clear. Even though both factors are required for SOP determination, the authors mainly focus on Scute, so it is not very clear what the role of Achaete in this process is, if there is any. As shown in the paper, Achaete is expressed later when heterogeneity is promoting cell fate divergence. Is Achaete maybe contributing to cell heterogeneity/ cell fate divergence?

      We thank the reviewer for raising this point. We now show in Fig S1A-D that abdominal bristles develop in a protein-null allele of sc (scM6) as well as in an ac mutant corresponding to a 45 kb deletion that removes ac but not sc (PMID: 16216235). Together with our analysis of sc10-1 ac3 mutant flies, we can now conclude that Sc and Ac act redundantly for SOP specification in the pupal abdomen. We have also further studied the expression of Ac relative to Sc and E(spl)HLH-m3 (see our response above to point #1 of reviewer 1). We fully agree with the reviewer that cell-to-cell variations in Ac expression might contribute to proneural heterogeneity and SB. This is now briefly discussed.

      Minor points:

      1. Symmetry Breaking (SB) should be abbreviated in the Abstract. The authors initially use the full term without abbreviation, and only on page 5, the abbreviation is finally defined; however, it should be introduced much earlier.

      fixed

      The second-to-last sentence in the abstract, "These lateral inhibition defects were correlated via cellular rearrangements," is unclear regarding what defects the authors are referring to.

      This sentence was rewritten: 'Live imaging showed that these patterning defects were corrected via cellular rearrangements associated with global tissue fluidity, not via cell fate change.'

      For clarity, being more specific in the text in regards to description of the figure panels would be beneficial (e.g. page 3 Fig 1C-E); referring to C-E together makes it hard to understand what does each panel shows.

      fixed

      In many instances, the movies are not properly referenced (e.g. on page 5, third row simply states "movies"), making it difficult to discern which movie should be checked. On page 8, when authors refer to movie 3, they likely meant movie 5.

      fixed

      Figure S1 requires some corrections.

      We thank the reviewer for helping us improve the presentation of our results.

      The authors use the short name "scute" initially and then switch to the shortened version "sc'.

      fixed

      Additionally, the nlsRFP (blue) is difficult to see; adjusting the levels or changing colors/showing separate channels may improve visibility.

      The authors mention clone borders, but none are shown. It would greatly help to outline the borders in all figures.

      The ubiquitous nlsRFP marker is now shown in magenta in Fig S1I, which now shows only 2 channels, with the ADHN outlined by a white dotted line and the clones by yellow dotted lines.

      We also outlined the clone borders in Fig 4C,C'.

      Genotypes of the samples should be indicated, and clarification is needed regarding what "n" represents (number of cells, clones, or flies).

      The genotype studied in Fig S1 and Fig 4 (the only complex genotype studied here) is now indicated in the Methods section. We have clarified what each 'n' represents in Fig 4 (see text) and elsewhere (see the legend of Fig S2, for instance).

      What do the arrows in the panel B show?

      Thanks for pointing this out. The arrows in Fig S1I' indicate Cut/Hnt-positive cells (SOPs) within the clones (as now explained in the legend).

      It is also recommended to display important channels as separate black and white images.

      Separate channels are now shown in Fig S1 and S3.

      Additionally, the use of RNAi against GFP instead of RNAi against scute should be justified; using RNAi GFP as the genotype on the graph could be interpreted as a control genotype rather than downregulation of scute.

      An RNAi construct against GFP was used because this construct was known to be very efficient and specific. Indeed, a strong knock-down of GFP-Sc was obtained by this approach (see Fig 4C'). We did not test sc RNAi constructs in the context of GFP-Sc. To avoid confusion, we now indicate Sc downregulation (gfp RNAi) in Fig 4C'.

      In the Figure 2 Legend, the authors use "std" as an abbreviation to define standard deviation. Typically, this is abbreviated as SD.

      fixed

      In Figure 4E, the authors do not explain why some points on the x-axis correspond to a non-integer number of cells.

      Since heterogeneity was calculated over a 20 min interval, we likewise calculated the number of neighbors over the same time interval. Thus, the number of neighbors for each SOP corresponds to an averaged value calculated over this time interval. This is now explained in the legend.

      --

      Reviewer #3

      1. First and foremost, the authors should state in the first paragraph of the Results that scGFP is a CRISPR knock-in and thus the only source of Sc protein in the animals imaged (this is stated only in the Methods section).

      Thanks for this comment; we agree that this is one of the strengths of our work and that we should emphasize it. We now state in the Results section: 'GFP-Sc is produced from the endogenous locus such that all Sc molecules produced in these pupae are GFP-tagged'

      The magnitude of the Sc increase should be commented on. Based on the intensity and FDI plots in Fig. 3B-E, an increase of 15-17% in the amount of Sc is suggested (the FDI plateaus at 0.08, which gives 1.08/0.92 = 1.17x the level of Sc in the SOP vs the surrounding cells). However, in the stills shown in Fig. 2BCD and in Fig. 3A, the intensity differential between SOPs and neighbors seems at least 100% (i.e. at least double the intensity, which would yield an FDI of >1/3 = 0.33). Why is this high contrast never seen in the quantitative measurements?

      Thanks for asking about the fold change of GFP-Sc levels in SOPs, from SB to its plateau. This fold change can be seen in Fig 3D: the normalized value of GFP-Sc is 1.12 at SB and 1.26 three hours after SB (when the FDI plateaus), indicative of a 2.2-fold increase of GFP-Sc in SOPs (0.26/0.12 ≈ 2.2, following background subtraction; see our detailed response to reviewer #1, minor point 1, about background signal analysis and normalization of the signal). This fold-change value is now indicated in the legend of Fig 3D. Obviously, this fold-change value is highly sensitive to signal normalization. Since the autofluorescence signal was stronger than the GFP-Sc signal (see Fig R2 above) and varied over time (due to bleaching; see Figs R1 and R2 above), we feel that this fold-change value should be taken with a grain of salt.
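      The two estimates above can be reproduced with a short sketch. It assumes, as the reviewer's arithmetic implies, that FDI = (I_SOP - I_nb)/(I_SOP + I_nb), so an FDI of f corresponds to an SOP-to-neighbor intensity ratio of (1 + f)/(1 - f); it also treats a normalized value of 1.0 as the background subtracted before taking the fold change. Both are readings of the text rather than the exact analysis pipeline, and the function names are hypothetical.

```python
def ratio_from_fdi(fdi: float) -> float:
    """SOP/neighbor intensity ratio implied by an FDI value,
    assuming FDI = (I_SOP - I_nb) / (I_SOP + I_nb)."""
    return (1 + fdi) / (1 - fdi)


def fdi_from_ratio(ratio: float) -> float:
    """Inverse of ratio_from_fdi: FDI implied by an intensity ratio."""
    return (ratio - 1) / (ratio + 1)


def background_subtracted_fold(norm_start: float, norm_end: float,
                               baseline: float = 1.0) -> float:
    """Fold change after subtracting a normalized baseline (assumed 1.0 here)."""
    return (norm_end - baseline) / (norm_start - baseline)


print(round(ratio_from_fdi(0.08), 2))                    # 1.17 (reviewer's estimate)
print(round(fdi_from_ratio(2.0), 2))                     # 0.33 (a doubling)
print(round(background_subtracted_fold(1.12, 1.26), 1))  # 2.2 (authors' estimate)
print(round(background_subtracted_fold(1.12, 1.26, baseline=0.0), 3))  # 1.125
```

      The last line shows why the estimate should be treated with caution: with no baseline subtracted, the same two normalized values give only a 1.125-fold change.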

      From Fig. 2A-D it appears that the ScGFP fluorescence intensity is at the same level or weaker than nearby autofluorescence. Please state (1) how you confirmed that the histoblast nest has lower autofluorescence than the larval epidermis and (2) how you corrected for histoblast nest autofluorescence in your quantifications.

      As detailed above (our response to reviewer #1, minor point 1), the specific GFP-Sc signal is ten times lower than the autofluorescence signal. We did not compare the autofluorescence signals produced by larval and imaginal cells (but note that larval epidermal cells had a stronger autofluorescence signal; see the yellow dots in Fig 2A). Normalization of the signal to correct for autofluorescence was explained in the Methods (and is also detailed above in our response to reviewer #1, minor point 1).

      The paradoxical result of Fig. S1B should be discussed. On the one hand it is stated that "Ac and Sc specify the fate of the Sensory Organ Precursor cells (SOPs)" (p.2), and on the other hand S1B shows SOP specification in the absence of Sc. Are the SOPs shown in Fig S1B rare exceptions? Do the authors believe that these rare exceptions are there because of inefficient RNAi (since, in comparison with S1A, almost no SOPs should be formed in the null condition)? Or are the SOPs in RNAi clones as rare as the occasional bristles in S1A?

      We do not see the result of Fig S1B as paradoxical; we interpreted it assuming that Ac and Sc were redundant for SOP determination. We now provide clear genetic evidence in support of this view (see our response above to reviewer #2, point 2). Otherwise, we found that RNAi is efficient (see the loss of the GFP signal in clones in Fig. 4C'). In adult males, the density of bristles appeared to be quite normal over clonal patches of gfp RNAi cells (not shown), consistent with Ac being redundant with Sc.

      One figure that is not straightforward to interpret is Fig. 4E. It plots ScGFP heterogeneity vs. number of RNAi neighbors. Each point in the plot must be an individual SOP (165 total). Therefore, its neighbors (the x-axis) should take integral (not decimal) values. How can a single SOP have a decimal number of RNAi neighbors, especially since heterogeneity was sampled over a 10min time-window, when not much cell rearrangement can take place? Please explain.

      Since heterogeneity was calculated over a 20 min interval, we likewise calculated the number of neighbors over the same time interval. Thus, the number of neighbors for each SOP corresponds to an averaged value calculated over this time interval. This is now explained in the legend: 'Note that the number of neighbors was likewise calculated over this time interval, and the resulting number of neighbors may not take an integral value.'
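      A minimal illustration of why an averaged neighbor count need not be an integer. The five per-frame counts and the 5-min frame interval are hypothetical, and `mean_rnai_neighbors` is an illustrative helper, not the analysis code used for Fig 4E.

```python
from statistics import mean


def mean_rnai_neighbors(counts_per_frame: list[int]) -> float:
    """Average number of RNAi neighbors of one SOP over the frames of a time window.

    Each per-frame count is an integer, but their average generally is not.
    """
    if not counts_per_frame:
        raise ValueError("time window contains no frames")
    return mean(counts_per_frame)


# Hypothetical SOP imaged every 5 min: five frames span a 20-min window
print(mean_rnai_neighbors([2, 3, 3, 2, 3]))  # 2.6
```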

      I found the discussion of the Notch reporter dynamics (Fig. 7) confusing in several places. (6a) Whereas it's clear that there is plenty of Notch signaling going on before SBN, the authors repeatedly imply that Notch signaling starts after SBN. For example, in the Results (p.9) they state "Thus, this quantitative approach failed to detect a phase of reciprocal Notch signaling during which proneural cluster cells would both send and receive a Delta-Notch signal prior to SOP emergence." The fact that the NRE-deGFP gave a robust signal before the start of the movies clearly means that mutual inhibition was going on for quite some time before SB. In fact, an FDI of 0 for >4h prior to SBN (Fig. 7G) means exactly this: that the level of Notch response among the cluster cells is equivalent ("mutual inhibition" lasts for at least 4h before SBN).

      (6b) In the first paragraph of this section (p.8) they comment that the pre-existence of Notch signaling is unexpected; why? I interpret it to simply be mutual inhibition (see above). Then they go on to quantitate the average Notch response intensity over the entire posterior ADHN (please define the borders of the "posterior" ADHN). I question the informational value of this analysis (averaging over a large region), when Notch signaling is known to have intense local cell-to-cell variability (also evident in the stills shown in Fig. 7A,B,C).

      We apologize for not describing well enough the data shown in Fig 7E, and for not explaining clearly our interpretation of the NRE-deGFP signal.

      While the observation of a strong NRE-deGFP signal indeed indicates that Notch signaling had been active prior to the time of observation (in this sense, Notch is indeed active long before SBN), this does not necessarily imply that Notch is still active at that time. This is because the deGFP protein produced by the NRE-deGFP reporter is stable relative to the time scale of the studied process: its measured half-life in S2 cells cultured at 25 °C is 2 h (PMID: 31140975). Based on these data, the NRE-deGFP signal is likely to remain detectable several hours after Notch signaling has been switched off. If the rate of production of deGFP is lower than its rate of degradation, the NRE-deGFP signal is expected to decay progressively over time. We believe that this is what we observed in our movies: while a strong signal was detected over the posterior half of the ADHN at 14-15 h APF, this signal decreased over time (Fig 7D). To interpret the temporal dynamics of the NRE-deGFP signal in terms of instantaneous Notch activity, we examined its Rate of Change (ROC): an increase of the NRE-deGFP signal over time (positive ROC) would indicate that Notch activity is increasing (more precisely, that the production rate of deGFP is higher than its rate of degradation), whereas a decrease (negative ROC) indicates that Notch becomes less active (or inactive, if the rate of decrease approximates the decay rate of the deGFP protein). The data in Fig 7D show that the NRE-deGFP signal (measured in the area indicated with a dotted line in Fig 7A,B; this area was defined by the initial pattern of NRE-deGFP) decreased over time (negative ROC) between t = 1 and t = 6.5 h. We therefore conclude that Notch signaling decreases to reach a minimum at t ≈ 3.5 h, indicating that the level of Notch activity is at its lowest around the time of SB.
      At this minimum, the decay rate corresponds to a protein half-life of 4.4 h, which is not so different from the measured half-life of deGFP in S2 cells (particularly if one assumes a 1.4-fold difference between the decay rates at 22 and 25 °C, based on the known temperature dependence of developmental speed). This is why we conclude that Notch signaling is very low at this stage. Additionally, no NRE-deGFP signal was detected before t = 4:30 h (movie 7) in the initially NRE-deGFP-negative cells (located anterior to the area indicated with a dotted line in Fig 7A). This indicates that Notch was activated late in this area. Together, our observations are not consistent with the view that Notch mediates a strong mutual inhibition signal over a prolonged time interval prior to SB.
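      The ROC reasoning can be sketched under a simple assumption, ours for illustration: the reporter is produced at rate p(t) and degraded by first-order kinetics at rate k, so dI/dt = p - kI. When production stops, I decays as exp(-kt), the relative ROC d(ln I)/dt equals -k, and the implied half-life is ln 2 / k. Any residual production slows the decline, so the apparent half-life is then longer than the protein's own. The helper names, the synthetic 4.4-h decay and the way the 1.4-fold factor is applied are hypothetical.

```python
import math


def relative_roc(times_h, intensities):
    """Relative rate of change d(ln I)/dt between consecutive time points (per hour)."""
    rates = []
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        rates.append((math.log(intensities[k]) - math.log(intensities[k - 1])) / dt)
    return rates


def apparent_half_life(roc_per_h):
    """Half-life implied by a negative relative ROC, assuming first-order decay."""
    if roc_per_h >= 0:
        raise ValueError("the signal is not decaying")
    return math.log(2) / -roc_per_h


def half_life_at_lower_temperature(half_life_h, slowdown=1.4):
    """Scale a half-life by an assumed temperature-dependent slowdown factor."""
    return half_life_h * slowdown


# Synthetic reporter decaying with a 4.4-h half-life, sampled every 30 min
times = [0.0, 0.5, 1.0, 1.5, 2.0]
signal = [100.0 * 0.5 ** (t / 4.4) for t in times]
print([round(apparent_half_life(r), 2) for r in relative_roc(times, signal)])  # [4.4, 4.4, 4.4, 4.4]
print(round(half_life_at_lower_temperature(2.0), 2))  # 2.8
```

      Under these assumptions, a 2-h half-life at 25 °C becomes about 2.8 h at 22 °C; an apparent half-life of 4.4 h is longer, as would be expected if production has fallen to a low but possibly non-zero level.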

      To further study the pattern of Notch activity, we have monitored over time the accumulation pattern of GFP-tagged E(spl)m3-HLH (GFP-m3) (PMID: 31375669) in fixed samples (Fig S3F-G'). This confirmed that Notch was active in posterior ADHN cells and in the PDHN prior to 14h APF, i.e. prior to the onset of Ac and Sc, and that Notch activation extended to the central ADHN domain at 17-18h APF (Fig S3E-E' and G-G', and Fig 7I-I''), coinciding with SOP emergence.

      Otherwise, the reviewer is correct when stating that an FDI value close to 0 indicates that the level of measured fluorescence among the different cells of the considered cluster is similar. Such an FDI value would be measured if cells did not express NRE-deGFP or had decreasing but similar levels of NRE-deGFP. This FDI value does not, per se, imply that Notch is active.

      And then they move on to a (much more informative) cell-by-cell analysis, without even changing paragraphs, making it hard for the reader to follow. (6c) The conclusion at the end of the second paragraph (p. 9) "It also showed that SB was detected soon after the onset of Notch-mediated inhibitory signaling." is nowhere supported by data. If I understand well, SB refers to Sc and "the onset of Notch-mediated inhibitory signaling" refers to SBN (which is the onset of ASYMMETRY in Notch signaling, not the onset of Notch signaling, which has been going on for hours earlier). I don't see any data comparing SB with SBN. In fact, this is an important question to address (see below - comment 10).

      We apologize for the lack of clarity in our writing, we meant: "It also showed that SBN was detected soon after the onset of Notch-mediated inhibitory signaling."

      Yes, SBN refers to the onset of asymmetry in Notch signaling, as measured using NRE-deGFP. As explained above (but see also our response to point #7 below), our data do not provide evidence for a detectable Notch signal prior to SBN.

      We agree that comparing SB and SBN would be nice. Unfortunately, our current tools do not permit a detailed comparison (see our detailed response below, point #10).

      Mutual inhibition amongst neighboring cells has been proposed to involve (besides mutual Notch signaling) an increase in Sc levels in 2-3 cells in a cluster before the singularization of a single SOP. The authors seem rather biased against such a transient Sc hike based on their results in Fig. 2D, where the neighboring cells stay at rather constant basal Sc levels for several hours, while the Sc SB event happens. However, looking at an individual SOP in Fig. 2B, I do detect a mild hike in the pink curve right around SB in the blue curve. Could the average result from 160 SOPs (in Fig 2D) simply blur such transient Sc hikes, if they happen with different kinetics for different SOPs? Couldn't the 10% of SOP twins (shown in Fig. 6) represent a special case of this transient "subcluster" Sc hike? I would appreciate some discussion on this point. [Whether Sc is transiently upregulated or not, however, does not change my firm conclusion - from the data presented - that Notch-mediated mutual inhibition has been going on long before SBN.]

      First, our data are consistent with the notion that a few proneural cells progressively accumulate higher levels of Scute prior to SB (as proposed above). Indeed, the moderate increase in both GFP-Sc levels and coefficient of variation values (GFP-Sc heterogeneity) seen prior to SB corresponds to what the reviewer has in mind (higher levels of GFP-Sc in a few proneural cluster cells). We also appreciate the reviewer's comment about the plot shown in Fig 2D. However, we strongly feel that our quantitative analysis of a large dataset is a strength. Thus, we do not find it useful to discretize a continuous process by introducing the notion of 'subclusters' of 2-3 cells. Likewise, we believe that it is more informative to analyze the entire dataset using mean and SD values than to base our interpretation of the process on selected tracks (the track shown in Fig 2B only illustrates how we performed our analysis and has no interpretive value).

      The reviewer also states that "mutual inhibition amongst neighboring cells has been proposed to involve an increase in Sc levels in 2-3 cells in a cluster before the singularization of a single SOP". Since, to the best of our knowledge, there is no published description of the accumulation pattern of Scute in abdominal histoblasts, we hypothesize that this statement applies to the proneural clusters of the developing wing disc, where the accumulation pattern of Sc has been studied in detail by the Modolell and Carroll labs (PMID: 2044965, PMID: 2044964). However, their description of the accumulation pattern of Scute (in fixed samples, using anti-Sc antibodies) did not refer to sub-clusters of 2-3 cells. We would appreciate it if the reviewer could direct us to the relevant published observation.

      Finally, we are not sure we follow the reviewer's firm conclusion, drawn from our data, that Notch-mediated mutual inhibition has been going on long before SBN. Instead, our data clearly show that the ADHN region that produced SOPs exhibited two distinct NRE-deGFP patterns, with Notch signaling in the posterior area of the ADHN being active prior to imaging (i.e. prior to 14h APF) and decreasing to reach a minimum around 17h APF (i.e. around the time of SB, as determined by GFP-Sc imaging).

      Thus, our data do not show that mutual inhibition does not take place in this tissue; rather, they imply that the phase of mutual inhibition (or competition) must be relatively short, or transient, and that competition among proneural cluster cells operates at low Notch and Sc levels (probably contrary to what many people have in mind).

      Some minor points: 8. Please change Cad-GFP to Ecad-GFP or shg-GFP, as Cad misdirects to caudal.

      Thanks, changed into Ecad-GFP and Ecad-mKate

      What is c in "(x,y,z,c,t) movies"? (a fifth independent variable?)

      c stands for channel. This is relatively standard nomenclature.

      The authors show that Sc displays a SB event leading to FDI of 0.08 and the Notch reporter displays another SB (SBN) leading to a much more pronounced FDI of -0.2. Are these two events (the hike of Sc levels and the plummeting of Notch signal) contemporaneous or does one precede the other? Having both tagged with GFP makes it impossible to image simultaneously, but the authors could register each reporter's dynamics relative to the time of SOP division (as done in Fig. 5C) to get a sense of their relative order.

      We do agree with the reviewer that it would be nice to be able to align in time these two data sets. Unfortunately, the temporal correlation between SB and the SOP division is too variable (4.7 +/- 1.1) to confidently align these two datasets using this event as a time reference. New tools are needed (see our response to point #11 below).

      Where in the above timeline is the SOP fate definitively adopted? neur-nlsGFP, Ac-RFP, m3Cherry and Sens detection in Figs. 1 and 7 give us a rough idea that these other markers appear around the time of Sc FDI peaking, around 3h after the initial SB. But this is not presented in an organized fashion - the reader collects this information sporadically. A reanalysis of the already existing data attempting to place these various markers in an integrated timeline would be of great importance in understanding the details of this cell fate specification process. Which is the earliest SB event? sc, neur or Notch? How long does it take from that early SB until definitive SOP markers (Sens) first appear?

      We agree with the reviewer that it would be interesting to extend the approach reported here for Scute to characterize SB and the rate of FDI for other key factors governing the selection of SOPs. As pointed out by the reviewer (point #10 above), it would also be important to register these various events in time. Unfortunately, the maturation times of RFP, mCherry, FP670, etc. appeared too slow relative to the rapid turnover of the Ac, Sc and E(spl)-HLH factors, which prevented us from performing two-color imaging. Hence, current tools do not allow us to determine which is the earliest SB event.

      More genetic perturbations could be performed to solidify the model of cell-cell communication during lateral inhibition. Two obvious ones come to mind: (a) How would the Sc-GFP dynamics change in a Notch-RNAi background? (b) How would the NRE-deGFP dynamics change in a sc-RNAi background?

      See our detailed response to reviewer #1, major point #2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Your editorial guidance, reviews, and suggestions have led us to make substantial changes to our manuscript. While we detail point-by-point responses in typical fashion below, I wanted to outline, at a high level, what we’ve done.

      (1) Methods. Your suggestions led us to rethink our presentation of our methods, which are now described more cohesively in a new methods section in the main text.

      (2) Model Validation & Robustness. Reviewers suggested various validations and checks to ensure that our findings were not, for instance, the consequence of a particular choice of parameter. These can be found in the supplementary materials.

      (3) Data Cleaning & Inclusion/Exclusion. Finally, based on feedback, our new methods section fully describes the process by which we cleaned our original data, and on what grounds we included/excluded individual faculty records from analysis.

      eLife assessment

      Efforts to increase the representation of women in academia have focussed on efforts to recruit more women and to reduce the attrition of women. This study - which is based on analyses of data on more than 250,000 tenured and tenure-track faculty from the period 2011-2020, and the predictions of counterfactual models - shows that hiring more women has a bigger impact than reducing attrition. The study is an important contribution to work on gender representation in academia, and while the evidence in support of the findings is solid, the description of the methods used is in need of improvement.

      Reviewer #1 (Public Review):

      Summary and strengths

      This is an interesting paper that concludes that hiring more women will do more to improve the gender balance of (US) academia than improving the attrition rates of women (which are usually higher than men's). Other groups have reported similar findings but this study uses a larger than usual dataset that spans many fields and institutions, so it is a good contribution to the field.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Weaknesses

      The paper uses a mixture of mathematical models (basically Leslie matrices, though that term isn't mentioned here) parameterised using statistical models fitted to data. However, the description of the methods needs to be improved significantly. The authors should consider citing Matrix Population Models by Caswell (Second Edition; 2006; OUP) as a general introduction to these methods, and consider citing some or all of the following as examples of similar studies performed with these models:

      Shaw and Stanton. 2012. Proc Roy Soc B 279:3736-3741

      Brower and James. 2020. PLOS One 15:e0226392

      James and Brower. 2022. Royal Society Open Science 9:220785

      Lawrence and Chen. 2015. http://128.97.186.17/index.php/pwp/article/view/PWP-CCPR-2015-008

      Danell and Hjerm. 2013. Scientometrics 94:999-1006

      We have expanded the description of methods in a new methods section of the paper which we hope will address the reviewer’s concerns.

      We agree that our model of faculty hiring and attrition resembles Leslie matrices. In results section B, we now mention Leslie matrices and cite Matrix Population Models by Caswell, noting a few key differences between Leslie matrices and the model of hiring and attrition presented in this work. Most notably, in the hiring and attrition model presented, the number of new hires is not based on per-capita fertility constants. Instead, population sizes are predetermined fixed values for each year, precluding exponential population growth or decay towards 0 that is commonly observed in the asymptotic behavior of linear Leslie Matrix models.
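      The difference from a Leslie matrix can be made concrete with a toy projection. Everything here is hypothetical (four career stages, a single gender, made-up survival and fertility values) and is a simplification, not the model implemented in the study. In the Leslie step, stage-0 entrants are produced by per-capita fertility, so the total changes geometrically; in the hiring-attrition step, hires fill whatever gap remains to a predetermined total.

```python
import numpy as np


def leslie_step(n, fertility, survival):
    """Classic Leslie projection: stage-0 entrants are per-capita 'births'."""
    L = np.zeros((n.size, n.size))
    L[0, :] = fertility            # first row: fertility of each stage
    idx = np.arange(n.size - 1)
    L[idx + 1, idx] = survival     # sub-diagonal: survival to the next stage
    return L @ n


def fixed_size_step(n, retention, total):
    """Hiring-attrition step: survivors age one stage, hires fill the gap to `total`."""
    nxt = np.zeros_like(n)
    nxt[1:] = n[:-1] * retention   # faculty in the last stage retire
    nxt[0] = total - nxt[1:].sum() # hires are set by the fixed size, not by fertility
    return nxt


n0 = np.array([10.0, 9.0, 8.0, 7.0])          # hypothetical stage counts
survival = np.array([0.95, 0.95, 0.95])       # hypothetical retention
fertility = np.array([0.0, 0.2, 0.3, 0.3])    # hypothetical per-capita "fertility"

a = b = n0
for _ in range(30):
    a = leslie_step(a, fertility, survival)
    b = fixed_size_step(b, survival, total=n0.sum())

# With these rates the Leslie total shrinks toward 0; the fixed-size total stays 34.0
print(round(float(a.sum()), 1), round(float(b.sum()), 1))
```

      With these particular values the Leslie matrix's dominant eigenvalue is just under 0.9, so its total decays; raising the fertility values would instead make it grow. The fixed-size projection keeps the same total either way.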

      We have additionally revised the main text to cite the listed examples of similar studies (we had already cited James and Brower, 2022). We thank the reviewer for bringing these relevant works to our attention.

      The analysis also runs the risk of conflating the fraction of women in a field with gender diversity! In female-dominated fields (e.g. Nursing, Education) increasing the proportion of women in the field will lead to reduced gender diversity. This does not seem to be accounted for in the analysis. It would also be helpful to state the number of men and women in each of the 111 fields in the study.

      We have carefully examined the manuscript and revised the text to correctly differentiate between gender diversity and women’s representation.

      We have additionally added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Reviewer #2 (Public Review):

      Summary:

      This important study by LaBerge and co-authors seeks to understand the causal drivers of faculty gender demographics by quantifying the relative importance of faculty hiring and attrition across fields. They leverage historical data to describe past trends and develop models that project future scenarios that test the efficacy of targeted interventions. Overall, I found this study to be a compelling and important analysis of gendered hiring and attrition in US institutions, and one that has wide-reaching policy implications for the academy. The authors have also suggested a number of fruitful future avenues for research that will allow for additional clarity in understanding the gendered, racial, and socioeconomic disparities present in US hiring and attrition, and potential strategies for mitigating or eliminating these disparities.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Strengths:

      In this study, LaBerge et al use data from over 268,000 tenured and tenure-track faculty from over 100 fields at more than 12,000 PhD-granting institutions in the US. The period they examine covers 2011-2020. Their analysis provides a large-scale overview of demographics across fields, a unique strength that allows the authors to find statistically significant effects for gendered attrition and hiring across broad areas (STEM, non-STEM, and topical domains).

      LaBerge et al. find gendered disparities in attrition (using both empirical data and their counterfactual model) that account for the loss of 1378 women faculty across all fields between 2011 and 2020. It is true that "this number is both a small portion of academia... and a staggering number of individual careers," as this loss of women faculty is comparable to losing more than 70 entire departments. I appreciate the authors' discussion of these losses; they note that each of these is likely unnecessary, as women often report feeling that they were pushed out of academic jobs.

      By developing a number of model scenarios testing the impacts of hiring, attrition, or both, LaBerge et al. also find that hiring has a greater impact on women's representation in the majority of academic fields, in spite of higher attrition rates for women faculty relative to men at every career stage. Unlike many other studies of historical trends in gender diversity, which have often been limited to institution-specific analyses, they provide an analysis that spans over 100 fields and includes nearly all US PhD-granting institutions. They are able to project the impacts of strategies focusing on hiring or retention using models that vary attrition risk or hiring success for women. With this approach, they show that even relatively modest annual changes in hiring accumulate over time to help improve the diversity of a given field. They also demonstrate that, across the model scenarios they employ, changes to hiring drive the largest improvement in the long-term gender diversity of a field.

      Future work will hopefully, as the authors point out, include intersectional analyses to determine whether a disproportionate share of lost gender diversity is due to the loss of women of color from the professoriate. I appreciate the authors' discussion of the racial demographics of women in the professoriate, and their note that "the majority of women faculty in the US are white" and thus that the patterns observed in this study are predominately driven by this demographic. I also highly appreciate their final note that "equal representation is not equivalent to equal or fair treatment," and that diversifying hiring without mitigating the underlying cause of inequity will continue to contribute to higher losses of women faculty.

      Weaknesses

      First, and perhaps most importantly, it would be beneficial to include a distinct methods section. While the authors have woven the methods into the results section, I found that I needed to dig to find the answers to my questions about methods. I would also have appreciated additional information within the main text on the source of the data, specifics about its collection, inclusion and exclusion criteria for the present study, and other information on how the final dataset was produced. This - and additional information as the authors and editor see fit - would be helpful to readers hoping to understand some of the nuance behind the collection, curation, and analysis of this important dataset.

      We have expanded upon the description of methods in a new methods section of the paper.

      We have also added a detailed description of the data cleaning steps taken to produce the dataset used in these analyses, including the inclusion/exclusion criteria applied. This detailed description is at the beginning of the methods section. This addition has substantially enhanced the transparency of our data cleaning methods, so we thank the reviewer for this suggestion.

      I would also encourage the authors to include a note about binary gender classifications in the discussion section. In particular, I encourage them to include an explicit acknowledgement that the trends assessed in the present study are focused solely on two binary genders - and do not include an analysis of nonbinary, genderqueer, or other "third gender" individuals. While this is likely because of the limitations of the dataset utilized, the focus of this study on binary genders means that it does not reflect the true diversity of gender identities represented within the professoriate.

      In a similar vein, additional context on how gender was assigned on the basis of names should be added to the methods section.

      We use a free, open-source, and open-data python package called nomquamgender (Van Buskirk et al, 2023) to estimate the strengths of (culturally constructed) name-gender associations. For sufficiently strong associations with a binary gender, we apply those labels to the names in our data. We have updated the main text to make this approach more apparent.
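      The thresholding step described here can be sketched as below. This is not the nomquamgender API: the score scale (0 = strongly associated with men, 1 = strongly associated with women), the `binary_label` helper and the 0.9 cutoff are hypothetical placeholders. The package documentation and the methods section give the actual outputs and thresholds.

```python
from typing import Optional


def binary_label(p_woman: Optional[float], threshold: float = 0.9) -> Optional[str]:
    """Return 'woman' or 'man' only for strongly gendered names, else None.

    p_woman: hypothetical name-gender association score in [0, 1], or None
             if the name is not covered by the reference data.
    threshold: hypothetical cutoff; scores between 1 - threshold and
               threshold are left unlabeled.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    if p_woman is None:           # name not covered: no label
        return None
    if p_woman >= threshold:
        return "woman"
    if p_woman <= 1.0 - threshold:
        return "man"
    return None                   # ambiguous association: leave unlabeled


# Hypothetical scores for three names
scores = {"name_a": 0.97, "name_b": 0.03, "name_c": 0.55}
labels = {name: binary_label(p) for name, p in scores.items()}
print(labels)  # {'name_a': 'woman', 'name_b': 'man', 'name_c': None}
```

      Leaving ambiguous names unlabeled, rather than forcing a label, is what restricts the analysis to names with sufficiently strong binary associations.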

      We have also added language to the main text which explicitly acknowledges that our approach only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      I do think that some care might be warranted regarding the statement that "eliminating gendered attrition leads to only modest changes in field-level diversity" (Page 6). While I do not think that this is untrue, I do think that the model scenarios where hiring is "radical" and attrition is unchanged from present (equal representation of women and men among hires (ER) + observed attrition (OA)) show that a sole focus on hiring dampens the gains that can otherwise be achieved via even modest interventions (see, e.g., gender-neutral attrition (GNA) + increasing representation of women among hires (IR)). I am curious as to why the authors did not include an additional scenario where hiring rates are equal and attrition is equalized (i.e., GNA + ER). The importance of including this additional model is highlighted in the discussion, where, on Page 7, the authors write: "In our forecasting analysis, we find that eliminating the gendered attrition gap, in isolation, would not substantially increase representation of women faculty in academia. Rather, progress towards gender parity depends far more heavily on increasing women's representation among new faculty hires, with the greatest change occurring if hiring is close to gender parity." I believe that this statement would be greatly strengthened if the authors can also include a comparison to a scenario where both hiring and attrition are addressed with "radical" interventions.

      Our rationale for omitting the GNA + ER scenario in the presented analysis is that we can reason about the outcomes of this scenario without the need for computation; if a field has equal inputs of women and men faculty (on average) and equal retention rates between women and men (on average), then, no matter the field’s initial age and gender distribution of faculty, the expected value for the percentage of women faculty after all of the prior faculty have retired (which may take 40+ years) is exactly 50%. We have updated the main text to discuss this point.
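
      The GNA + ER argument can be checked with a small expected-value cohort model. The sketch below is illustrative only: the attrition curve, the career ages of new hires, the constant number of hires per year, and both starting rosters are hypothetical stand-ins rather than fitted AARC parameters, and the function names are our own. Because women and men are hired in equal numbers and face the same attrition curve, every cohort hired during the projection contributes equal expected counts of each, so once the initial faculty have aged out, women's share is exactly 50% regardless of the starting roster.

```python
MAX_AGE = 80  # hypothetical: career ages 0..79; nobody is carried past age 79

# Hypothetical career-age distribution of new hires (fractions sum to 1).
HIRE_AGE_DIST = {1: 0.2, 2: 0.3, 3: 0.3, 5: 0.2}


def attrition_prob(age):
    """Hypothetical attrition probability at a career age, identical for women and men (GNA)."""
    return 0.03 + 0.002 * age


def step(women, men, n_hires=10.0, women_share=0.5):
    """Advance expected counts by one year: age the survivors, then add new hires."""
    new_w = [0.0] * MAX_AGE
    new_m = [0.0] * MAX_AGE
    for age in range(MAX_AGE - 1):
        keep = 1.0 - attrition_prob(age)
        new_w[age + 1] = women[age] * keep
        new_m[age + 1] = men[age] * keep
    for age, frac in HIRE_AGE_DIST.items():
        new_w[age] += n_hires * frac * women_share  # ER: equal shares of hires
        new_m[age] += n_hires * frac * (1.0 - women_share)
    return new_w, new_m


def women_fraction(women, men):
    """Women's share of all faculty."""
    return sum(women) / (sum(women) + sum(men))


def project(women, men, years):
    """Apply `step` repeatedly for the given number of years."""
    for _ in range(years):
        women, men = step(women, men)
    return women, men


# Hypothetical starting roster skewed toward older men.
women0 = [2.0 if a < 20 else 0.5 for a in range(MAX_AGE)]
men0 = [1.0 if a < 20 else 4.0 for a in range(MAX_AGE)]
print(round(women_fraction(women0, men0), 3))  # 0.212
w, m = project(women0, men0, years=MAX_AGE)
print(women_fraction(w, m))  # 0.5
```

      Starting from a different roster (for example, one with no women at all) gives the same 50% once everyone present at the start has left.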

      Reviewer #3 (Public Review):

      This manuscript investigates the roles of faculty hiring and attrition in influencing gender representation in US academia. It uses a comprehensive dataset covering tenured and tenure-track faculty across various fields from 2011 to 2020. The study employs a counterfactual model to assess the impact of hypothetical gender-neutral attrition and projects future gender representation under different policy scenarios. The analysis reveals that hiring has a more significant impact on women's representation than attrition in most fields and highlights the need for sustained changes in hiring practices to achieve gender parity.

      Strengths:

      Overall, the manuscript offers significant contributions to understanding gender diversity in academia through its rigorous data analysis and innovative methodology.

      The methodology is robust, employing extensive data covering a wide range of academic fields and institutions.

      Weaknesses:

      The primary weakness of the study lies in its focus on US academia, which may limit the generalizability of its findings to other cultural and academic contexts.

      We agree that the U.S. focus of this study limits the generalizability of our findings. Our findings will generalize to other populations, whether an alternate industry (e.g., tech workers) or faculty in different countries, only to the extent that these populations share similar hiring patterns, retention patterns, and current demographic representation. We have added a discussion of this limitation to the manuscript.

      Additionally, the counterfactual model's reliance on specific assumptions about gender-neutral attrition could affect the accuracy of its projections.

      Our projection analysis is intended to illustrate the potential gender representation outcomes of several possible counterfactual scenarios, with each projection being conditioned on transparent and simple assumptions. In this way, the projection analysis is not intended to predict or forecast the future.

      To resolve this point for our readers, we now introduce our projections in the context of the related terms prediction and forecast, noting that they have distinct meanings as terms of art. On the one hand, prediction and forecasting involve anticipating a specific outcome based on available information and analysis, and typically rely on patterns, trends, or historical data to make educated guesses about what will happen. On the other hand, projections are based on assumptions and are often presented as a panel of possible future scenarios. While predictions and forecasts aim for precision, projections (which we make in our analysis) are more generalized and may involve a range of potential outcomes.

      Additionally, the study assumes that anyone who disappears from the dataset represents attrition from academia. In reality, some of these attritions could be researchers who moved to another country or to an institution that is not included in the AARC (Academic Analytics Research Center) dataset.

      In our revision, we have elevated this important point and clarified it in the context of the various ways in which we count hires and attritions. We now explicitly state that “We define faculty hiring and faculty attrition to include all cases in which faculty join or leave a field or domain within our dataset.” We then enumerate the situations that could be counted as hires and attritions, including the reviewer’s example of faculty who move to another country.

      Reviewer #1 (Recommendations For The Authors):

      Section B: The authors use an age-structured Leslie matrix model (see Caswell for a good reference to these) to test the effect of making the attrition rates or hiring rates equal for men and women. My main concern here is the fitting techniques for the parameters. These are described (a little too!) briefly in section S1B. Some specific questions that are left hanging include:

      A 5th order polynomial is an interesting choice. Some statistical evidence as to why it was the best fit would be useful. What other candidate models were compared? What was the "best fit" judgement made with: AIC, r^2? What are the estimates for how good this fit is? How many data points were fitted to? Was it the best fit choice for all of the 111 fields for men and women?

      We use a logistic regression model for each field to infer faculty attrition probabilities across career ages and time, and we include the career age predictor up to its fifth power to capture the career-age correlations observed in Spoon et al., Science Advances, 2023. For ease of reference, we reproduce the attrition risk curves in Fig S4.

      We note that faculty attrition rates start low and then reach a peak around 5-7 years after earning PhD, and then decline until around 15-20 years post-PhD, after which, attrition rates increase as faculty approach retirement.

      This function starts low, ends high, and includes at least one local minimum, which indicates that the polynomial in career age should be of odd order and at least third order. However, including career age only up to its third-order term tended to miss some of the observed career-age/attrition correlations. We evaluated the fit using 5-fold cross-validation with a Brier score loss metric and, among polynomials of degree 1, 3, 5, or 7, found that degree 5 performed well on average across all fields (even if it was not the best for every field), without overfitting in fields with fewer data. Example fits, reminiscent of the figure from Spoon et al., are now provided in Figs S4 and S5.
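
      The degree-selection procedure can be sketched as k-fold cross-validation of polynomial logistic regressions scored by the Brier loss. This is a minimal sketch under stated assumptions: the data are synthetic, drawn from a hypothetical non-monotonic attrition curve, and the small Newton-Raphson fitter stands in for whatever regression software was actually used. Career age is rescaled to roughly [-1, 1] so that its higher powers stay numerically well behaved.

```python
import numpy as np


def design(age, degree):
    """Design matrix with columns x**0 ... x**degree, where x is rescaled career age."""
    x = (age - 25.0) / 25.0  # ages 1..50 map to roughly [-1, 1]
    return np.vander(x, degree + 1, increasing=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, ridge=1e-6, iters=50):
    """Logistic regression via Newton-Raphson, with a tiny ridge term for stability."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (y - p) - ridge * w
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w


def cv_brier(age, y, degree, k=5, seed=0):
    """Mean out-of-fold Brier score for a logistic model with a polynomial in career age."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_logistic(design(age[train], degree), y[train])
        p = sigmoid(design(age[test], degree) @ w)
        scores.append(np.mean((y[test] - p) ** 2))
    return float(np.mean(scores))


# Synthetic data from a hypothetical risk curve: an early-career bump near
# age 6, a mid-career dip, and a rise toward retirement.
rng = np.random.default_rng(1)
age = rng.integers(1, 51, size=20000).astype(float)
p_true = (0.04
          + 0.08 * np.exp(-(((age - 6.0) / 3.0) ** 2))
          + 0.25 * (np.clip(age - 20.0, 0.0, None) / 30.0) ** 2)
y = (rng.random(age.size) < p_true).astype(float)

scores = {d: cv_brier(age, y, d) for d in (1, 3, 5, 7)}
for d, s in scores.items():
    print(f"degree {d}: Brier {s:.5f}")
```

      Lower Brier scores are better; the first-order model cannot follow the early-career bump, so it scores worse than the higher-order fits on data like these.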

      While the model with fifth-order terms may not be the best fit for all 111 fields (e.g., a 7th-order model fits better in some cases), we wanted to avoid field-specific curves that might be overfitted to field-specific data, especially given the low sample sizes (and thus larger fluctuations) at high career ages. Our main text and supplement now include justifications for our choice to include career age up to its fifth-order term.

      You used the 5th order logistic regression (bottom of page 11) to model attrition at different ages. The data in [24] shows that attrition increases sharply, then drops then increases again with career age. A fifth order polynomial on its own could plausibly do this but I associate logistic regression models like this as being monotonically increasing (or decreasing!), again more details as to how this worked would be useful.

      Our first submission did not explain this point well, but we hope that Supplementary Figures S4 and S5 provide clarity. In short, we of course agree that standard logistic regression assumes a linear relationship between the predictor variables and the log odds of the outcome variable, so the probability of the outcome follows a sigmoidal (S-shaped), and therefore monotonic, curve in each predictor. However, the relationship between a predictor (here, career age) and the log odds of the outcome need not be linear.

      To capture more complex relationships, such as attrition rates that increase, decrease, and then increase again with career age, higher-order terms can be added to the logistic regression model. These higher-order terms allow the model to capture nonlinear relationships between the predictor variables and the outcome variable (namely, the non-monotonic relationship between attrition rates and career age) while staying within a logistic regression framework.
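
      A short numerical illustration of why higher-order terms permit non-monotonic risk: the logistic function is strictly increasing, so the attrition probability rises and falls exactly where the polynomial log odds does. The coefficients below are hypothetical, chosen so that the cubic log odds has a local maximum at career age 6 and a local minimum at career age 18; a first-order model, by contrast, can only move in one direction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cubic_logit(age):
    """Hypothetical cubic log odds in x = age / 10, with turning points at ages 6 and 18."""
    x = age / 10.0
    return -4.0 + 3.24 * x - 3.6 * x ** 2 + x ** 3


def linear_logit(age):
    """Hypothetical first-order log odds: monotonic in career age."""
    return -4.0 + 0.1 * age


def turning_points(values):
    """Count changes in direction between consecutive values."""
    diffs = [b - a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


ages = range(0, 31)
cubic_p = [sigmoid(cubic_logit(a)) for a in ages]
linear_p = [sigmoid(linear_logit(a)) for a in ages]
print(turning_points(cubic_p), turning_points(linear_p))  # 2 0
```

      A fifth-order polynomial allows up to four such turning points, which gives the extra flexibility discussed above.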

      "The career age of new hires follows the average career age distribution of hires" did you use the empirical distribution here or did you fit a standard statistical distribution e.g. Gamma?

      We used the empirical distribution. This information has been added to the updated methods section in the main text.

      How did you account for institution (presumably available)? Your own work has shown that institution type plays a role, which could be contributing to these results.

      See below.

      What other confounding variables could be at play here, what is available as part of the data and what happens if you do/don't account for them?

      A number of variables included in our data have been shown to correlate with faculty attrition, including PhD prestige, current institution prestige, PhD country, and whether or not an individual is a “self-hire,” i.e., trained and hired at the same institution (Wapman et al., Nature, 2022). Additional factors that faculty self-report as reasons for leaving academia include issues of work-life balance, workplace climate, and professional reasons, in some cases to varying degrees between men and women faculty (Spoon et al., Sci. Adv., 2023).

      Our counterfactual analysis aims to address a specific question: how would women’s representation among faculty be different today if men and women were subjected to the same attrition patterns over the past decade? To answer this question, it is important to account for faculty career age, which we accept as a variable that will always correlate strongly with faculty attrition rates, as long as the tenure filter remains in place and faculty continue to naturally progress towards retirement age. On the other hand, it is less clear why PhD country, self-hire status, or any of the other mentioned variables should necessarily correlate with attrition rates, and with gendered differences in attrition rates more specifically. While some or all of these variables may underlie the causal roots of gendered attrition rates, our analysis does not seek to answer causal questions about why faculty leave their jobs (e.g., by testing the impact of accounting for these variables in simulations, per the reviewer’s suggestion). This is because we do not believe the data used in this analysis are sufficient to answer such questions, as they lack comprehensive data on faculty stress (Spoon et al., Sci. Adv., 2023), parenthood status, etc.

      What career age range did the model use?

      The career age range observed in model outcomes is a function of the empirically derived attrition rates for faculty across academic fields. The highest career age observed in the AARC data was 80, and the faculty career ages that result from our model simulations and projections do not exceed 80.

      We have also added the distribution of faculty across career ages for the projection scenario model outputs in the supplemental materials, Fig. S3 (see our response to your later comment regarding career age for further details). These distributions show that very few faculty have a career age > 60, both in observations and in our simulations.

      What was the initial condition for the model?

      Empirical 2011 faculty rosters are used as the initial conditions for the counterfactual analysis, and 2020 faculty rosters are used as the initial conditions for the projection analysis. This information has been added to the description of methods in the main text.

      Starting the model in 2011 how well does it fit the available data up to 2020?

      Thank you for this suggestion. We ran this analysis for each field starting in 2011, and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields. This finding is not surprising, because the model is fit to the observed data, but it serves to validate the methods that we used to extract the model's parameters. We have added these results to the supplement (Fig. S2).
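
      The response does not state which test was used to judge the 2020 outcomes "statistically indistinguishable." One simple criterion, sketched here as an assumption, is to check whether the observed 2020 value falls inside the central 95% of simulated values, the same band used for the line widths in Fig. S3. The simulated values below are synthetic placeholders, not model output, and the function names are hypothetical.

```python
import random
import statistics


def central_95(simulated):
    """Bounds of the central 95% of simulated outcomes (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(simulated, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def consistent_with_model(observed, simulated):
    """True if the observed value lies within the central 95% of simulations."""
    lo, hi = central_95(simulated)
    return lo <= observed <= hi


# Synthetic placeholder: 1,000 simulated 2020 women's shares for one field.
rng = random.Random(42)
simulated_2020 = [rng.gauss(0.30, 0.01) for _ in range(1000)]
print(consistent_with_model(0.305, simulated_2020))  # True
print(consistent_with_model(0.400, simulated_2020))  # False
```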

      What are the sensitivity analysis results for the model? If you have made different fitting decisions how much would the results change? All this applied to both the hiring and attrition parameters estimates.

      We model attrition and hiring using logistic regression, with career age included as an exogenous variable up to its fifth power. A natural question follows: what if we used a model with career age only to its first or third power? Or to higher powers? We performed this sensitivity analysis, and added three new figures to the supplement to present these findings:

      First, we show the observed attrition probabilities at each career age, and four model fits to attrition data (Supplementary Figs S4 and S5). The first model includes career age only to its first power, and this model clearly does not capture the full career age / attrition correlation structure. The second model includes career age to its third power, which does a better job of fitting to the observed patterns. The third model includes career age up to its fifth power, which appears to very modestly improve upon the former model. The fourth model includes career age up to its seventh power, and the patterns captured by this model are largely the same as the 5th-power model up to career age 50, beyond which there are some notable differences in the inferred attrition probabilities. These differences would have relatively little impact on model outcomes because the vast majority of faculty have a career age below 50.

      Second, we show the observed probability that hires are women, conditional on the career age of the hire. Once again, we fit four models to the data, and find that career age should be included at least up to its fifth order in order to capture the correlation structures between career age and the gender of new hires. However, limited differences result from including career age up to the 7th degree in the model (relative to the 5th degree).

      As a final sensitivity analysis, we reproduce Fig. 2, but rather than including career age as an exogenous variable up to its fifth power in our models for hiring and attrition, we include career age up to its third power. Findings under this parameterization are qualitatively very similar to those presented in Fig. 2, indicating that the results are robust to modest changes to model parameterization (shown in supplement Fig. S6).

      Far more detail on this, and some interim results from each stage of the analysis, would make the paper far more convincing. Currently, too much of the analysis has the air of a "black box", which would easily allow an unconvinced reader to discard the results.

      We have added more detailed descriptions of the methods to the main text. We hope that the changes made will address these concerns.

      Section C: You use the Leslie model to predict the future population. As the model is linear, the population will either grow exponentially (most likely) or dwindle to zero. You mention that you dealt with this by scaling the average value of H to keep the population at 2020 levels? This would change the ratio of hiring to attrition. How did this affect the timescale of the results? If a field had very minimal attrition (and hence grew massively over the time period of the dataset), the hiring rate would have to be very small too, so there would be very little change in the gender balance. Did you consider running the model to steady state instead?

      We chose the 40 year window (2020-2060) for this projection analysis because 40 years is roughly the timespan of a full-length faculty career. In other words, it will take around 40 years for most of the pre-existing faculty from 2020 to retire, such that the new, simulated faculty will have almost entirely replaced all former faculty by 2060.

      For three out of five of our projection scenarios (OA, GNA, OA+ER), the point at which observed faculty are replaced by simulated faculty represents steady state. One way to check this intuition is to observe the asymptotic behavior of the trajectories in Fig. 3B; the slopes for these 3 scenarios nearly level out within 40 years.

      The other two scenarios (OA + IR, GNA+IR) represent situations where women’s representation among new hires is increasing each year. These scenarios will not reach steady state until women represent 100% of faculty. Accordingly, the steady state outcomes for these scenarios would yield uninteresting results; instead, we argue that it is the relative timescales that are interesting.
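
      The contrast between the two groups of scenarios can be illustrated with a constant-size, age-structured projection. In this sketch, total faculty size is held fixed by hiring exactly as many people each year as depart; this is our assumption about one way to keep the population at 2020 levels, not necessarily the authors' scaling of H. The gender-neutral attrition curve, the single career age of new hires, the 30% starting share, and the 1 pp annual increase in women's share among hires (starting at 45%) are all hypothetical. Because women's share among hires keeps rising, women's share of faculty is still increasing after 40 years.

```python
MAX_AGE = 60  # hypothetical: nobody is carried past career age 59


def attrition(age):
    """Hypothetical gender-neutral attrition probability at a given career age."""
    return 0.04 if age < 30 else 0.04 + 0.01 * (age - 30)


def project_ir(women, men, years, share0=0.45, delta=0.01, hire_age=2):
    """Return women's share of faculty after each projected year.

    Hires replace every departure (constant total size), and women's share
    among hires rises by `delta` per year, capped at 1.
    """
    total = sum(women) + sum(men)
    history = []
    for t in range(years):
        new_w = [0.0] * MAX_AGE
        new_m = [0.0] * MAX_AGE
        # Age everyone by one year; the last career age is not carried forward.
        for age in range(MAX_AGE - 1):
            keep = 1.0 - attrition(age)
            new_w[age + 1] = women[age] * keep
            new_m[age + 1] = men[age] * keep
        hires = total - sum(new_w) - sum(new_m)  # replace every departure
        share = min(1.0, share0 + delta * t)
        new_w[hire_age] += hires * share
        new_m[hire_age] += hires * (1.0 - share)
        women, men = new_w, new_m
        history.append(sum(women) / total)
    return history


women0 = [3.0] * MAX_AGE  # hypothetical 2020 roster: 30% women at every career age
men0 = [7.0] * MAX_AGE
history = project_ir(women0, men0, years=40)
print([round(history[t], 3) for t in (0, 19, 39)])
```

      Under these assumptions, the trajectory keeps climbing until women's share among hires reaches 100%, so comparing how quickly scenarios approach parity is more informative than comparing their (trivial) long-run limits.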

      What did you do to check that your predictions at least felt realistic under the fitted parameters? (see above for presenting the goodness of fit over the 10 years of the data).

      We ran the analysis suggested in a prior comment (Starting the model in 2011 how well does it fit the available data up to 2020?) and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields, plus the “All STEM” and “All non-STEM” aggregations.

      You only present the final proportion of women for each scenario. As mentioned earlier, models of this type have a tendency to lead to strange population distributions with wild age predictions and huge (or zero) populations. Presenting more results here would assuage any worries the reader had about these problems. What is the predicted age distribution of men and women in the long-term scenarios? Would a different method of keeping the total population in check have yielded different results? Interim results, especially from a model as complex as this one, rather than just a final single-number answer, are a convincing validation that your model is a good one! Again, presenting this result will go a long way to convincing readers that your results are sound and rigorous.

      Thank you for this suggestion. We now include a figure that presents faculty age distributions for each projection scenario at 2060 against the observed faculty age distribution in 2020 (Fig. S3 in the supplementary materials). We find that the projected age distributions are very similar to the observed distributions for natural sciences (shown) and for the additional academic domains. We hope this additional validation will inspire confidence in our model of faculty hiring and attrition for the reviewer, and for future readers.

      In Fig S3, line widths for the simulated scenarios span the central 95% of simulations.

      Other people have reached almost identical conclusions (albeit with smaller data sets) that hiring is more important than attrition. It would be good to compare your conclusions with their work in the Discussion.

      We have revised the main text to cite the listed examples of similar studies. We thank the reviewer for bringing these relevant works to our attention.

      General comments:

      What thoughts have you given to non-binary individuals?

      Be careful how you use the term "gender diversity"! In many countries "Gender diverse" is a term used in data collection for non-binary individuals, i.e. Male, female, gender diverse. The phrase "hiring more gender diverse faculty" can be read in different ways! If you are only considering men and women then gender balance may be a better framework to use.

      We have added language to the main text which explicitly acknowledges that our analysis focuses on men and women due to limitations in our name-based gender tool, which only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      We have also taken additional care with referring to “gender diversity,” per reviewer 1’s point in their public review.

      Reviewer #2 (Recommendations For The Authors):

      Data availability: I did not see an indication that the dataset used here is publicly available, either in its raw format or as a summary dataset. Perhaps this is due to the sensitive nature of the data, but regardless of the underlying reason, the authors should include a note on data availability in the paper.

      The dataset used for these analyses was obtained under a data use agreement with the Academic Analytics Research Center (AARC). While these data are not publicly available, researchers may apply for data access here: https://aarcresearch.com/access-our-data.

      We also added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Additionally, a variety of summary statistics based on this dataset are available online, here: https://github.com/LarremoreLab/us-faculty-hiring-networks/tree/main

      Gender classification: Was an existing package used to classify gender from names in the dataset, or did the authors develop custom code to do so? Either way, this code should be cited. I would also be curious to know the error rate of these classifications, and suggest that additional information on potential biases that might result from automated classifications be included in the discussion, under the section describing data limitations. The reliability of name-based gender classification is particularly of interest, as external gender classifications, such as those applied on the basis of an individual's name, may not reflect the gender with which an individual self-identifies. In other words, while for many people their names may reflect their true genders, for others those names may only reflect their gender assigned at birth and not their self-perceived or lived gender identity. Nonbinary faculty in particular are invisibilized here (and through any analysis that assigns binary gender on the basis of name). While these considerations do not detract from the main focus of the study (which was to utilize an existing dataset classified only on the basis of binary gender to assess trends for women faculty), these limitations should be addressed, as they provide additional context for the interpretation of the results and suggest avenues for future research.

      As we mentioned in response to the public review, we use a free and open-source Python package called nomquamgender to estimate the strengths of name-gender associations, and we apply gender labels to the names with sufficiently strong associations with a binary gender. This package is based on a paper by Van Buskirk et al. (2023), “An open-source cultural consensus approach to name-based gender classification,” which documents error rates and potential biases.

      We have also added language to the main text which explicitly acknowledges that our approach only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      Page 1: The sentence beginning "A trend towards greater women's representation could be caused..." is missing a conjunction. It should likely read: "A trend towards greater women's representation could be caused entirely by attrition, e.g., if relatively more men than women leave a field, OR entirely by hiring..."

      We have edited the paragraph to remove the sentence in question.

      Pages 1-2: The sentence beginning "Although both types of strategy..." and ending with "may ultimately achieve gender parity" is a bit of a run-on; perhaps it would be best to split this into multiple sentences for ease of reading.

      We have revised this run-on sentence.

      Page 2: See comments in the public review about a methods section, the addition of which may help to improve clarity for the readers. Within the existing descriptions of what I consider to be methods (i.e., the first three paragraphs currently under "results"), some minor corrections could be added here. First, consider citing the source of the dataset in the line where it is first described (in the sentence "For these analyses, we exploit a census-level dataset of employment and education records for tenured and tenure-track faculty in 12,112 PhD-granting departments in the United States from 2011-2020.") It also may be helpful to include context here (or above, in the discussion about institutional analyses) about how "departments" can be interpreted. For example, how many institutions are represented across these departments? More information on how the authors eliminated the gendered aspect of patterns in their counterfactual model would be helpful as well; this is currently hinted at on page 4, but could instead be included in the methods section with a call-out to the relevant supplemental information section (S2B).

      We have added a citation to the Academic Analytics Research Center’s (AARC) list of available data elements in the sentence that introduces the data. We hope this will allow readers to familiarize themselves with the data used in our analysis.

      Faculty department membership was determined by AARC based on online faculty rosters. 392 institutions are represented across the 12,112 departments present in our dataset. We have updated the main text to include this information.

      Finally, we have added a methods section to the main text, which includes information on how the gendered aspect of attrition patterns was eliminated in the counterfactual model.

      Page 2: Perhaps some indication of how many transitions come from out-of-sample institutions would be helpful to readers hoping to understand "edge cases."

      In our analysis, we consider all transitions from out-of-sample institutions to in-sample institutions as hires, and all transitions away from in-sample institutions (whether to an out-of-sample institution or out of academia entirely) as attritions. We restrict our analysis of hiring and attrition to PhD-granting institutions in the U.S. in this way because our data do not support an analysis of other, out-of-sample institutions.
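
      The counting rule can be sketched as set differences between consecutive annual rosters for a field. This sketch assumes each roster maps a hypothetical person ID to an (institution, field) pair; the IDs, institutions, and the `transitions` helper are illustrative only. Someone who moves institutions but stays in the field is neither a hire nor an attrition, while anyone who leaves the field, for any destination, counts as an attrition.

```python
def transitions(roster_prev, roster_next, field):
    """Return (hires, attritions) for one field between two consecutive annual rosters.

    Each roster maps a hypothetical person ID to an (institution, field) pair
    for tenured or tenure-track faculty at in-sample institutions.
    """
    prev = {pid for pid, (_, f) in roster_prev.items() if f == field}
    nxt = {pid for pid, (_, f) in roster_next.items() if f == field}
    hires = nxt - prev  # joined the field: first appointment, another field, or out of sample
    attritions = prev - nxt  # left the field: another field, out of sample, abroad, or out of academia
    return hires, attritions


roster_2019 = {"a": ("Inst1", "Physics"), "b": ("Inst2", "Physics"), "c": ("Inst1", "Physics")}
roster_2020 = {
    "a": ("Inst3", "Physics"),    # moved institutions, same field: neither
    "c": ("Inst1", "Chemistry"),  # switched fields: attrition from Physics
    "d": ("Inst2", "Physics"),    # newly appears: hire
}
hires, attritions = transitions(roster_2019, roster_2020, "Physics")
print(sorted(hires), sorted(attritions))  # ['d'] ['b', 'c']
```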

      I also would have liked additional information on how many faculty switched institutions but remained "in-sample and in the same field", and on the gender breakdown of these institutional changes, as this might be an interesting future direction for studies of gender parity. (For example, readers may be spurred to ask: if the majority of those who move institutions are women, what are the implications for tenure and promotion for these individuals?)

      While these mid-career moves are not counted as attritions in the present analysis, a study of faculty who switch institutions but remain (in-sample) as faculty could shed light on issues of gendered faculty retention at the level of institutions. We share the reviewer’s interest in a more in-depth study of mid-career moves and how these moves impact faculty careers, and we now discuss the potential value of such a study towards the end of the paper. In fact, this subject is the topic of a current investigation by the authors!

      Page 3: I was confused by the statement that "of the three types of stable points, only the first point represents an equitable steady-state, in which men and women faculty have equal average career lengths and are hired in unchanging proportions." Here, for example, computer science appears to be close to the origin on Figure 1, suggesting that hiring has occurred in "unchanging proportions" over the study interval. However, upon analysis of Table S2, it appears that changes in hiring in Computer Science (+2.26 pp) are relatively large over the study interval compared to other fields. Perhaps I am reading too literally into the phrase that "men and women faculty are hired in unchanging proportions", but I (and likely others) would benefit from additional clarity here.

      We had created an arrow along with the computer science label in Fig. 1, but it was difficult to see, which is likely the source of this confusion. This was our fault, and we have moved the “Comp. Sci.” label and its corresponding arrow to be more visible in Figure 1.

      The change in women’s representation in Computer Science due to hiring over 2011-2020 was +2.26 pp, as the reviewer points out, but, consulting Fig. 1 and the corresponding table in the supplement, we observe that this is a relatively small amount of change compared to most fields.

      Page 3: If possible it may be helpful to cite a study (or multiple) that shows that "changes in women's representation across academic fields have been mostly positive." What does "positive" mean here, particularly when the changes the authors observe are modest? Perhaps by "positive" you mean "perceived as positive"?

      We used the term positive in the mathematical sense, to mean greater than zero. We have reworded the sentence to read “women's representation across academic fields has been mostly increasing…” We hope this change clarifies our meaning to future readers.

      Page 3: The sentence that ends with "even though men are more likely to be at or near retirement age than women faculty due to historical demographic trends" may benefit from a citation (of either Figure S3 or another source).

      We now cite the corresponding figure in this sentence.

      Page 4: The two sentences that begin with "The empirical probability that a person leaves their academic career" would benefit from an added citation.

      We have added a citation to the sentences.

      Figure 3: Which 10 academic domains are represented in Panel 3B? The colors appear to correspond to the legend in Panel 3A, but no indication of which fields are represented is provided. If possible, please do so; it would be interesting and informative to be able to make these comparisons.

      This was not clear in the initial version of Fig. 3B, so we now label each domain. For reference, the domains represented in 3B are (from top to bottom):

      ● Health

      ● Education

      ● Journalism, Media, Communication

      ● Humanities

      ● Social Sciences

      ● Public Administration and Policy

      ● Medicine

      ● Business

      ● Natural Sciences

      ● Mathematics and Computing

      ● Engineering

      Page 6: Consider citing relevant figure(s) earlier in paragraph 2 of the discussion. For example, the first sentence could refer to Figure 1 (rather than waiting until the bottom of the paragraph to cite it).

      Thank you for this suggestion, we now cite Fig. 1 earlier in this discussion paragraph.

      Page 10: A minor comment on the fraction of women faculty in any given year: the authors assume that the proportion of women in a field can be calculated from knowing the number of women in a field and the number of men. This is, again, true if assuming binary genders, but not true if additional gender diversity is included. It is likely that the number of nonbinary faculty is quite low, and as such would not cause a large change in the overall proportions calculated here, but additional context within the first paragraph of S1 might be helpful for readers.

      We have added additional context in the first paragraph of S1, explaining that an additional term could be added to the equation to account for nonbinary faculty representation if our data included nonbinary gender annotations. Thank you for making this point.
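      For readers who want the formula spelled out, the calculation discussed above can be written as follows. The symbols are illustrative notation, not taken from the manuscript: $W_t$, $M_t$ and $N_t$ denote the numbers of women, men and nonbinary faculty in a field in year $t$, and $f_W(t)$ is the fraction of women. The first form is the binary-gender calculation; the second adds the extra term described in the response.

$$
f_W(t) = \frac{W_t}{W_t + M_t}
\qquad\text{versus}\qquad
f_W(t) = \frac{W_t}{W_t + M_t + N_t}
$$

      The second form reduces to the first when $N_t = 0$, so a small nonbinary count changes the computed proportion only slightly, by the factor $(W_t + M_t)/(W_t + M_t + N_t)$.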

      Page 10: Please include a range of values for the residual terms of the decomposition of hiring and attrition in the sentence that reads "In Figure S1 we show that the residual terms are small, and thus the decomposition is a good approximation of the total change in women's representation."

      These residual terms range from -0.51pp to 1.14pp (median = 0.2pp). We have added this information to the sentence in question.

      Page 12: It may be helpful to readers to include a description of the information contained in Table S2 in the supplemental text under section S3.

      We refer to table S2 twice in the main text (once in the observational findings, and once for the counterfactual analysis), and the contents of table S2 are described thoroughly in the table caption.

      Reviewer #3 (Recommendations For The Authors):

      (1) There is a potential limitation in the generalizability of the findings, as the study focuses exclusively on US academia. Including international perspectives could have provided a more global understanding of the issues at hand.

      The U.S. focus of this study limits the generalizability of our findings, as non-U.S. faculty may exhibit differences in hiring patterns, retention patterns, and current demographic representation. We have added a discussion of this limitation to the manuscript. Unfortunately, our data do not support international analyses of hiring and attrition.

      (2) I am not sure that everyone who disappeared from the AARC dataset can be counted as "attrition" from academia. Some of those who disappeared might indeed have left academia completely. Yet there is also the possibility that some professors left for academic positions in countries outside of the US, or at US institutions that are not included in the AARC dataset. These individuals did not leave academia. Furthermore, it is possible that moves to institutions outside the US or not indexed by AARC differ by gender. Therefore, the study should find a way to test whether the assumption that everyone who disappeared from AARC left academia is valid. If it is not, how would this challenge the current conclusions?

      The reviewer makes an important point: faculty who move to faculty positions in other countries and faculty who move to non-PhD granting institutions, or to institutions that are otherwise not included in the AARC data are all counted as attritions in our analysis. We intentionally define hiring and attrition broadly to include all cases in which faculty join or leave a field or domain within our dataset.

      The types of transitions that faculty make out of the tenure track system at PhD granting institutions in the U.S. may correlate with faculty attributes, like gender. For example, women or men may be more likely to transition to tenure track positions at non-U.S. institutions. Nevertheless, these types of career transition represent an attrition for the system of study, and a hire for another system. Following this same logic, faculty who transition from one field to another field in our analysis are treated as an attrition from the first field and a hire into the new field.

      By focusing on “all-cause” attrition in this way, we are able to make robust insights for the specific systems we consider (e.g., STEM and non-STEM faculty at U.S. PhD granting institutions), without being roadblocked by the task of annotating faculty departures and arbitrating which should constitute “valid” attritions.
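      To make the "all-cause" definition concrete, here is a minimal Python sketch. It is not the analysis code used in the study. It assumes a hypothetical data layout in which each annual snapshot maps a person ID to the field that person is listed in that year. Anyone who leaves the dataset or moves to another field counts as an attrition from their original field, and anyone who enters a field counts as a hire into it.

```python
from collections import Counter


def hires_and_attritions(year_t, year_next):
    """Count all-cause hires and attritions per field between two annual snapshots.

    year_t, year_next: dicts mapping a (hypothetical) person ID to the field
    that person is listed in for that year. A person absent from a snapshot
    is simply missing from the dict.
    """
    hires, attritions = Counter(), Counter()
    for person, field in year_t.items():
        # Leaving the dataset entirely, or moving to a different field,
        # both count as an attrition from the original field.
        if year_next.get(person) != field:
            attritions[field] += 1
    for person, field in year_next.items():
        # Entering the dataset, or arriving from another field, counts as a hire.
        if year_t.get(person) != field:
            hires[field] += 1
    return hires, attritions


# Usage with toy, made-up records:
y2010 = {"a": "Physics", "b": "Physics", "c": "History"}
y2011 = {"a": "Physics", "c": "Philosophy", "d": "History"}
hires, attritions = hires_and_attritions(y2010, y2011)
print(dict(hires))       # {'Philosophy': 1, 'History': 1}
print(dict(attritions))  # {'Physics': 1, 'History': 1}
```

      Because departures are counted without annotating their cause, a move abroad, a move to a non-PhD-granting institution and a field change are all recorded the same way, which is the intended behaviour of an all-cause definition.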

      (3) It would be very interesting to know how much of the attrition was due to tenure failure. Previous studies have suggested that women are less likely to be granted tenure, which makes me wonder about the role that tenure plays in the gendered patterns of attrition in academia.

      We note that faculty attrition rates start low and then reach a peak around 5-7 years after earning PhD, and then decline until around 15-20 years post-PhD, after which, attrition rates increase as faculty approach retirement. The first local maximum appears to coincide roughly with the tenure clock timing, but we can only speculate that these attritions are tenure related. Our dataset is unfortunately not equipped to determine the causal mechanisms driving attrition.

      We reproduce the attrition risk curve in the supplementary materials (Fig. S4).

      (4) The dataset used doesn't fully capture the complexities of academic environments, particularly smaller or less research-intensive institutions (regional universities, historically black colleges and universities, and minority-serving institutions). This point could potentially be added to the discussion.

      We have added this point to the description of this study’s limitations in the discussion.

    1. Reviewer #1 (Public Review):

      Summary:

      In "Changes in wing morphology..." Roy et al investigate the potential allometric scaling in wing morphology and wing kinematics in 8 different hoverfly species. Their study nicely combines different new and classic techniques, investigating flight in an important, yet understudied alternative pollinator. I want to emphasize that I have been asked to review this from a hoverfly biology perspective, as I do not work on flight kinematics. I will thus not review that part of the work.

      Strengths:

      The paper is well-written and the figures are well laid out. The methods are easy to follow, and the rationale and logic for each experiment are easy to follow. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      Weaknesses:

      The ability to hover is described as useful for either feeding or mating. However, several of the North European species studied here would not use hovering for feeding, as they tend to land on the flowers that they feed from. I would therefore argue that the main selection pressure for hovering ability could be courtship and mating. If the authors disagree with this, they could back up their claims with the literature. On that note, a weakness of this paper is that the data for both sexes are merged. If we agree that hovering may be a sexually dimorphic behaviour, then merging flight dynamics from males and females could be an issue in the interpretation. I understand that separating males from females in the movies is difficult, but this could be addressed in the Discussion, to explain why you do not (or do) think that this could cause an issue in the interpretation.

      The flight arena is not very big. In my experience, it is very difficult to get hoverflies to fly properly in smaller spaces, and definitely almost impossible to get proper hovering. Do you have evidence that they were flying "normally" and not just bouncing between the walls? How long was each 'flight sequence'? You selected the parts with the slowest flight speed, presumably to get as close to hovering as possible, but how sure are you that this represented proper hovering and not a brief slowdown of thrust?

      Your 8 species are evolutionarily well-spaced, but as they were all selected from a similar habitat (your campus), their ecology is presumably very similar. Can this affect your interpretation of your data? I don't think all 6000 species of hoverflies could be said to have similar ecology; they live across too many different habitats. For example, on line 541 you say that wingbeat kinematics were stable across hoverfly species. Could this be caused by their similar habitat?

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you and the two Reviewers for the thoughtful evaluation of the manuscript and the support for publication. We have addressed all points raised by the two Reviewers.

      - We have extensively streamlined the manuscript. Repetitive passages regarding the respective kinase cascades have been removed.

      - We improved the presentation of the main Figures (mainly labeling and font size):

        o Figure 1: C, D, E, F
        o Figure 2: C, E, F, G, I
        o Figure 3: D
        o Figure 4: F
        o Figure 5: A, B, C, D, E

      - We integrated new SI-data related to kinase functions, expression and the ‘cell-type comparisons’ of the KinCon reporter system (Figure Supplement 4, 5).

      Below you will find a detailed point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Regarding the issue of the use of the word "dynamics," as described in the public review, here are a few examples of ambiguous use in different sentences:

      - Line 27: dynamics of full-length protein kinases. Is this referring to the dynamics of conformational interconversion between inactive and active states?

      - Line 138: dynamic functioning of kinases. It is not clear what this means.

      - Line 276: ... alters KinCon dynamics. It is not clear whether they are measuring a time-dependent process or a single point.

      - Figure legend 4F: dynamics of CDK4/6 reporters. Again, not clear how the assay is measuring dynamics.

      In my opinion, the authors use proper terminology that describes their assay in which the term dynamics is not used: Title: "... impact of protein and small molecule interactions on kinase conformations" and Line 89 "... reporter can be used to track conformational changes of kinases...".

      We have revised the sentences that used the term “dynamics”:

      - Line 27: The understanding of the structural dynamics of…

      - Line 91: This reporter can be used to track dynamic changes of kinases conformations…

      - Line 139: Conventional methods often fall short in capturing the dynamics of kinases within their native cellular environments…

      - Line 146: Such insights into the molecular structure dynamics of kinases in intact cells…

      - Line 199: In order to enhance our understanding of kinase structure dynamics…

      - Line 276: These findings underline that indeed the trimeric complex formation alters….

      - Figure Legend 4F: Quantification of alterations of CDK4/6 KinCon reporter bioluminescence signals…

      The authors state that KinCon has predictive capabilities (abstract and line 142). What do the authors mean by this?

      We have previously benchmarked the suitability of the KinCon reporter for target engagement assays with wt and mutated kinases. With this approach we determined the specificities of melanoma drugs for mutated BRAF variants (Mayrhofer 2020, PNAS).

      The authors indicate that KinCon is a highly sensitive assay. Can the authors elaborate on what high sensitivity means?  

      By sensitivity we mean that we can detect conformational changes of the reporter at low expression levels of the hybrid protein in the cell line of choice.

      - Line 209: Immunoblotting of cell lysates following luminescence measurements showed expression levels of the reporters in the range and below the endogenous expressed kinases (Figure 1E).  …

      - Line 219:   Using this readout, we showed that at expression levels of the BRAF KinCon reporter below the immunoblotting detection limit, one hour of drug exposure exclusively converted BRAF-V600E to the more closed conformation (Figure 1F, G, Figure Supplement 1B). 

      - Line 221: These data underline that at expression levels far below the endogenous kinase, protein activity conformations can be tracked in intact cells. …

      For example, can they discuss how other fluorescence-based approaches that are less sensitive would not be able to accomplish the same type of results or derive similar conclusions? Can they provide a resolution metric both in space and time? Given that the authors state that this is a technical report, this information is of relevance.

      We highlight the key pros and cons of the KinCon reporter technology in the following sections:

      -Line 529: The KinCon technology, introduced here, seeks to address the previously mentioned challenges. It has the potential to become a valuable asset for tracking kinase functions in living cells which are hard to measure solely via phosphotransferase activities. Overall, it offers an innovative solution for understanding kinase activity conformations, which could pave the way for more novel intervention strategies for kinase entities with limited pharmaceutical targeting potential. So far, this relates to the tracking of kinase-scaffold and pseudo-kinase functions.

      - Line 535: Key advantages of the KinCon reporter technology is the robustness of the system to track kinase conformations at varying expression levels. However, in contrast to fluorescence-based reporter read-outs subcellular analysis and cell sorting are still challenging due to comparable low levels of light emission

      The authors nicely describe how KinCon works in Figure 1B and part of 1C. I do think that the bottom of panel 1C needs to be revised, as well as the text describing the potential scenarios of potency, efficacy, and synergism.

      One issue with this part of Figure 1C is that it is not clear what the x-axis in the 3 plots refers to. Is this time? Is this concentration of a small molecule, inhibitor, or binding partner? This was confusing also in the context of the term dynamics used throughout the text. The terms potency, efficacy, and synergism should be subtitles, or the panels and the x-axis should be better defined, especially for a non-specialized reader.

      Related to this part of Figure 1C is the text. The authors mention potency, effectiveness, and synergy (Line 195). Can the authors use more fundamental terminology related to these three scenarios, for example, changes in activation constant and percent of protein activated? Also, why is synergy only related to effectiveness? Can synergy also be associated with potency?

      Thank you for bringing this up. We have revised Figure 1C to better reflect the mentioned effects of potency. To avoid confusion, we removed the illustration of drug synergism. We have also added axis labels to the presented dose-response curves.

      We have also streamlined the text in the introduction; examples are shown below:

      - Line 195: Light recordings and subsequent calculations of time-dependent dosage variations of bioluminescence signatures of parallel implemented KinCon configurations aid in establishing dose-response curves. These curves are used for discerning pharmacological characteristics such as drug potency, effectiveness of drug candidates, and potential drug synergies (Figure 1C)

      - Figure 1C:  Shown is the workflow for the KinCon reporter construct engineering and analyses using KinCon technology. The kinase gene of interest is inserted into the multiple cloning site of a mammalian expression vector which is flanked by respective PCA fragments (-F[1], -F[2]) and separated with interjacent flexible linkers. Expression of the genetically encoded reporter in indicated multi-well formats allows to vary expression levels and define a coherent drug treatment plan. Moreover, it is possible to alter the kinase sequence (mutations) or to co-express or knock-down the respective endogenous kinase, interlinked kinases or proteinogenic regulators of the respective pathway. After systematic administration of pathway modulating drugs or drug candidates, analyses of KinCon structure dynamics may reveal alterations in potency, efficacy, and potential synergistic effects of the tested bioactive small molecules (schematic dose response curves are depicted)

      Lastly, the use of these three cartoons gives the impression that the experimental results to come will follow a similar representation. Instead, the results are presented in bar plots for many different conditions. I think this will lead to confusion for a broad audience.

      The bottom panel of Figure 1C does not depict real experiments but rather illustrates fitted dose-response curves. For previous demonstrations of dose-response curves using BRAF KinCon data and ERK phosphorylation, we refer to Röck 2019, Sci. Advances.
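      As a point of reference for the terminology question, schematic dose-response curves of this kind are commonly described by a four-parameter logistic (Hill) model. The manuscript does not state which model underlies the illustration, so the following is a generic form rather than the authors' fit. Here $R$ is the reporter signal at drug concentration $[D]$, $R_0$ is the signal without drug, $R_{\max}$ is the signal at saturating concentration, $EC_{50}$ is the half-maximal concentration and $n_H$ is the Hill coefficient:

$$
R([D]) = R_0 + \frac{R_{\max} - R_0}{1 + \left( EC_{50} / [D] \right)^{n_H}}
$$

      In this form, a change in potency appears as a shift of $EC_{50}$ along the concentration axis, whereas a change in efficacy appears as a change in the plateau $R_{\max}$.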

      We further agree with the reviewer and have therefore added a new paragraph to the Methods section that describes the data evaluation in detail.

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown. In these cases, absolute bioluminescence values without any normalization are shown. Otherwise, data was indicated as RLU (relative light unit) fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      For a non-expert reader, can the authors clarify the use of tracking basal conformations vs. transient over-expression of the various KinCon constructs? Moreover, the authors use the term transient over-expression for 10, 16, 24, and 48 h (Line 203). This, to a non-expert reader, does not seem transient.

      We have revised the manuscript to clarify this point:

      - Line 207: We showed that transient over-expression of these KinCon reporters for a time frame of 10h, 16h, 24h or 48h in HEK293T cells delivers consistently increasing signals for all KinCon reporters (Figure 1E, Figure Supplement 1A). 

      - Figure 1E) Representative KinCon experiments of time-dependent expressions of indicated KinCon reporter constructs in HEK293T cells are shown (mean ±SEM). Indicated KinCon reporters were transiently over-expressed in 24-well format in HEK293T cells for 10h, 16h, 24h and 48h each.

      Regarding Figure 1E and similar graphical representations: Why is the signal (RLU) nonlinear with time? If the fluorescence of the KinCon construct is linearly related to its expression or concentration inside the cell, one would expect a linear increase. Have the authors plotted RLU/Expression band intensity to account for changes in protein concentration? For instance, some of the results within Figure 3 are normalized to concentration on reporter expression level.

      Our intention was to show that varying expression levels can be used for the illustrated target engagement assays. Indeed, the observed increases in RLU might be due to factors such as:

      - Doubling times of cells

      - Cell density

      - Media composition (which changes over time)

      - Reporter protein stabilities

      - Abundance of interactors of kinases

      For the results with LKB1, the authors claim that the intermediate fold change in fluorescence (Figure 2E) is due to a partially closed intermediate state (Line 262). Can the authors rule out the possibility that a mixture of active and inactive populations gives intermediate values on average?

      Based on our experience with the KinCon reporter conformation states of the kinases we have tested so far, we assume that the presented data reflect an intermediate state. We agree that this needs further validation and have changed the text accordingly:

      - Line 264: Upon interaction with LKB1 this conformation shifts to a partially closed intermediate state.

      The authors claim in Line 274 that mutations located at the interface of the LKB1/STRADalpha complex affect interactions and hypothesize that allosteric communication between LKB1 and STRADalpha is essential for function. Given that these mutations are at the interaction interface, why would the authors postulate an allosteric mechanism that evokes an effect distant from the interaction/active site? Could it be that function requires surface contacts alone that are disrupted by the mutations?

      We agree with the reviewer and changed our argumentation for this point:

      - Line 276: These findings underline that indeed the trimeric complex formation alters the opening and closing of the tested full-length kinase structures using the applied KinCon reporter read out

      I was unable to find text to explain the following: Figure 2I shows the mutation R74A as n.s., but in the text, only W308C is mentioned to not change fluorescence. Could the authors clarify why R74A is not discussed in the text?  Maybe this reviewer missed the text in which it was discussed.

      We have adapted the manuscript to include the R74A mutation as follows:

      - Line 296: Among these mutations, only the W308C and R74A mutation prevented significant closing of the LKB1 conformation when co-expressed with STRAD𝛼 and MO25 (Figure 2I).

      In Figure 2I, the individual measurements of the LKB1-R74A KinCon are highlighted in red to emphasize the deviations. For the R74A mutation, the lack of significance may be due to the high variability between experiments (highlighted in red). This variability is much higher than for either the wt or the W308C mutant, and can also be seen in the LKB1-R74A KinCon-only condition (white). Even though no significant closing of the LKB1 conformation was observed for R74A, we believe the effect is present, since the trend of conformation closing upon complex formation is still visible. Further replicates would be necessary to validate this.

      Similarly, the authors state in line 326 that the study included an analysis of RIPK2. However, I was unable to find results, graphs, or additional text discussing RIPK2.

      The RIPK2 conformation was analyzed in Figure 3C (page 12).

      Some figures of RLU use absolute values, percentages, and fold change. Is there a reason why the authors use different Y-axis values? These should be explained and justified in the Methods. Similarly, the bars for wt in Figures 3D, G, or 4D, E, F show no error bars. How are the authors normalizing the data and repeats so that there is no error, and are they treating the rest of the data (i.e., mutants and/or samples treated with small molecules) in the same way?

      We have changed the Y-axis values. Throughout the manuscript we now show RLU fold changes, except for selected experiments in which only absolute RLU values are shown (such as Figure 1E, F). We have also added a paragraph to the Methods section (Line 655). Figure 3D was changed as well.

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown.  In these cases absolute bioluminescence values without any normalisation are shown.  Otherwise, data was indicated as RLU fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      The data are generally normalized to the wt condition or, when cells were treated with small molecules for target engagement assays, to the untreated condition.
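      The following minimal Python sketch illustrates why control bars can show no error bar. It normalizes each independent experiment to its own control condition. This is an assumption about one common scheme consistent with the description above, not the authors' analysis pipeline. The function names, the data layout and the optional blot-intensity correction are hypothetical.

```python
from math import sqrt
from statistics import stdev


def fold_changes(experiments, control, blots=None):
    """Normalize each independent experiment to its own control condition.

    experiments: list of dicts, one per independent experiment, mapping
        condition name -> RLU value measured in that experiment
    control: reference condition (e.g. "wt" or "untreated")
    blots: optional list of dicts (same order as experiments) mapping
        condition -> reporter band intensity; if given, each RLU value is
        divided by it before normalizing (hypothetical expression correction)
    Returns a dict mapping condition -> list of per-experiment fold changes.
    """
    result = {}
    for i, exp in enumerate(experiments):
        corrected = {
            cond: rlu / (blots[i][cond] if blots else 1.0)
            for cond, rlu in exp.items()
        }
        for cond, value in corrected.items():
            result.setdefault(cond, []).append(value / corrected[control])
    return result


def sem(values):
    """Standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return stdev(values) / sqrt(len(values))


# Usage with made-up RLU values from three independent experiments:
experiments = [
    {"wt": 1000, "mutant": 600},
    {"wt": 2000, "mutant": 1000},
    {"wt": 1500, "mutant": 900},
]
fc = fold_changes(experiments, control="wt")
print(fc["wt"])       # [1.0, 1.0, 1.0]
print(sem(fc["wt"]))  # 0.0
print(fc["mutant"])   # [0.6, 0.5, 0.6]
```

      Dividing each experiment by its own control fixes the control at exactly 1.0 in every replicate, so its SEM is zero, while the variability of the other conditions is expressed relative to that reference.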

      Lastly, the section starting in Line 472 reads more like a discussion of the results obtained with the different types of inhibitors used in this study than like results on its own. The authors should consider a new subtitle within the Results or moving this section to the Discussion.

      We agree with the reviewer and have split this part of the Results into a new section:

      - Line 455: “Effect of different kinase inhibitor types on the KinCon reporter system”.

      Reviewer #2 (Recommendations For The Authors):

      I have a few suggestions, since the paper is a distillation of a vast amount of work and tells a useful story.

      (1) The work is very solid, uses examples from the literature, and also extends into new experimental space. An obvious weakness is mentioned by the authors for the CDK data, in that measurements with Cyclin D (the activating subunit) are not characterized, although Cyclin D might be assumed to be present.

      We performed experiments with the CDK4/6 KinCon reporters co-expressed with CyclinD at a ratio of 1:3 (HEK293T cells, expression for 48h). However, in these initial experiments we could not track conformation changes upon inhibitor treatment: the cells were treated with the indicated CDK4/6i [1µM] for 3h, which does not seem to impact the conformation of CDK4/6 wt or mutated KinCon reporters. There is a tendency for CyclinD co-expression to promote CDK4/6 conformation opening (data not shown).

      Author response image 1.

      Bioluminescence signal of CDK4/6 KinCon reporters with co-expressed CyclinD3 (HEK293T, expression for 48h) upon exposure to indicated CDK4/6i [1µM] or DMSO for 3h (mean ±SEM, n=3 ind. experiments). No significant changes using the current setting.

      (2) The work with the trimeric LKB1 complex involves pseudokinase, STRADalpha, whose conformation is also examined as a function of LKB1 status; since STRAD is an activator of LKB1. A future goal should be the evaluation of the complex in the presence of STRAD inhibitory/activating small molecules.

      Thank you for this great idea; we are currently preparing an FWF grant application to obtain support for such an R&D project.

      Minor points

      • Have any of the data been repeated in a different cell background? This came to mind because HeLa cells lack LKB1, which might be a useful place to test the LKB1 data in a different context.

      We performed this experiment and show the results in Figure Supplement 5. Following the reviewer's advice, we also included the colon cancer cell line SW480 in the experimental setup. Overall, all three cell settings showed the same pattern of KinCon reporter analyses for LKB1-STRADα-MO25 complex formation utilizing the LKB1- and STRADα-KinCon reporters.

      • The study picks up the PKA Cushings Syndrome field, which makes sense, and data are presented for L206R. PMID 35830806 explains how different patient mutations drive different signaling outcomes through distinct complex formations, and it would be interesting to discuss how mutations in KinCon complexes, especially those with mutations, could affect sub-cellular localization. Could the authors explain if this was done for any of the proteins, whose low experimental expression is a clear advantage, but is presumably hard to maintain across experiments?

      The reviewer's feedback motivated us to perform subcellular fractionation experiments with PKAc wt and L206R KinCon reporters as well as BRAF wt and V600E reporters. We did not observe major differences between the wt and mutated reporter constructs with respect to their nuclear:cytoplasmic localization (Figure Supplement 4). For your information, in an R&D project with the mitochondrial kinase PINK1, we see localization of the reporter, as expected, almost exclusively in the mitochondrial fraction.

      - Line 495: In this context of activating kinase mutations we showed that using PKAc (wt and L206R) and BRAF (wt and V600E) reporters as example we could not track alterations of cytoplasmic and nuclear localization (Figure Supplement 4). Furthermore, subcellular localization of PKAc KinCon reporters did not change when L206R mutant was introduced (Figure Supplement 4). As a control BRAF wt and V600E KinCon reporters were used and also no changes in localization was observed.

      • I suggest changing PMs (Figure 2 and others) simply to mutation, I read this as plasma membrane constantly.

      We agree and we have changed it to “patient mutation” in Figure 2C, Figure 3E, Figure 4B.

    1. Author response:

      Reviewer 1:

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. However, I missed a stronger emphasis on why this could have a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or the study of neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances:

      (1) Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design to optimise it for their needs, and re-shared it with the community.

      (2) Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks, as we exemplify in the manuscript, and ii) implant female and water-restricted mice while respecting animal welfare weight limitations.

      (3) Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective.

      We will make these features clearer in the manuscript.

      Reviewer 2:

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after the conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      - The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      - Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      - The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      - The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      - The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      - The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      - The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      - While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data do not allow us to judge whether this effect is specific to the presented implant or whether any implant, or tethering of the animals per se, would have the same effect.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we will highlight this in the revision.

      - While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al., 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al. the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We will emphasise this point in the revision.

      - The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage options in the manuscript, and we will address this in the revision. Our repository has headplate holder designs (in the XtraModifications/Mouse_FreelyMoving folder). These allow the headstage to be left on the implant, thus minimising the number of connections (albeit at the cost of increased weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we will highlight this in the revision.

      Reviewer 3:

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels probes, and multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adoption of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes; it cannot hold other widely used, commercially available silicon probes. In its current form, only a limited range of angles and distances is possible (inter-probe distance 1.8 to 4 mm, implantation depth 2 to 6.5 mm, insertion angle up to 20 degrees).

      We appreciate the reviewer’s points, but as we will discuss in the revised manuscript, a single implant accommodating the diversity of existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to suit their electrodes’ size and format (and can raise any issues in the GitHub “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to increase the length of exposed shank. We now specify this in the GitHub repository.

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm, and this will be reflected in the revised manuscript. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. In the next revision, we will add these points to the discussion.

    1. Author response:

      eLife assessment

      This study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning which correlates with increased origin firing. However, the evidence remains incomplete as the methods employed do not rigorously establish a key aspect of the mechanism, fully address some alternative models, or sufficiently relate to prior results. Overall, this is a valuable advance for the field that could be improved to establish a more robust paradigm.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early-efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter, causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, whose deletion suppresses the sir2∆ effect and, remarkably, results in a contraction of the rDNA to about one-quarter of its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model in which nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing. While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may generally be interpreted as licensing followed by firing, dissociation may also result from passive replication and, as shown here, from displacement by transcription and/or chromatin remodeling.

      While it is true that both transcription and passive replication can cause the MCM-ChEC signal to disappear, neither can cause selective disappearance of the displaced complex without affecting the non-displaced complex. Indeed, in the case of transcription, RNA polymerase transcribing C-pro would have to first dislodge the normally positioned MCM complex before even reaching the displaced complex. Furthermore, deletion of FUN30 leads to both more C-pro transcription and less disappearance of the displaced MCM complex. It is important to keep in mind that this cannot somehow reflect continuous replenishment of displaced MCMs with newly loaded MCMs, since the cells are in S phase and licensing is restricted to G1.

      Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results.

      Copy number reduction of the magnitude caused by deletion of SIR2 and FUN30 does not suppress the sir2∆ effect (i.e. early replication of the rDNA), but rather exacerbates it. In particular, deletion of SIR2 and FUN30 causes the rDNA to shrink to approximately 35 copies. Kwan et al., 2023 (PMID: 36842087) have shown that reduction of rDNA copy number to 35 causes a dramatic acceleration of rDNA replication in a SIR2 strain. Thus, the effect of rDNA size on replication timing reinforces our conclusion that deletion of FUN30 suppresses rDNA replication.

      However, to address this concern directly, in the revision we will include 2-D gels of fob1 strains with equal numbers of repeats, which allow us to conclude that the effect of FUN30 deletion in suppressing rDNA origin firing is independent of both rDNA size and FOB1. The critical 2-D gels are shown below in the reply to Reviewer 2.

      Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in the sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to illuminate the intricacies of chromatin regulation, which remain poorly understood. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims.

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model.

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases. 

      The two potential initiation sites that one would monitor (non-displaced and displaced) are separated by less than 150 base pairs, and other techniques simply do not have the resolution necessary to distinguish such differences. Furthermore, as we suggest in the manuscript, our results are consistent with a model in which only the displaced MCM complex is activated, whether in sir2 or WT. If no genotype-dependent difference in initiation sites is even expected, it would be hard to interpret even the most precise replication-based assays. However, the reviewer is correct that this is a novel technique and that confirmation with a well-established technique is reassuring; we are therefore performing ChIP experiments to corroborate, to the extent possible, the conclusions that we reached with ChEC.

      We appreciate the reviewer pointing out that some statistical analyses were lacking, and we will correct this in a revised manuscript.

      Additional background and discussion for public review:

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1 kb repeat unit). However, the average origin efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of the histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation of the rDNA origins (and a reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to result in a substantial reduction in the rDNA array length (number of repeats); the reduced rDNA length presumably facilitates its more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing; using 2D gels for this purpose is very challenging, and the results are tricky to quantify. The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion.

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from normal position A to positions B and C in sir2∆ cells and similar but more complete displacement away from A to positions B and C in sir2∆ fun30∆ cells. During S-phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing), MCM signals redistribute, which is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells disappear more rapidly during S-phase, whereas a similar dynamic is not observed in sir2∆ fun30∆. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile, so disappearance may reflect displacement by other means, including transcription, which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit, which might explain the greater displacement from A in G1. Thus, displacement might not always represent origin firing. Because the sir2 background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2∆ fun30∆ due to greater MCM displacement away from A?

      I think the critical missing data to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand-specific sequencing of EdU or nascent strand analysis. More direct comparisons of strains with lower copy number are also needed to rule out this factor. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect, so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data, but I think it remains to be rigorously established. I think they should soften their conclusions in this respect.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30.

      Strengths:

      The paper provides new information on the role of a conserved chromatin remodeling protein in the regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading.

      Weaknesses:

      The relationship between the author's results and prior work on the role of Sir2 (and Fob1) in regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells? 

      Strains lacking SIR2 have unstable rDNA size, and FOB1 deletion stabilizes rDNA size in the sir2 background. Likewise, FOB1 deletion influences the kinetics of rDNA size reduction in sir2 fun30 cells. However, the main effect of Fun30 in sir2 cells that we were interested in, suppression of rDNA replication, is preserved in the fob1 background, arguing that the observed effect is independent of Fob1 (see figure below). Given that the main focus of the paper is the regulation of rDNA origin activity and that these changes were independent of Fob1, we had elected not to include these results in the original manuscript, but we will gladly include them in the revision.

      Besides refuting a possible role of Fob1 in the FUN30-mediated activation of rDNA origin firing in sir2 cells, the use of the fob1 background enabled us to compare the activation of rDNA origins in the sir2 and sir2 fun30 strains with equally short rDNA arrays. The 2-D gels demonstrate a dramatic suppression of rDNA origin activity upon deletion of FUN30 in the sir2 fob1 strains with 35 rDNA copies.

      Author response image 1.

      The deletion of FUN30 diminishes the replication bubble signal in a fob1 sir2 strain with 35 rDNA copies by more than tenfold. The single rARS signal, marked with the arrow, originates from the rightmost rDNA repeat. This specific rightmost rDNA NheI fragment is approximately 25 kb in size, distinctly larger than the 4.7 kb NheI 1N rARS-containing fragments that originate from the internal rDNA repeats.

      Reviewer #3 (Public Review):

      Summary:

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. Using read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc5 had no effect. The conclusion was corroborated with orthogonal assays including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30 deletion. In addition, the authors show that Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at the rDNA.

      The results for the fun30 single mutant appear incongruent, with a larger signal of the +2 nucleosome in the MNase-seq plot but a negligible signal in the ChEC-seq plot, because of the paucity of displaced Mcm in the fun30 single mutant. Given the relative absence of displaced MCMs, the MCM-MNase fusion protein cannot "light up" the +2 nucleosome. We will comment on this in the revision to clarify it.

      Strengths

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position.

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells.

      Weaknesses

      (1) It is unclear which strains were used in each experiment.

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear.

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than let on in the description.

      We appreciate the reviewer pointing out places in which our manuscript omitted key pieces of information (items 1 and 3), and we will fix these oversights in our revision. 

      With regard to point 2, we had written: 

      “Fun30 is also known to play a role in the DNA damage response; specifically, phosphorylation of Fun30 on S20 and S28 by CDK1 targets Fun30 to sites of DNA damage, where it promotes DNA resection (Chen et al. 2016; Bantele et al. 2017). To determine whether the replication phenotype that we observed might be a consequence of Fun30's role in the DNA damage response, we tested non-phosphorylatable mutants for the ability to suppress early replication of the rDNA in sir2; these mutations had no effect on the replication phenotype (Figure 2B), arguing against a primary role for Fun30 in DNA damage repair that somehow manifests itself in replication.”

      We will expand on this to clarify our point in the revision.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early-efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter, causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, whose deletion suppresses the sir2∆ effect and, remarkably, results in a contraction of the rDNA to about one-quarter of its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model in which nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing. While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may generally be interpreted as licensing followed by firing, dissociation may also result from passive replication and, as shown here, from displacement by transcription and/or chromatin remodeling. Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results. Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in the sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to illuminate the intricacies of chromatin regulation, which remain poorly understood. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims.

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model.

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases.

      Additional background and discussion for public review:

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1 kb repeat unit). However, the average origin efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of the histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation of the rDNA origins (and a reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to result in a substantial reduction in the rDNA array length (number of repeats); the reduced rDNA length presumably facilitates its more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing; using 2D gels for this purpose is very challenging, and the results are tricky to quantify. The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion.

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from normal position A to positions B and C in sir2∆ cells and similar but more complete displacement away from A to positions B and C in sir2∆ fun30∆ cells. During S-phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing), MCM signals redistribute, which is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells disappear more rapidly during S-phase, whereas a similar dynamic is not observed in sir2∆ fun30∆. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile, so disappearance may reflect displacement by other means, including transcription, which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit, which might explain the greater displacement from A in G1. Thus, displacement might not always represent origin firing. Because the sir2 background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2∆ fun30∆ due to greater MCM displacement away from A?

      I think the critical missing data to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand-specific sequencing of EdU or nascent strand analysis. More direct comparisons of strains with lower copy number are also needed to rule out this factor. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect, so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data, but I think it remains to be rigorously established. I think they should soften their conclusions in this respect.

    1. On BBT, all traditional and metacognitive accounts of the human are the product of extreme informatic poverty. Ironically enough, many have sought intentional asylum within that poverty in the form of apriori or pragmatic formalisms, confusing the lack of information for the lack of substantial commitment, and thus for immunity against whatever the sciences of the brain may have to say. But this just amounts to a different way of taking refuge in obscurity. What are ‘rules’? What are ‘inferences’? Unable to imagine how science could answer these questions, they presume either that science will never be able to answer them, or that it will answer them in a manner friendly to their metacognitive intuitions. Taking the history of science as its cue, BBT entertains no such hopes. It sees these arguments for what they happen to be: attempts to secure the sufficiency of low-dimensional, metacognitive information, to find gospel in a peephole glimpse.

      This describes the approach of Sellars, Brandom, and Brassier, all of whom Bakker has criticized on the blog.

      They admit that science has priority in the scientific realm, but hold that what we think we are is not something that can be true or false; rather, it consists of games and rules, things we play: a game of "pretend as if we are persons".

      This is a much better position. It does not attempt to tell science that science is a building founded upon the ground of philosophy (unlike Kant, or Heidegger), and does not try to make scientifically testable predictions and get embarrassed in the process (unlike those who sought to study the "quantum of consciousness" because they thought free will is real, thus something quantum-mechanical must be true of the brain, or that philosopher who argued that Anton syndrome is impossible because it is philosophically impossible, or those psychoanalysts that try to interpret Cotard's syndrome as some manifestation of childhood trauma).

      The problem with this position is as follows:

      1. Is science really based on a game of giving and taking reasons? If not, then there's no guarantee that science would protect the game of "let's pretend we are persons who make decisions, have plans, hope for love, etc.". The juggernaut of science may eventually crush the "manifest image of man" under its wheels, migrate to a society of unconscious biorobots, and run even faster as a result!

      2. Philosophers are unable to figure out what rules, games, normativity, etc., are! They can't agree, even after centuries of disputation. Any working consensus will have to come from science, and what if science finally shows that rules and games are nothing like what Sellars, Brandom, etc., thought they were? If so, then not only is the manifest image not the scientific image, and not only is it unnecessary for working scientists, it is not even what the philosophers say it is. It is as if the philosophers have been stuck in Plato's Cave, mistaking the shadow-play for optical science.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This study presents an important finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants, hereby linking perception and action during implicit adaptation. The evidence supporting the claims of the authors is convincing. The normative approach of the proposed PEA model, which combines ideas from separate lines of research, including vision research and motor learning, opens avenues for future developments. This work will be of interest to researchers in sensory cue integration and motor learning.

      Thank you for the updated assessment. We are also grateful for the insightful and constructive comments from the reviewers, which have helped us improve the manuscript again. We made necessary changes following their comments (trimmed tests, new analysis results, etc) and responded to the comments in a point-by-point fashion below. We hope to publish these responses alongside the public review. Thank you again for fostering the fruitful discussion here.

      Public Reviews:

      Reviewer #1 (Public Review):

      I appreciate the normative approach of the PEA model and am eager to examine this model in the future. However, two minor issues remain:

      (1) Clarification on the PReMo Model:

      The authors state, "The PReMo model proposes that this drift comprises two phases: initial proprioceptive recalibration and subsequent visual recalibration." This description could misrepresent the intent of PReMo. According to PReMo, the time course of the reported hand position is merely a read-out of the *perceived hand position* (x_hat in your paper). Early in adaptation, the perceived hand position is biased by the visual cursor (x_hat in the direction of the cursor); towards the end, due to implicit adaptation, x_hat reduces to zero. This is the same as PEA. I recommend that the authors clarify PReMo's intent to avoid confusion.

      Note, however, the observed overshoot of 1 degree in the reported hand position. In the PReMo paper, we hypothesized that this effect is due to the recalibration of the perceived visual target location (inspired by studies showing that vision is also recalibrated by proprioception, but in the opposite direction). If the goal of implicit adaptation is to align the perceived hand position (x_hat) with the perceived target position (t_hat), then there would be an overshoot of x_hat over the actual target position.

      PEA posits a different account for the overshoot. It currently suggests that the reported hand position combines x_hat (which takes x_p as input) with x_p itself. What is the reasoning underlying the *double occurrence* of x_p?

      There are three alternatives that seem more plausible (and could lead to the same overshooting): 1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase), 2) decreasing sigma_p (assuming that participants pay more attention to the hand during the report phase), 3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration. All these options, at least to me, seem equally plausible and testable in the future.

      For clarification of the PReMo model’s take on Fig4A, we now write:

      “The PReMo model proposes that the initial negative drift reflects a misperceived hand location, which gradually reduces to zero, and the late positive drift reflects the influence of visual calibration of the target (Tsay, Kim, Saxena, et al., 2022). ”

      However, we would like to point out that the PEA model does not predict a zero (perceived hand location) even at the late phase of adaptation: it remains negative, though not as large as during initial adaptation (see Figure 4A, red line). Furthermore, we have not seen any plausible way to use a visually biased target to explain the overshoot of the judged hand location (see below when we address the three alternative hypotheses the reviewer raised).

      We don’t think the “double” use of x_p is a problem, simply because there are TWO tasks under investigation when the proprioceptive changes are measured along with adaptation. The first is the reaching adaptation task itself: moving under the influence of the clamped cursor. This task is accompanied by a covert estimation of hand location after the movement (x_hat). Given the robustness of implicit adaptation, this estimation appears mandatory and automatic. The second task is the hand localization task, during which the subject is explicitly asked to judge where the hand is. Here, the perceived hand location is based on two available cues: one is the actual hand location x_p, and the other is the influence from the just-finished reaching movement (i.e., x_hat). For Bayesian modeling from a normative perspective, sensory integration is based on the cues available to fulfill the task. For the second task of reporting the hand location, the two cues are x_p and x_hat (with a possible effect of the visual target, which is unbiased since it is defined as 0 in model simulation; thus, its presence does not induce any shift effect). In this sense, x_p is used sequentially, once in each task. Thus, its dual use is well justified.
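
      To make the two-task argument concrete, here is a minimal Python sketch of sequential reliability-weighted (inverse-variance) cue combination. It is an illustration under stated assumptions, not the PEA model's published equations. The function names, the numeric values, and the choice to carry x_hat into the report stage with its posterior standard deviation are all hypothetical.

```python
import math


def combine(means, sigmas):
    """Reliability-weighted (inverse-variance) fusion of independent Gaussian cues.

    Returns the fused mean and its standard deviation.
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, math.sqrt(1.0 / total)


def perceived_hand(x_v, x_p, x_u, sigma_v, sigma_p, sigma_u):
    """Stage 1 (reach): covert estimate x_hat from cursor, proprioception, prediction."""
    return combine([x_v, x_p, x_u], [sigma_v, sigma_p, sigma_u])


def reported_hand(x_hat, sigma_hat, x_p, sigma_p):
    """Stage 2 (report): x_hat from the finished reach fused with the current x_p.

    Hypothetical assumption: x_hat enters with its stage-1 posterior SD.
    """
    return combine([x_hat, x_p], [sigma_hat, sigma_p])


# Hypothetical values in degrees: cursor clamped at +15, hand at -5, prediction at 0.
x_hat, sigma_hat = perceived_hand(15.0, -5.0, 0.0, 2.0, 4.0, 4.0)
report, _ = reported_hand(x_hat, sigma_hat, -5.0, 4.0)
print(round(x_hat, 3), round(report, 3))  # prints 9.167 7.143
```

      In this toy case the report lies between x_hat and x_p. Because x_p enters both stages, the report is pulled further toward the hand than x_hat alone. That is the property the reviewer queried and the response defends as two separate tasks.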

      Our hypothesis is that the reported hand position results from a combination of x_hat from the previous movement and the current hand position x_p. However, specifically for the overshoot of the judged hand location in the late part of the adaptation (Fig4A), the reviewer raised three alternative explanations that assume the PReMo model is correct. Under the PReMo model, the estimated hand location is determined only by x_hat, and x_p is not used in the hand location report phase. In addition, x_hat (with x_p used once) and a visual recalibration of the target can explain away the gradual shift from negative to positive (overshoot).

      We don’t think any of them can parsimoniously explain our findings here, and we go through these three hypotheses one by one:

      (1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase)

      (2) decreasing σ_p (assuming that participants pay more attention to the hand during the report phase)

      The first two alternative explanations basically assume that x_p has a larger contribution (weighting in Bayesian terms) in the hand location report phase than in the adaptation movement phase, whether due to an increase in visual uncertainty (alternative explanation 1) or a reduction in proprioceptive uncertainty (alternative explanation 2). Thus, we assume that the reviewer suggests that a larger weight for x_p can explain why the perceived hand location changes gradually from negative to positive. However, per the PReMo model, a larger weight for x_p will only affect x_hat, which is already assumed to change from negative to zero. Giving x_p more weight in the hand report phase (compared to the adaptation movement phase) would not explain the shift of the reported hand location from negative to positive. This is because, no matter how much weight x_p has, the PReMo model assumes a saturation for the influence of x_p on x_hat. Thus, x_hat would not exceed zero in the late adaptation. The PReMo model would then rely on the so-called visual shift of the target to explain the overshoot. This leads us to the third alternative the reviewer raised:

      (3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration.

      The PReMo model originally assumed that the perceived target location was biased in order to explain away the positive overshoot of the reported hand location. We assume that the reviewer suggests that the perceived target position, which is shifted in the positive direction, also “biases” the perceived hand position. We also assume that the reviewer suggests that the perceived hand location after a clamp trial (x_hat) is zero, and that the shifted perceived target position somehow “biases” the reported hand location after a clamp trial. Unfortunately, we did not see any mathematical formulation of this biasing effect in the original paper (Tsay, Kim, Haith, et al., 2022). We are not able to come up with any formulation of this hypothesized biasing effect based on Bayesian cue integration principles. Target and hand are two separate perceived items; how one relates to the other needs justification from a normative perspective when discussing Bayesian models. Note that this is not a problem for our PEA model, in which both cues used are about hand localization: one is x_hat and the other is x_p.

      We believe that mathematically formulating the biasing effect (Figure 4A) is non-trivial since the reported hand location changes continuously from negative to positive. Thus, quantitative model predictions, like the ones our PEA model presents here, are needed.

      To rigorously test the possible effect of visual recalibration of the target, there are two things to do: 1) use the psychometric method to measure the biased perception of the target, and 2) re-run the Tsay et al. (2020) experiment without the target. For 2), compared to the case with the target, the PEA model would predict a larger overshoot, while the PReMo model would predict a smaller overshoot or even zero overshoot. This can be left for future studies.

      (2) Effect of Visual Uncertainty on Error Size:

      I appreciate the authors' response about methodological differences between the cursor cloud used in previous studies and the Gaussian blob used in the current study. However, it is still not clear to me how the authors reconcile previous studies showing that visual uncertainty reduced implicit adaptation for small but not large errors (Tsay et al, 2021; Makino, et al 2023) with the current findings, where visual uncertainty reduced implicit adaptation for large but not small errors.

      Could the authors connect the dots here: I could see that the cursor cloud increases potential overlap with the visual target when the visual error is small, resulting in intrinsic reward-like mechanisms (Kim et al, 2019), which could potentially explain attenuated implicit adaptation for small visual errors. However, why would implicit adaptation in response to large visual errors remain unaffected by the cursor cloud? Note that we did verify that sigma_v is increased in (Tsay et al. 2021), so it is unlikely due to the cloud simply failing as a manipulation of visual uncertainty.

      In addition, we also reasoned that testing individuals with low vision could offer a different test of visual uncertainty (Tsay et al., 2023). The advantage here is that both controls and patients with low vision are provided with the same visual input: a single cursor. Our findings suggest that uncertainty due to low vision also shows reduced implicit adaptation in response to small but not large errors, contrary to the findings in the current paper. Missing in the manuscript is a discussion of why the authors' current findings contradict those of previous results.

      To connect the dots between the two previous studies (Tsay et al., 2021, 2023; note that Makino et al., 2023 is not included in this discussion since it investigated the weights of multiple cursors, as opposed to the visual uncertainty associated with a cursor cloud):

      First, we want to re-emphasize that using the cursor cloud to manipulate visual uncertainty brings some confounds, making it not ideal for studying visuomotor adaptation. For example, in the error clamp paradigm, the error is defined as angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of angles (the error estimate, the undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (σ_v in our model), but it additionally affects the mean of the distribution (µ). This unnecessary confound is neatly avoided by using cursor blurring, which is still a single cursor whose center (µ) is unchanged. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2020, the cursor cloud often overlaps with the visual target; this "target hit" would affect adaptation, possibly via a reward learning mechanism (Kim et al., 2019). This is a second confound that accompanies the cursor cloud. Yes, the cursor cloud was verified as associated with high visual uncertainty (Tsay et al., 2021); however, this verification was done with a psychophysics method on a clean background, not in the required context of a hand reaching to a target. Thus, despite the cursor cloud having a sizeable visual uncertainty, our criticisms of it still hold when it is used in error-clamp adaptation.
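
      The mean and spread confound, and the target-overlap confound, can be illustrated with a small simulation. This is a hypothetical sketch. It assumes that each cloud contains 25 cursors drawn independently from a Gaussian centred on the clamp angle with the 10° standard deviation mentioned in the next paragraph, and that the target has a half-width of 2°. Neither the cursor count nor the target size is taken from the studies discussed.

```python
import math
import random
import statistics


def cloud_centre(clamp_deg, n_cursors, sd_deg, rng):
    """Mean direction of one cursor cloud whose members are drawn from N(clamp, sd).

    Hypothetical assumption: cursors are sampled independently on each trial.
    """
    return statistics.fmean(rng.gauss(clamp_deg, sd_deg) for _ in range(n_cursors))


def blurred_centre(clamp_deg):
    """A blurred single cursor is centred exactly on the clamp angle on every trial."""
    return clamp_deg


def target_overlap(clamp_deg, sd_deg, half_width_deg):
    """Expected fraction of cloud members within +/- half_width of the target (0 deg)."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (cdf((half_width_deg - clamp_deg) / sd_deg)
            - cdf((-half_width_deg - clamp_deg) / sd_deg))


rng = random.Random(1)
# 2000 simulated trials with a hypothetical 25-cursor cloud at a 3 deg clamp
centres = [cloud_centre(3.0, 25, 10.0, rng) for _ in range(2000)]
print(f"cloud centre: mean {statistics.fmean(centres):.2f}, "
      f"trial-to-trial SD {statistics.stdev(centres):.2f}")
print(f"blurred centre: {blurred_centre(3.0)}")
print(f"overlap at 3 deg: {target_overlap(3.0, 10.0, 2.0):.3f}, "
      f"at 30 deg: {target_overlap(30.0, 10.0, 2.0):.4f}")
```

      Under these assumptions, the realized cloud centre wanders from trial to trial by roughly 10/√25 = 2° around the nominal clamp, whereas the blurred cursor's centre never moves. A 3° cloud also places far more cursors on the target than a 30° cloud.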

      Second, bearing these confounds of the cursor cloud in mind, we postulate one important factor that has not been considered in any model thus far and that might underlie the lack of difference between the single-cursor clamp and the cloud-cursor clamp when the clamp size is large: the cursor cloud might be harder to ignore than a single cursor. For Bayesian sensory integration, the naive model considers only the relative reliability of cues. Yes, the cloud is more uncertain than a single cursor in terms of indicating the movement direction. However, given its large spread, it is probably harder to ignore during error-clamp movements. Note that ignoring the clamped cursor is the task instruction, but the large scatter of the cursor cloud is more salient and thus plausibly harder to ignore. This might increase the weighting of the visual cue despite its higher visual uncertainty. This extra confound is arguably minimized by using the blurred cursor as in our Exp4, since the blurred cursor did not increase the visual angle much (Figure 5D; blurred vs single cursor: 3.4 mm vs 2.5 mm in radius, 3.90° vs 2.87° in spread). In contrast, the visual angle of the dot cloud is at least an order of magnitude larger (cursor cloud vs. single cursor: at least 25° vs. 2.15° in spread, given a 10° standard deviation of random sampling).

      Third, for the low-vision study (Tsay et al., 2023), the patients indeed show reduced implicit adaptation for a 3° clamp (consistent with our PEA model) but intact adaptation for a 30° clamp (not consistent). Though this pattern appears similar to what happens for normal people whose visual uncertainty is upregulated by a cursor cloud (Tsay et al., 2021), we are not completely convinced that the same underlying mechanism governs these two datasets. Low-vision patients indeed have higher visual uncertainty about color, brightness, and object location, but their visual uncertainty about visual motion is still unknown. Given the differences in impairment among people with low vision (e.g., peripheral or central vision affected) and the different roles of peripheral and central vision in movement planning and control (Sivak & Mackenzie, 1992), the overall effect of visual uncertainty in people with low vision is unclear. The direction of cursor movement that matters for visuomotor rotation here is likely related to visual motion perception. Unfortunately, the original study did not measure this uncertainty in low-vision patients. We believe our Exp1 offers a valid method for this purpose in future studies. More importantly, we should not expect low-vision patients to integrate visual cues in the same way as normal people, given their long-term adaptation to their vision difficulties. Thus, we are cautious about interpreting the seemingly similar findings across the two studies (Tsay et al., 2021, 2023) as revealing the same mechanism.

      A side note: these two previous studies proposed a so-called mis-localization hypothesis, i.e., the cursor cloud was mislocalized for small clamp sizes (given its overlap with the target) but not for large clamp sizes. They suggested that the lack of an uncertainty effect at small clamp sizes is due to mislocalization, while the lack of an uncertainty effect at large clamp sizes is because implicit adaptation is not sensitive to uncertainty at large angles. Thus, these two studies acknowledge that the cursor cloud not only upregulates uncertainty but also generates an unwanted effect of so-called “mis-localization” (overlap with the target). Interestingly, their hypothesis about reduced sensitivity to visual uncertainty for large clamps is not supported by a model or theory but is merely a re-wording of the experimental results.

      In sum, our current study cannot offer an easy answer to "connect the dots" between the two aforementioned studies, owing to methodological issues and the specific characteristics of the population. However, for resolving conflicting findings, our study suggests solutions that include using a psychometric test to quantify visual uncertainty for cursor motion (Exp1), a better uncertainty-manipulation method that avoids a couple of confounds (Exp4, blurred cursor), and a falsifiable model. Future work can resolve the differences between studies based on the new insights from the current study.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.
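
      The two concepts summarized here can be sketched in a few lines of Python: an error-based state-space update driven by a Bayesian estimate of hand position, and a visual cue that degrades with eccentricity. This is a hypothetical illustration, not the authors' PEA implementation. The linear σ_v rule, the parameter values (A, B, σ_p, σ_u), and the assumption that the predicted hand position is centred on the target are all placeholders.

```python
def sigma_v(clamp_deg, intercept=1.0, slope=0.3):
    """Hypothetical linear growth of visual uncertainty (deg) with cursor eccentricity."""
    return intercept + slope * abs(clamp_deg)


def simulate_clamp(clamp_deg, n_trials=500, A=0.98, B=0.2, sigma_p=8.0, sigma_u=8.0):
    """Error-clamp adaptation driven by a Bayesian estimate of hand position.

    x is the hand direction relative to the target (deg). The cursor is always
    shown at clamp_deg, proprioception reports x, and the prediction is assumed
    (hypothetically) to be centred on the aimed target at 0 deg.
    """
    p_v = 1.0 / sigma_v(clamp_deg) ** 2
    p_p = 1.0 / sigma_p ** 2
    p_u = 1.0 / sigma_u ** 2
    x = 0.0
    for _ in range(n_trials):
        # perceived hand position: precision-weighted average of the three cues
        x_hat = (p_v * clamp_deg + p_p * x + p_u * 0.0) / (p_v + p_p + p_u)
        # state-space update: retain a fraction A, learn from perceived error x_hat - 0
        x = A * x - B * x_hat
    return x


def steady_state(clamp_deg, A=0.98, B=0.2, sigma_p=8.0, sigma_u=8.0):
    """Closed-form fixed point of simulate_clamp: x* = -B*w_v*c / (1 - A + B*w_p)."""
    p_v = 1.0 / sigma_v(clamp_deg) ** 2
    p_p = 1.0 / sigma_p ** 2
    total = p_v + p_p + 1.0 / sigma_u ** 2
    w_v, w_p = p_v / total, p_p / total
    return -B * w_v * clamp_deg / (1.0 - A + B * w_p)


for c in (2, 8, 30, 90):
    print(c, round(simulate_clamp(c), 1))
```

      With these made-up parameters, the asymptotic magnitude rises from a 2° to an 8° clamp and then falls for 30° and 90° clamps. The result is a concave rather than saturating function, because the product w_v·c eventually shrinks as σ_v grows. The closed form in `steady_state` follows from setting x_{t+1} = x_t in the update.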

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation's influence by visual eccentricity, a well-established vision science concept, and sensory estimation through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      The authors discussed in the revised version that the proposed model can capture the general implicit motor learning process in addition to the visuomotor rotation task. In the discussion, they emphasize two main principles: the automatic tracking of effector position and the combination of movement cues using Bayesian integration. These principles are suggested as key to understanding and modeling various motor adaptations and skill learning. The proposed model could potentially become a basis for creating new computational models for skill acquisition, especially where current models fall short.

      Weaknesses:

      The proposed model is described as elegant. In this paper, the authors test the model within a limited example condition, demonstrating its relevance to the sensorimotor adaptation mechanisms of the human brain. However, the scope of the model's applicability remains unclear. It has shown the capacity to explain prior data, thereby surpassing previous models that rely on elementary mathematics. To solidify its credibility in the field, the authors must gather more supporting evidence.

      Indeed, our model here is based on one particular experimental paradigm, i.e., the error-clamp adaptation. We used it simply because 1) this paradigm is one rare example that implicit motor learning can be isolated in a clean way, and 2) there are a few conflicting findings in the literature for us to explain away by using a unified model.

      For our model’s broad impact, we believe that as long as people need to locate their effectors during motor learning, the general principle laid out here will be applicable. In other words, repetitive movements with a Bayesian cue combination of movement-related cues can underlie the implicit process of various motor learning. To showcase its broad impact, in upcoming studies, we will extend this model to other motor learning paradigms, starting from motor adaptation paradigms that involve both explicit and implicit processes.

      Reviewer #3 (Public Review):

      (2.1) Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors note that implicit adaptation (as measured in error-clamp based paradigms) does not saturate at larger perturbations, but decreases again (e.g. Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease the importance assigned to visual feedback, which could explain lower asymptotes.

      The authors characterize visual uncertainty for 3 rotation sizes in a first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors is tested in different groups of participants. The model's visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than either the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, or a causal inference model (Wei & Körding, 2009), which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable data set (Tsay et al. 2020, instead of Tsay et al. 2021). Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation) than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular, the model fits for experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.

      (2.2) Strengths

      In this study the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in a first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are: 1) a learning and 2) a retention rate, as used in popular state space models and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from, and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      (2.3) Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      For a proper citation of the “concave” adaptation function: we assume the reviewer is referring to the study by Morehead et al., 2017, which tested large clamp sizes up to 135° and 175°. Unsurprisingly, the 135° and 175° conditions lead to nearly zero adaptation, possibly due to the trivial fact that people cannot even see the moving cursor. We have cited this seminal study from the very beginning. All other error-clamp studies with a block design emphasized an invariant or saturated implicit adaptation with large rotations (e.g., Kim et al., 2019).

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between movement endpoint and disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      For the issues related to visual uncertainty measurement in Exp1:

      First, our visual uncertainty is about cursor motion direction in the display plane, and the measurement in Exp1 has never been done before. Thus, we do not think our data are comparable to any findings in vision science about fovea/periphery comparisons. We cited Klein and others’ work (Klein & Levi, 1987; Levi et al., 1987) in vision science since their studies showed that deviation from fixation is associated with an increase in visual uncertainty. Their studies thus inspired us to conduct Exp1 to probe how the visual uncertainty we are concerned with (specifically for visual motion direction) changes with increasing deviation from fixation. Any model and its parameters should be specifically tailored to the task or context it tries to emulate. In our case, motion direction in a center-out-reaching setting is the modeled context, and all the relevant model parameters should be specified in movement angles. This is particularly important since we need to estimate parameters from one experiment to predict behaviors in another experiment.

      Second, based on previous vision studies, the 1 s delay of the reference cursor has minimal impact on the estimate of visual uncertainty. Our Exp1 used a visual paradigm similar to that of White et al. (1992), who showed that delay does not increase visual uncertainty over a broad range of values (from 0.2 s to >1 s; see their Figures 5-6).

      These two problems have been addressed in the revised manuscript, with proper citations listed.
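
      A visual uncertainty expressed in movement angles can be obtained with a standard psychometric procedure. The Python sketch below is generic and hypothetical, since Exp1's actual stimuli, offsets, trial counts and fitting method are not described here. A simulated observer judges whether a test cursor direction lies clockwise of a reference, and σ_v is recovered as the standard deviation of a cumulative Gaussian fitted by grid-search maximum likelihood.

```python
import math
import random


def p_clockwise(offset_deg, mu, sigma):
    """Cumulative-Gaussian probability of judging the cursor clockwise of the reference."""
    return 0.5 * (1.0 + math.erf((offset_deg - mu) / (sigma * math.sqrt(2.0))))


def simulate_responses(offsets, n_per_offset, mu, sigma, rng):
    """Hypothetical observer: number of 'clockwise' responses at each offset."""
    return [sum(rng.random() < p_clockwise(d, mu, sigma) for _ in range(n_per_offset))
            for d in offsets]


def fit_psychometric(offsets, n_cw, n_per_offset):
    """Grid-search maximum-likelihood estimate of (mu, sigma) for a cumulative Gaussian."""
    best = (-math.inf, None, None)
    for i in range(-30, 31):            # mu from -3.0 to 3.0 deg in 0.1 steps
        mu = 0.1 * i
        for j in range(10, 201):        # sigma from 0.5 to 10.0 deg in 0.05 steps
            sigma = 0.05 * j
            ll = 0.0
            for d, k in zip(offsets, n_cw):
                # clip p away from 0 and 1 so the log-likelihood stays finite
                p = min(max(p_clockwise(d, mu, sigma), 1e-9), 1 - 1e-9)
                ll += k * math.log(p) + (n_per_offset - k) * math.log(1 - p)
            if ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


rng = random.Random(7)
offsets = [-9.0 + 1.5 * i for i in range(13)]   # test-minus-reference direction, deg
n_cw = simulate_responses(offsets, 60, mu=0.0, sigma=3.0, rng=rng)
mu_hat, sigma_hat = fit_psychometric(offsets, n_cw, 60)
print(f"estimated bias {mu_hat:.1f} deg, sigma_v {sigma_hat:.2f} deg")
```

      In practice, a gradient-based optimizer or a finer grid would usually replace the coarse grid search. The grid simply keeps the example free of dependencies.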

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Reviewer 3 considers the Tsay et al. (2020) dataset inappropriate for our theory, but we respectfully disagree. We address the three points raised here in turn:

      (1) As noted in our previous response, the reported hand location in Figure 4A (Tsay et al., 2020) does not come from a test of proprioceptive recalibration as conventionally defined. In the revision, we state explicitly that this dataset is not about proprioceptive recalibration, and we have deleted text that might suggest otherwise (see Results). Proprioceptive recalibration is instead measured with passive movement, as in our Exp3 (Figure 4E). In this error-clamp task, "the remembered position of the target" is simply the target, which is always visible. Participants clearly did not report the target position; instead, their reported hand location changed continuously as adaptation progressed.

      (2) Because the Tsay et al. (2020) dataset does not measure proprioceptive recalibration in the conventional sense, there is no need to take the difference between the reported location and the actual hand location. That difference would indeed be ~20°, but comparing it with previously reported proprioceptive recalibration would be comparing apples to oranges. Throughout the paper, we refer to the results in Fig. 4A as "reported hand location", not proprioceptive recalibration. The target direction is defined as 0°, so its presence does not bias the reported hand location in the Bayesian cue combination (this visual cue has a mean of 0). Using the target as the reference also simplifies our modeling.

      (3) Exp3 is crucial for our study because it shows that our model and its simple Bayesian cue-combination principle apply not only to implicit adaptation but also to proprioceptive measures taken during adaptation. Furthermore, the model reproduces the so-called proprioceptive recalibration and explains it away with the same Bayesian cue combination that accounts for adaptation. The field has accumulated an array of findings on proprioceptive changes induced by visuomotor adaptation, but there is currently no computational model that explains them quantitatively. Our study is an initial attempt to model these changes.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real life situations.

      The largest caveat raised by the reviewer appears to apply to the error-clamp paradigm in general, not only to our study. This paradigm does require participants to ignore the clamped error, which is precisely why the adaptive response it induces can be attributed to implicit adaptation. The original paper that proposed the paradigm (Morehead et al., 2017) has been cited 220 times (according to Google Scholar as of June 2024), indicating that the field views the paradigm favorably.

      Furthermore, we agree that this kind of instruction and feedback (an invariant clamp) differs from everyday experience, but this does not prevent us from gaining theoretical insight by studying behavior under such "artificial" task settings. Consider saccadic adaptation (Deubel, 1987; Kojima et al., 2004): the target jumps while the eye moves toward it, and this somewhat artificial manipulation also makes people adapt implicitly, even though the adaptation itself would be a "disastrous" strategy in real life. Nevertheless, this seemingly counterproductive adaptation has given scientists an enormous understanding of motor adaptation. Similarly, consider task-irrelevant perceptual learning (Seitz & Watanabe, 2005, 2009): while participants learn to discriminate one type of visual stimulus, another type of stimulus is shown in the background, and people gradually learn it even though they do not notice its presence. This "implicit" learning can also be detrimental in real life, yet the paradigm has advanced our understanding of the inner workings of the cognitive system.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      L101: There is a typo: (Tsay et al., 2020), 2020) should be corrected to (Tsay et al., 2020).

      Thank you for pointing this out; we have corrected the typo.

      L224-228: It would be beneficial to evaluate the validity of the estimated sigma_u and sigma_p based on previous reports.

      We can roughly estimate σu from the variability of reaching angles during the baseline phase, when no perturbation is applied. The standard deviation of the reaching angle in Exp 2 is 5.128° ± 0.190°, close to the σu estimated by the model (5.048°). We also ran a separate perceptual experiment to measure proprioceptive uncertainty (n = 13; see Figure S6); σp from this experiment is 9.737° ± 5.598°, also close to the σp extracted by the model (11.119°). We have added these new analyses to the final version of the paper.
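      A minimal sketch of the baseline check for σu described above, assuming that baseline reach angles are stored as a participants × trials array in degrees and that the reported ± value is the standard error across participants. Both are assumptions, and `baseline_sigma_u` is a hypothetical helper rather than the authors' analysis code:

```python
import numpy as np


def baseline_sigma_u(baseline_angles):
    """Summarize baseline reach-angle variability as a rough estimate of sigma_u.

    baseline_angles: array-like, shape (n_participants, n_trials), in degrees.
    This data layout is a hypothetical assumption for illustration.
    """
    angles = np.asarray(baseline_angles, dtype=float)
    # Sample standard deviation over baseline trials, one value per participant.
    per_subject_sd = angles.std(axis=1, ddof=1)
    # Group mean of the per-participant SDs.
    mean_sd = per_subject_sd.mean()
    # Standard error of that mean across participants.
    sem_sd = per_subject_sd.std(ddof=1) / np.sqrt(per_subject_sd.size)
    return mean_sd, sem_sd, per_subject_sd


# Toy data (made up): three participants, three baseline trials each.
toy = [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0], [0.0, 4.0, 8.0]]
mean_sd, sem_sd, per_subject = baseline_sigma_u(toy)
```

      The group value would then be compared against the model's fitted σu; treating baseline variability as motor-prediction noise is itself an approximation.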

      L289-298: I found it difficult to understand the update equations of the proprioceptive calibration based on the PEA model. Providing references to the equations or better explanations would be helpful.

      We have expanded the description of proprioceptive calibration in Supplementary Text 1, with step-by-step equations and additional explanation.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions (or clarification of previous suggestions) for revisions

      The authors persist in using the Tsay et al. 2020 paper despite its many drawbacks, which the authors attempt to address in their reply. But the main drawback is that the results in the 2020 paper are NOT relative to the unseen hand but to the visual target the participants were supposed to move their hand to. If the results were converted so as to be relative to the unseen hand, the localization biases would be over 20 deg in magnitude.

      The PEA simulations are plotted relative to the unseen hand, which makes sense. If the authors want to persist in using the Tsay 2020 dataset despite any issues, they at least need to make sure that the simulations are mimicking the same change. That is, the data from Tsay 2020 need to be converted to the same variable used in the current paper.

      If the main objection to using the Tsay 2021 dataset is that its design would lead to forgetting: we found that while active localization (or any intervening active movement, such as a no-cursor reach) does lead to some interference or forgetting (a small reduction in the overall magnitude of adaptation), this is not the case for passive localization; see Ruttle et al., 2021 (data on OSF). This was also just a suggestion; there may of course be other, more suitable datasets.

      As stated above, changing the reference frame is not necessary, nor would it affect our results. The Tsay et al. (2020) dataset is unique in that it shows the gradual change of reported hand location during error-clamp adaptation. Forgetting (or a reduction in proprioceptive bias), even if present, would not affect how well our model fits the Tsay 2020 dataset: if forgetting is invariant over the adaptation process, it would only reduce the proprioceptive bias uniformly across trials. This can be accounted for by a smaller weight on . The critical point is that the model can explain the gradual drift of the proprioceptive judgment of hand location.

      Ruttle et al.'s (2021) dataset does not involve error-clamp adaptation; we will therefore use it in future work to test an extension of our model that incorporates an explicit process.

      References

      Deubel, H. (1987). Adaptivity of gain and direction in oblique saccades. Eye Movements from Physiology to Cognition. https://www.sciencedirect.com/science/article/pii/B9780444701138500308

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Kojima, Y., Iwamoto, Y., & Yoshida, K. (2004). Memory of learning facilitates saccadic adaptation in the monkey. The Journal of Neuroscience, 24(34), 7531–7539.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Morehead, J. R., Taylor, J. A., Parvin, D. E., & Ivry, R. B. (2017). Characteristics of implicit sensorimotor adaptation revealed by task-irrelevant clamped feedback. Journal of Cognitive Neuroscience, 29(6), 1061–1074.

      Seitz, A. R., & Watanabe, T. (2005). A unified model for perceptual learning. Trends in Cognitive Sciences, 9(7), 329–334.

      Seitz, A. R., & Watanabe, T. (2009). The phenomenon of task-irrelevant perceptual learning. Vision Research, 49(21), 2604–2610.

      Sivak, B., & Mackenzie, C. L. (1992). Chapter 10 The Contributions of Peripheral Vision and Central Vision to Prehension. In L. Proteau & D. Elliott (Eds.), Advances in Psychology (Vol. 85, pp. 233–259). North-Holland.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Kim, H. E., Saxena, A., Parvin, D. E., Verstynen, T., & Ivry, R. B. (2022). Dissociable use-dependent processes for volitional goal-directed reaching. Proceedings of the Royal Society B: Biological Sciences, 289(1973), 20220415.

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.

      Tsay, J. S., Parvin, D. E., & Ivry, R. B. (2020). Continuous reports of sensed hand position during sensorimotor adaptation. Journal of Neurophysiology, 124(4), 1122–1130.

      Tsay, J. S., Tan, S., Chu, M. A., Ivry, R. B., & Cooper, E. A. (2023). Low Vision Impairs Implicit Sensorimotor Adaptation in Response to Small Errors, But Not Large Errors. Journal of Cognitive Neuroscience, 35(4), 736–748.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants. The evidence supporting the claims of the authors is solid, although a better discussion of the link between the model variables and the outcomes of related behavioral experiments would strengthen the conclusions. The work will be of interest to researchers in sensory cue integration and motor learning.

      Public Reviews:

      Reviewer #1 (Public Review):

      This valuable study demonstrates a novel mechanism by which implicit motor adaptation saturates for large visual errors in a principled normative Bayesian manner. Additionally, the study revealed two notable empirical findings: visual uncertainty increases for larger visual errors in the periphery, and proprioceptive shifts/implicit motor adaptation are non-monotonic, rather than ramp-like. This study is highly relevant for researchers in sensory cue integration and motor learning. However, I find some areas where statistical quantification is incomplete, and I find the contextualization of previous studies puzzling.

      Thank you for your feedback and for highlighting the strengths of our study. We appreciate your insights and have addressed your concerns in the revision.

      Issue #1: Contextualization of past studies.

      While I agree that previous studies have focused on how sensory errors drive motor adaptation (e.g., Burge et al., 2008; Wei and Kording, 2009), I don't think the PReMo model was contextualized properly. Indeed, while PReMo should have adopted clearer language - given that proprioception (sensory) and kinaesthesia (perception) have been used interchangeably, something we now make clear in our new study (Tsay, Chandy, et al. 2023) - PReMo's central contribution is that a perceptual error drives implicit adaptation (see Abstract): the mismatch between the felt (perceived) and desired hand position. The current paper overlooks this contribution. I encourage the authors to contextualize PReMo's contribution more clearly throughout. For example, although the current study does not mention it, PReMo accounts for the continuous changes in perceived hand position in Figure 4 (Figure 7 in the PReMo study).

      There is no doubt that the current study provides important additional constraints on what determines perceived hand position: Firstly, it offers a normative Bayesian perspective in determining perceived hand position. PReMo suggests that perceived hand position is determined by integrating motor predictions with proprioception, then adding a proprioceptive shift; PEA formulates this as the optimal integration of these three inputs. Secondly, PReMo assumed visual uncertainty to remain constant for different visual errors; PEA suggests that visual uncertainty ought to increase (but see Issue #2).

      Thank you for the comments and suggestions. We now cite Tsay et al. (2024) to acknowledge their clarification of the terminology of perceptual error. We also agree that our model differs from PReMo in two fundamental ways. First, it dispenses with the concept of proprioceptive shift and its contribution to perceived hand location; instead, it relies on a "one-shot" Bayesian integration of three types of cues. By Occam's razor, this is a more elegant and probably more ecological account of how hand location is processed. Second, it incorporates the dependence of visual uncertainty on perturbation size, rather than relying on a ramp function relating proprioceptive changes to perturbation size; such a ramp function is not well grounded in perception studies. We acknowledge that PReMo was the first to recognize the importance of perceptual error, and we highlight the differences between the models in our Discussion.
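      The "one-shot" integration can be illustrated with the standard result for combining independent Gaussian cues under a flat prior: the optimal estimate is the precision-weighted (inverse-variance-weighted) mean of the cue means. The sketch below assumes three independent Gaussian cues (visual cursor, proprioception, motor prediction); `integrate_cues` and the example values are hypothetical, and this is not the PEA implementation itself.

```python
import numpy as np


def integrate_cues(means, sigmas):
    """Precision-weighted combination of independent Gaussian cues (flat prior).

    means:  cue means, e.g. [visual cursor, proprioception, motor prediction], in degrees.
    sigmas: the corresponding standard deviations, in degrees.
    Returns (posterior mean, posterior SD, normalized cue weights).
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    precisions = 1.0 / sigmas**2              # reliability of each cue
    weights = precisions / precisions.sum()   # weights sum to 1
    estimate = float(np.dot(weights, means))
    posterior_sd = float(np.sqrt(1.0 / precisions.sum()))
    return estimate, posterior_sd, weights


# Hypothetical single trial: cursor clamped at 10 deg, hand and prediction at 0 deg.
est, sd, w = integrate_cues(means=[10.0, 0.0, 0.0], sigmas=[2.0, 1.0, 1.0])
```

      Here the noisier visual cue receives a weight of 1/9, so the estimate is pulled only about 1.1° toward the cursor, and the combined SD (2/3°) is smaller than that of any single cue.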

      We also think the PReMo model could potentially explain Fig. 4A. However, Tsay et al. (2022) assume that "a generic shift in visual space" explains the gradual proprioceptive changes from negative to positive (see page 17 in Tsay et al., 2022). We do not think this visual mechanism is needed to explain Fig. 4A; instead, the proprioceptive change follows naturally from hand deviations during implicit adaptation. As the hand moves away from the target (in the positive direction) during adaptation, the estimated hand location moves along with it. We believe this is the correct explanation of the Fig. 4A results. When we explored the PReMo model, we found it hard to explain this part of the data with a visual shift without additional assumptions (at least not with those published in Tsay et al., 2022). Furthermore, our PEA model also parsimoniously explains away the proprioceptive shift observed in a completely different setting, i.e., the proprioceptive changes measured with the passive method as a function of perturbation size in Exp 3.

      We have expanded the comparison of the two models in the Discussion, especially their different explanations of Fig. 4A.

      Issue #2: Failed replication of previous results on the effect of visual uncertainty.

      (2a) A key finding of this paper is that visual uncertainty increases linearly in the periphery, a constraint crucial for explaining the non-monotonicity in implicit adaptation. One notable methodological deviation from previous studies is the requirement to fixate on the target: in the current experiments, participants were asked to fixate on the target, a constraint not imposed in previous studies. In a free-viewing environment, visual uncertainty may not attenuate as fast, and hence implicit adaptation may not attenuate as quickly with larger visual errors as it does in the current design. It seems that this fixation design, while important, needs to be properly contextualized, since it may not represent most implicit adaptation experiments.

      First, we are not aware of any previous study that examined visual uncertainty as a function of perturbation size, so there is no replication problem here. Second, our data indicate that even when people are not asked to fixate on the target (i.e., under "free" viewing), they still predominantly fixate on the target during error-clamp adaptation. In our Exp 1, 86%-95% of fixations fall on the straight line between the starting position and the target (now shown in Figure S1; see also below). We also collected eye-tracking data in Exp 4, a typical error-clamp experiment: more than 95% of the gaze data fall within ±50 pixels of the screen center, slightly higher than in Exp 1. This is understandable: the typical error-clamp task requires people to ignore the cursor and move the hand toward the target. To minimize interference from the concurrently moving cursor, people rely on fixating the target, the sole task-relevant visual marker in the workspace, to achieve the task goal.

      In sum, we did not require participants to fixate on the target in order to manufacture the linear dependency of visual uncertainty; we required fixation to mimic the gaze pattern in typical error-clamp learning, as revealed in our pilot experiment. The visual uncertainty effect is sound, and our study is the first to clearly demonstrate it.

      Author response image 1.

      On a side note (but an important one), fixation on the aiming target is also predominant in conventional visuomotor rotation tasks, which involve strategic re-aiming (as shown by Bromberg et al., 2019 and de Brouwer et al., 2018; an upcoming paper of ours also shows this). This is one reason our new theory should also apply to other types of motor adaptation.

      (2b) Moreover, the current results - visual uncertainty attenuates implicit adaptation in response to large, but not small, visual errors - deviates from several past studies that have shown that visual uncertainty attenuates implicit adaptation to small, but not large, visual errors (Tsay, Avraham, et al. 2021; Makino, Hayashi, and Nozaki, n.d.; Shyr and Joshi 2023). What do the authors attribute this empirical difference to? Would this free-viewing environment also result in the opposite pattern in the effect of visual uncertainty on implicit adaptation for small and large visual errors?

      We do not think that all of the previous studies mentioned manipulated visual uncertainty parametrically, and none of them provided quantitative measures of visual uncertainty. As detailed in our Exp4 and in our Discussion, we do not think the manipulation of visual uncertainty in Tsay et al. (2021) is appropriate (see our response to 2d below). The Makino et al. (2023) study perturbed participants with multiple clamped cursors, and its effect is not easily interpretable, since this kind of complex visual feedback might invoke additional processes. More importantly, we do not think this is a direct way of modulating visual uncertainty, and the authors provided no evidence that it is.

      (2c) In the current study, the measure of visual uncertainty might be inflated by brief presentation times of comparison and referent visual stimuli (only 150 ms; our previous study allowed for a 500 ms viewing time to make sure participants see the comparison stimuli). Relatedly, there are some individuals whose visual uncertainty is greater than 20 degrees standard deviation. This seems very large, and less likely in a free-viewing environment.

      In our 2AFC task, the reference stimulus is the actual clamped cursor, which lasts for 800 ms. The comparison stimulus is a 150-ms dot appearing near the reference. This duration is sufficient for measuring the perception of visual motion; previous studies used similar durations (Egly & Homa, 1984; Owsley et al., 1995). We think a 20° standard deviation is reasonable given that people fixate on the target and must process the fast-moving cursor with peripheral vision only. The steep linear increase in visual uncertainty about visual motion is well documented: the last author of this paper has shown that uncertainty about visual motion speed (though not about motion angle) follows the same steep trend (Wei et al., 2010). Notably, when we fit the adaptation data in Exp2 to "estimate" visual uncertainty without using the values measured in Exp1, the two estimates are well aligned (see Figure S7 and Supplementary Text 2). This strongly supports the validity and accuracy of our estimates. We think this high visual uncertainty is an important message for the field, and we now highlight its magnitude in our Discussion.
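      To see why a steep, linear increase in visual uncertainty matters, consider a deliberately simplified single-trial sketch: the hand, proprioceptive cue, and motor prediction all sit at the target (0°), the cursor sits at the clamp angle c, and visual uncertainty grows as σv = a + b|c|. The perceived hand position then rises with c, peaks, and declines, so adaptation driven by it would not be ramp-like. The parameters `a` and `b` and the function names are hypothetical illustrations, not the fitted PEA model; σp and σu are set near the model-fit values quoted earlier only for concreteness.

```python
import numpy as np


def perceived_error(clamp_deg, a, b, sigma_p, sigma_u):
    """Perceived hand estimate (deg) on a first clamp trial, simplified.

    Assumes hand, proprioception and motor prediction all sit at the target (0 deg)
    and visual uncertainty grows linearly with clamp size: sigma_v = a + b*|clamp|.
    """
    c = np.asarray(clamp_deg, dtype=float)
    sigma_v = a + b * np.abs(c)
    prec_v = 1.0 / sigma_v**2
    prec_other = 1.0 / sigma_p**2 + 1.0 / sigma_u**2
    # Only the visual cue is displaced from 0, so the estimate is w_v * c.
    return prec_v * c / (prec_v + prec_other)


def peak_clamp(a, b, sigma_p, sigma_u):
    """Clamp size that maximizes perceived_error for c > 0.

    With P = 1/sigma_p^2 + 1/sigma_u^2 the estimate is c / (1 + P*(a + b*c)^2),
    whose derivative vanishes where 1 + P*(a^2 - b^2*c^2) = 0.
    """
    P = 1.0 / sigma_p**2 + 1.0 / sigma_u**2
    return np.sqrt(1.0 / P + a**2) / b


# Hypothetical parameters, chosen only for illustration.
params = dict(a=2.0, b=0.5, sigma_p=11.0, sigma_u=5.0)
clamps = np.linspace(0.5, 90.0, 2000)
errors = perceived_error(clamps, **params)
c_star = peak_clamp(**params)
```

      With these made-up values the perceived error peaks near a 10° clamp and then falls off, because the visual weight shrinks faster than the clamp grows; with a constant σv the same expression would increase monotonically.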

      (2d) One important confound between clear and uncertain (blurred) visual conditions is the number of cursors on the screen. The number of cursors may have an attenuating effect on implicit adaptation simply due to task-irrelevant attentional demands (Parvin et al. 2022), rather than that of visual uncertainty. Could the authors provide a figure showing these blurred stimuli (gaussian clouds) in the context of the experimental paradigm? Note that we addressed this confound in the past by comparing participants with and without low vision, where only one visual cursor is provided for both groups (Tsay, Tan, et al. 2023).

      Thank you for raising this important point about the type of visual stimulus used to manipulate uncertainty. We used a Gaussian blur of a single cursor (similar to Burge et al., 2008) instead of a cloud of dots. We have now added a figure inset showing what this blur looks like.

      Using a cursor cloud (Makino et al., 2023; Tsay et al., 2021) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for visuomotor adaptation. In the error-clamp paradigm, the error is defined as an angular deviation. A cursor cloud consists of multiple cursors spanning a range of angles, so it affects both the sensory uncertainty (the intended outcome) and the sensory estimate of the angle (the error estimate, an undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it also affects the mean of the distribution (mu). Cursor blurring avoids this confound: the blurred cursor is still a single cursor whose center (mu) is unchanged. Furthermore, as correctly pointed out in the original paper by Tsay et al. (2021), the cursor cloud often overlaps with the visual target, and this "target hit" may affect adaptation, possibly via a reward-learning mechanism (see Kim et al., 2019). This is a second confound that accompanies the cursor cloud.

      Issue #3: More methodological details are needed.

      (3a) It's unclear why, in Figure 4, PEA predicts an overshoot in terms of perceived hand position from the target. In PReMo, we specified a visual shift in the perceived target position, shifted towards the adapted hand position, which may result in overshooting of the perceived hand position with this target position. This visual shift phenomenon has been discovered in previous studies (e.g., Simani, McGuire, and Sabes 2007).

      The visual shift described by Simani et al. (2007) is irrelevant to our task. The data we model are motor adaptation (changes in hand position) and so-called proprioceptive changes (changes in hand localization), both of which are measured and referenced in extrinsic coordinates rather than relative to a visual target. For instance, the proprioceptive changes are measured relative to either the actual hand location (Exp 3) or the goal (Fig. 4A). We also do not think a visual shift is needed to explain the perceptual judgment of an unseen hand (the target shown during the judgment does reduce the biasing effect of PE; see our responses to Reviewer 3).

      In the PEA model, the reported hand angle results from integrating the cue from the actual hand position with the hand position estimated from previous movements (x_hand_hat). Depending on the degree of adaptation, this integration can make the reported hand position overshoot or undershoot. It is the changed proprioceptive cue (the actively moved hand having slowly adapted to the error clamp) that leads to the overshoot of the perceived hand position.

      In the Results, we now explain these changes in value in parenthetical notes. Details of the cue-combination mechanism and the model predictions are given in Supplementary Text 1. We believe these detailed explanations make the mechanism clear.

      (3b) The extent of implicit adaptation in Experiment 2, especially with smaller errors, is unclear. The implicit adaptation function seems to be still increasing, at least by visual inspection. Can the authors comment on this trend, and relatedly, show individual data points that help the reader appreciate the variability inherent to these data?

      Indeed, with our designated number of trials, adaptation to small errors does not appear fully saturated. However, this does not affect our model analysis. We fit PEA and the competing models to the time series of adaptation, not to the saturated adaptation extent (see Fig. 3A). Thus, although some conditions might not produce the full range of adaptation, the data are sufficient to constrain the models. We now mention this concern in the Results; we also emphasize that the model explains not only the adaptation magnitude (operationally defined as the adaptation extent measured at the same time point, i.e., the end of the adaptation phase) but also the full learning process.
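      The point that a trajectory can constrain a model before it plateaus can be illustrated with synthetic data. The sketch below assumes a simplified trial-by-trial update in which the hand angle is retained with factor A and corrected by B times a perceived hand angle that mixes the clamp (weight w_v) with the current hand angle. This update form, the names `simulate` and `fit_time_series`, and all parameter values are hypothetical; the published PEA equations are in the paper's Supplementary Text.

```python
import numpy as np


def simulate(A, B, w_v, clamp, n_trials):
    """Hand angle over trials under a simplified perceptual-error update.

    Assumed form (illustrative, not the published PEA equations):
      x_hat[n]   = w_v * clamp + (1 - w_v) * x[n]   # perceived hand angle
      x[n + 1]   = A * x[n] - B * x_hat[n]          # retention plus correction
    """
    x = np.zeros(n_trials)
    for n in range(n_trials - 1):
        x_hat = w_v * clamp + (1.0 - w_v) * x[n]
        x[n + 1] = A * x[n] - B * x_hat
    return x


def fit_time_series(data, w_v, clamp, A_grid, B_grid):
    """Grid search for (A, B) minimizing squared error over the whole trajectory."""
    best_sse, best_A, best_B = np.inf, None, None
    for A in A_grid:
        for B in B_grid:
            sse = np.sum((simulate(A, B, w_v, clamp, data.size) - data) ** 2)
            if sse < best_sse:
                best_sse, best_A, best_B = sse, A, B
    return best_A, best_B, best_sse


# Synthetic, noise-free data that have NOT reached asymptote after 15 trials.
w_v, clamp = 0.3, 15.0
data = simulate(A=0.98, B=0.1, w_v=w_v, clamp=clamp, n_trials=15)
# Fixed point of the update: x* = -B*w_v*clamp / (1 - A + B*(1 - w_v)).
asymptote = -0.1 * w_v * clamp / (1.0 - 0.98 + 0.1 * (1.0 - w_v))
A_hat, B_hat, _ = fit_time_series(
    data, w_v, clamp,
    A_grid=np.linspace(0.90, 1.00, 11),
    B_grid=np.linspace(0.0, 0.3, 31),
)
```

      In this toy case the last simulated trial is still only about three quarters of the way to the asymptote, yet fitting the full trajectory recovers A and B, because the early trials pin down the correction rate and the curvature pins down retention. Real data are noisy, so a proper fit would use an optimizer and uncertainty estimates rather than a coarse grid.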

      In response, we have included individual data points in the revised Figure 3B-D to provide a clear illustration of the extent of implicit adaptation, particularly for small perturbations.

      (3c) The same participants were asked to return for multiple days/experiments. Given that the authors acknowledge potential session effects, with attenuation upon re-exposure to the same rotation (Avraham et al. 2021), how does re-exposure affect the current results? Could the authors provide clarity, perhaps a table, to show shared participants between experiments and provide evidence showing how session order may not be impacting results?

      Thank you for raising the issue of session and re-exposure effects. First, we do not think Exp1 affects Exp4: Exp1 is a perceptual task and Exp4 is a motor adaptation task. Furthermore, Exp1 used random visual stimuli on both sides, so it did not produce any adaptation of its own. Second, Exp4 did have three sessions performed on three days, but the session effect does not change our main conclusion about visual uncertainty. A three-way repeated-measures ANOVA (3 days × 3 perturbations × 2 visual uncertainty levels) revealed a significant main effect of day (F(2,36) = 17.693, p < 0.001), indicating changes in performance across sessions (see Author response image 2). Importantly, the effects of perturbation and visual uncertainty (including their interaction) remain the same, and day did not interact with them. The main effect of day shows that the overall adaptation effect decreases across days: post-hoc pairwise comparisons showed that single-trial learning (STL) on Day 1 was significantly higher than on Day 2 (p = 0.004) and Day 3 (p < 0.001), with no significant difference between Day 2 and Day 3 (p = 0.106). Other ANOVA results: significant main effects of perturbation (F(1,36) = 8.872, p < 0.001) and visual uncertainty (F(1,18) = 49.164, p < 0.001), as well as a significant interaction between perturbation size and visual uncertainty (F(2,36) = 5.160, p = 0.013). No interaction involving day was significant (all p > 0.182). Thus, overall adaptation decreases over days, but day does not affect the interaction between visual uncertainty and perturbation that concerns us. That this interaction was preserved across sessions strengthens our conclusion that visual uncertainty systematically affects implicit adaptation.

      Author response image 2.

      (3d) The number of trials per experiment should be detailed more clearly in the Methods section (e.g., Exp 4). Moreover, could the authors please provide relevant code on how they implemented their computational models? This would aid in future implementation of these models in future work. I, for one, am enthusiastic to build on PEA.

      We have clarified the number of trials conducted in each experiment, with detailed information now readily available in the Methods section of the main text. In addition, we have made the code for data analysis and modeling publicly accessible. These resources can be found in the updated "Data Availability" section of our paper.

      (3f) In addition to predicting a correlation between proprioceptive shift and implicit adaptation on a group level, both PReMo and PEA (but not causal inference) predict a correlation between individual differences in proprioceptive shift and proprioceptive uncertainty with the extent of implicit adaptation (Tsay, Kim, et al. 2021). Interestingly, shift and uncertainty are independent (see Figures 4F and 6C in Tsay et al, 2021). Does PEA also predict independence between shift and uncertainty? It seems like PEA does predict a correlation.

      Thank you for this insightful question. Our PEA model indeed predicts a positive (although not linear) correlation between proprioceptive uncertainty and the amplitude of the estimated hand position (x_hand_hat). This prediction is consistent with simulations using the same parameters that generated the results in Figure 4B of our manuscript (there is a sign flip, as x_hand_hat is negative).

      Author response image 3.
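      The intuition behind this predicted relationship can be sketched with precision weighting alone: as proprioceptive uncertainty grows, the proprioceptive cue loses weight, the displaced visual cue gains weight, and the estimated hand position is pulled further toward the cursor, with a curved rather than straight-line dependence. The sketch assumes the start of a clamp block (hand at the target), fixed hypothetical σv and σu, and a toy sign convention in which the estimate is positive; only its magnitude matters here. `estimated_hand` is a hypothetical helper, not the authors' simulation code.

```python
import numpy as np


def estimated_hand(sigma_p, clamp=15.0, hand=0.0, sigma_v=10.0, sigma_u=5.0):
    """Precision-weighted hand estimate (deg) as proprioceptive uncertainty varies.

    Illustrative assumptions: visual cue at the clamp angle, proprioceptive and
    predictive cues at the actual hand angle, fixed hypothetical sigma_v and sigma_u.
    """
    sigma_p = np.asarray(sigma_p, dtype=float)
    prec_v = 1.0 / sigma_v**2
    prec_p = 1.0 / sigma_p**2
    prec_u = 1.0 / sigma_u**2
    total = prec_v + prec_p + prec_u
    return (prec_v * clamp + (prec_p + prec_u) * hand) / total


sigma_p_values = np.linspace(2.0, 30.0, 50)
x_hat = estimated_hand(sigma_p_values)
steps = np.diff(np.abs(x_hat))
```

      Across this range of σp the magnitude of the estimate increases monotonically, but by shrinking increments, which is the kind of positive, non-linear relation described above.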

      Regarding the absence of such a correlation in Tsay et al. (2021), we offer several potential explanations. First, the variability of passive hand localization during motor adaptation (as measured in Tsay et al., 2021) is not the same as proprioceptive uncertainty, which typically requires psychophysical testing to assess accurately. Second, our study showed that proprioceptive bias attenuates over repeated measurements; in our Exp3, it decreased within a block of three trials. Tsay et al. (2021) used 36 consecutive measurements without interleaved adaptation trials, so the "averaged" proprioceptive bias in that study might not reflect the actual bias during adaptation. That study also showed large individual differences in both proprioceptive bias and proprioceptive variability (not uncertainty), so detecting a positive correlation, if one exists, would require a large sample, probably larger than their sample of roughly 30 participants. We have not included these putative explanations in the revision, which already has a long Discussion with no space to address a null result.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have raised questions about the underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation is influenced by visual eccentricity, a well-established vision science concept, and sensory estimation occurs through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      Weaknesses:

      The paper provides a comprehensive account of visuomotor rotation paradigms, addressing incongruent behavioral results with a solid Bayesian integration model. However, its focus is narrowly confined to visuomotor rotation, leaving its applicability to broader motor learning paradigms, such as force field adaptation, saccadic adaptation, and de novo learning paradigms, uncertain. The paper's impact on the broader fields of neuroscience and cognitive science may be limited due to this specificity. While the paper excellently demonstrates that specific behavioral results in visuomotor rotation can be explained by Bayesian integration, a general computational principle, its contributions to other motor learning paradigms remain to be explored. The paper would benefit from a discussion on the model's generality and its limitations, particularly in relation to the undercompensating effects in other motor learning paradigms.

      Thank you for your thoughtful review and recognition of the contributions our work makes towards understanding implicit motor adaptation through the Perceptual Error Adaptation (PEA) model. We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, in which participants are explicitly told to ignore the rotated cursor. The error-clamp paradigm is one of the rare paradigms in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is processed implicitly and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (the motor prediction cue in our model) when forming procedural knowledge for action control.

      We propose that the same two principles apply to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation comes from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three all involve localizing one’s effector under perturbed sensory feedback, and all include implicit learning. We believe they can be modeled by variants of our model, or at least that their computational nature should be considered in light of the two principles laid out above. For skill learning, especially de novo learning, the field still lacks a fundamental computational model that accounts for skill acquisition at the level of the relevant movement cues. Our model suggests a promising route: repetitive movements with Bayesian combination of movement-related cues might underlie the implicit component of motor skill acquisition.

      We added more discussion on the possible broad implications of our model in the revision.

      Reviewer #3 (Public Review):

      Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim, et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors note that implicit adaptation (as measured in error-clamp-based paradigms) does not saturate at larger perturbations, but decreases again (e.g., Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease the weight assigned to visual feedback, which could explain lower asymptotes.

      The authors characterize visual uncertainty for 3 rotation sizes in the first experiment. While this experiment could be improved, it is probably sufficient for the current purposes.

      The authors then present a second experiment where adaptation to 7 clamped errors is tested in different groups of participants. The model's visual uncertainty is set using a linear fit to the results from experiment 1. The remaining 4 parameters are then fit to this second data set: 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate, and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than two alternatives. The PReMo model (Tsay, Kim et al., 2022) predicts saturated asymptotes, and a causal inference model (Wei & Körding, 2009) predicts no adaptation for larger rotations.

      In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable data set.

      Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty by adding blur to the cursor. The results are again more in line with their model, which predicts overall lower adaptation. The PReMo model predicts equal saturation but at larger perturbations, and a causal inference model predicts equal peak adaptation but shifted to larger rotations.

      In particular, the model's fit to experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.
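      To make the model structure summarized above concrete, here is a minimal trial-by-trial sketch in the spirit of PEA. It combines a state-space update (retention A, learning rate B) with inverse-variance cue combination. Visual uncertainty grows linearly with clamp size. This is an illustrative toy, not the authors' implementation, and several choices here are assumptions:

      - The predicted-hand cue is placed at the actual hand direction.
      - The clamp sizes are hypothetical.
      - All parameter values (sigma_v0, k, sigma_p, sigma_u, A, B) are made up rather than fitted.

```python
# Toy trial-by-trial simulation: assumed dynamics and hypothetical parameters, not the authors' code.

def simulate_clamp(p, n_trials=300, A=0.98, B=0.1,
                   sigma_p=8.0, sigma_u=8.0, sigma_v0=1.0, k=0.15):
    """Simulate hand direction (deg) for one error-clamp group with clamp size p (deg).

    Visual uncertainty grows linearly with clamp size: sigma_v = sigma_v0 + k * p.
    Returns the hand direction on every trial (negative = opposite to the cursor).
    """
    sigma_v = sigma_v0 + k * abs(p)
    precisions = [1.0 / s ** 2 for s in (sigma_v, sigma_p, sigma_u)]
    w_v = precisions[0] / sum(precisions)   # weight of the clamped cursor cue
    x = 0.0                                  # hand direction (adaptation state)
    hand = []
    for _ in range(n_trials):
        x_hat = w_v * p + (1 - w_v) * x      # Bayesian estimate of hand direction
        x = A * x - B * x_hat                # retention A, learning rate B, error = 0 - x_hat
        hand.append(x)
    return hand


# Hypothetical clamp sizes; the last-trial value approximates the asymptote.
sizes = (4, 8, 16, 32, 64, 95, 135)
asymptotes = {p: -simulate_clamp(p)[-1] for p in sizes}
for p, a in asymptotes.items():
    print(f"clamp {p:>3} deg -> asymptote {a:5.1f} deg")
```

      With these made-up values, the asymptote rises for small clamps, peaks at an intermediate size, and declines for large clamps. The decline occurs because the cursor cue loses weight as its uncertainty grows. This is the qualitative non-saturating pattern described above.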

      Strengths

      In this study, the authors propose a Perceptual Error Adaptation model ("PEA"). The work combines ideas from cue combination and Bayesian methods with new data sets collected in four experiments. These experiments use various techniques to test very different components of the model. The central component of visual uncertainty is assessed in the first experiment. The model uses 4 other parameters to explain implicit adaptation: 1) a learning rate and 2) a retention rate, as used in popular state-space models, and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large. They propose that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (an overall decrease of adaptation). Because the PEA model incorporates visual uncertainty that depends on retinal eccentricity, its predictions for very large perturbations are notably different from, and better than, the predictions of the two models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between the movement endpoint and the disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper. In that paper, the reported hand positions (a measure of the proprioceptive sense of hand position) are given and plotted relative to an ever-present visual target, rather than relative to the actual hand position. This causes two problems. 1) Since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to. 2) If the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), they would be on the order of ~20°. That is roughly two times larger than any previously reported proprioceptive recalibration. It is also an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real-life situations.

      Specific comments:

      A small part of the manuscript relies on replicating or modeling the proprioceptive recalibration in a study we think does NOT measure proprioceptive recalibration (Tsay, Parvin & Ivry, JNP, 2020). In this study, participants reached for a visual target with a clamped cursor, and at the end of the reach were asked to indicate where they thought their hand was. The responses fell very close to the visual target both before and after the perturbation was introduced. This means that the difference between the actual hand position, and the reported/felt hand position gets very large as soon as the perturbation is introduced. That is, proprioceptive recalibration would necessarily have roughly the same magnitude as the adaptation displayed by participants. That would be several times larger than those found in studies where proprioceptive recalibration is measured without a visual anchor. The data is plotted in a way that makes it seem like the proprioceptive recalibration is very small, as they plot the responses relative to the visual target, and not the discrepancy between the actual and reported hand position. It seems to us that this study mostly measures short-term visual memory (of the target location). What is astounding about this study is that the responses change over time to begin with, even if only by a tiny amount. Perhaps this indicates some malleability of the visual system, but it is hard to say for sure.

      Regardless, the results of that study do not form a solid basis for the current work and they should be removed. We would recommend making use of the dataset from the same authors, who improved their methods for measuring proprioception shifts just a year later (Tsay, Kim, Parvin, Stover, and Ivry, JNP, 2021). Although here the proprioceptive shifts during error-clamp adaptation (Exp 2) were tiny, and not quite significant (p<0.08), the reports are relative to the actual location of the passively placed unseen hand, measured in trials separate from those with reach adaptation and therefore there is no visual target to anchor their estimates to.

      Experiment 1 measures visual uncertainty with increased rotation size. The authors cite relevant work on this topic (Levi & Klein etc) which has found a linear increase in uncertainty of the position of more and more eccentrically displayed stimuli.

      First, this is a question where the reported stimuli and effects could greatly benefit from comparisons with the literature in vision science, and the results might even inform it. In order for that to happen, the units for the reported stimuli and effects should (also) be degrees of visual angle (dva).

      As far as we know, all previous work has investigated static stimuli, where with moving stimuli, position information from several parts of the visual field are likely integrated over time in a final estimate of position at the end of the trajectory (a Kalman filter type process perhaps). As far as we know, there are no studies in vision science on the uncertainty of the endpoint of moving stimuli. So we think that the experiment is necessary for this study, but there are some areas where it could be improved.

      Then, the linear fit is done in the space of the rotation size, but not in the space of eccentricity relative to fixation, and these do not necessarily map onto each other linearly. Suppose the viewing distance was the closest distance at which the manufacturer reports the eye-tracker to work accurately (45 cm). That gives the largest possible distances of the endpoints from fixation in dva. Based on that assumed viewing distance, we converted the rotation angles to distances between fixation and the cursor endpoint in degrees of visual angle: 0.88, 3.5, and 13.25 dva (ignoring screen curvature, or the absence of it). If the minimum viewing distance is indeed used, the ratio of the retinal distance to the perturbation angle is roughly 0.221, 0.221, and 0.207, which is probably fine in this case. Still, it would be better to do the fit in the relevant perceptual coordinate system.
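      The conversion described above can be sketched as follows. Fixation is on the target, and the cursor endpoint lies on the reach circle, rotated by the clamp angle. The on-screen chord between target and endpoint is converted to visual angle assuming a flat screen viewed perpendicularly at the target.

      Two inputs are assumptions. First, the rotation sizes of 4°, 16°, and 64° are inferred from the reported ratios; only 4° is named explicitly. Second, the reach radius is not given in this excerpt. The default of 10 cm is hypothetical, chosen because with a 45 cm viewing distance it approximately reproduces the reviewers' figures.

```python
import math


def rotation_to_dva(rotation_deg, reach_radius_cm=10.0, viewing_distance_cm=45.0):
    """Eccentricity (deg of visual angle) of a cursor endpoint rotated by rotation_deg
    around the start position, with fixation on the target.

    Assumes a flat screen viewed perpendicularly at the target; reach_radius_cm is a
    hypothetical value, not taken from the manuscript.
    """
    # Straight-line (chord) distance on the screen between the target and the cursor endpoint
    chord_cm = 2 * reach_radius_cm * math.sin(math.radians(rotation_deg) / 2)
    return math.degrees(math.atan(chord_cm / viewing_distance_cm))


for rot in (4, 16, 64):
    ecc = rotation_to_dva(rot)
    print(f"{rot:>3} deg rotation -> {ecc:5.2f} dva (ratio {ecc / rot:.3f})")
# prints roughly 0.89, 3.54 and 13.25 dva, with ratios of about 0.222, 0.221 and 0.207
```

      Fitting visual uncertainty against these eccentricities, rather than against the rotation angles, would place the fit in the perceptual coordinate system the reviewers recommend.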

      The first distance (4 deg rotation; 0.88 dva offset between fixation and stimulus) is so close to fixation, even at the assumed shortest distance between eye and screen, that it can be considered foveal. It also falls within the range of eye-tracker noise plus the noise of the eye's own fixation. There should be no uncertainty on or that close to the fovea, so the variability in the data is likely just measurement noise. This also means that a linear fit will almost always go through this point, somewhat skewing the results toward linearity. The advantage is that the estimate of the intercept (measurement noise) is going to be very good. Unfortunately, there are only 2 other points measured, which (if used without the closest point) will always support a linear fit. Therefore, the experiment does not seem suitable to test linearity, only to characterize it, which might be sufficient for the current purposes. We would understand if testing linearity with many more rotations requires too much effort. But then it should be made much clearer that the experiment assumes linearity and only serves to characterize the assumed linearity.

      Final comment after the consultation session:

      There were a lot of discussions about the actual interpretation of the behavioral data from this paper with regards to past papers (Tsay et al. 2020 or 2021), and how it matches the different variables of the model. The data from Tsay 2020 combined both proprioceptive information (Xp) and prediction about hand position (Xu) because it involves active movements. On the other hand, Tsay et al. 2021 is based on passive movements and could provide a better measure of Xp alone. We would encourage you to clarify how each of the variables used in the model is mapped onto the outcomes of the cited behavioral experiments.

      The reviewers discussed this point extensively during the consultation process. The results reported in the Tsay 2020 study reflect both proprioception and prediction. However, a visual target contributes more than just prediction. It is likely an anchor in the workspace that draws the response toward it, so the report is dominated by short-term visual memory of the target (which is not part of the model). By contrast, in the current Exp 3, as in most other work investigating proprioception, the bias is calculated relative to the actual hand direction.

      The solution is fairly simple. In Experiment 3 in the current study, Xp is measured relative to the hand without any visual anchors drawing responses, and this is also consistent with the reference used in the Tsay et al 2021 study and from many studies in the lab of D. Henriques (none of which also have any visual reach target when measuring proprioceptive estimates). So we suggest using a different data set that also measures Xp without any other influences, such as the data from Tsay et al 2021 instead.

      These issues with the data are not superficial and cannot be solved within the model. Data with correctly measured biases (relative to the hand) that are not dominated by irrelevant visual attractors would actually be informative about the validity of the PEA model. Dr. Tsay has so much other data that we recommend using a more to-the-point data set that could actually validate the PEA model.

      As the comments are repetitive in places, we summarize them into three questions and address them one by one below:

      (1) Methodological concerns about visual uncertainty estimation in Experiment 1:

      a) Visual uncertainty is measured in movement angles (degrees), while the unit in vision science is degrees of visual angle (dva). This mismatch in units hinders direct comparison between the measured visual uncertainty and values reported in the literature.

      b) A 1-second delay between the movement endpoint and the presentation of the reference marker may cause an overestimate of visual uncertainty, because visual memory may degrade during the delay.

      c) The linear function of visual uncertainty may simply result from having tested only three perturbation sizes.

      a) As noted by the reviewer, our visual uncertainty concerns cursor motion direction in the display plane, which has not been measured before. We do not think our data are directly comparable to findings in vision science about foveal/peripheral comparisons. We cited work in vision science by Klein, Levi, and colleagues (Klein & Levi, 1987; Levi et al., 1987) because their studies showed that deviation from fixation is associated with increased visual uncertainty. Their studies thus inspired our Exp1, which probes how the visual uncertainty of interest here (specifically, for visual motion direction) changes with increasing deviation from fixation. We believe that any model and its parameters should be tailored to the task or context it aims to emulate. In our case, the modeled context is motion direction in a center-out reaching setting, so all relevant model parameters are specified in movement angles.

      b) Based on previous vision studies, the 1 s delay before the reference cursor appears should have minimal impact on the estimate of visual uncertainty. Our Exp1 used a visual paradigm similar to that of White et al. (1992). They showed that delay does not increase visual uncertainty over a broad range of values (from 0.2 s to >1 s; see their Figures 5-6). We will add further methodological justification in our revision.

      c) We agree that testing more angles would make us more confident about the linearity of visual uncertainty. However, a linear function is a good approximation of visual uncertainty (as shown in Figure 2C).

      More importantly, our model's performance does not hinge on a strictly linear function. If, say, it were a power function with an increasing slope, our model would still predict the major findings presented in the paper, as the reviewer correctly points out. It is the increasing trend of visual uncertainty, completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation.

      Lastly, without assuming a linear function, we fitted the large motor adaptation dataset from Exp2 to estimate visual uncertainty numerically. The estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p < 0.001) and is very close to the values we obtained in Exp1. We have now included this analysis in the revision; see Supplementary text 2 and Figure S7 for details.
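      The point that the qualitative prediction does not depend on strict linearity can be checked in a toy setting. The sketch below uses the closed-form asymptote of a simplified PEA-like model, with inverse-variance cue combination and the state update x <- A*x - B*x_hat. It compares a linear and a power-law (increasing-slope) growth of visual SD with clamp size. This is an illustration under assumptions, not the authors' fitting code. Both sigma functions, the clamp sizes, and every parameter value are hypothetical.

```python
# Toy steady-state comparison: assumed dynamics and hypothetical parameters, not the authors' code.

def steady_state_adaptation(p, sigma_v, A=0.98, B=0.1, sigma_p=8.0, sigma_u=8.0):
    """Closed-form asymptotic adaptation (deg) for a clamp of size p under the assumed dynamics
    x_hat = w_v * p + (1 - w_v) * x  and  x <- A * x - B * x_hat."""
    prec_v = 1.0 / sigma_v ** 2
    w_v = prec_v / (prec_v + 1.0 / sigma_p ** 2 + 1.0 / sigma_u ** 2)
    return B * w_v * p / (1 - A + B * (1 - w_v))


def linear_sigma(p):
    return 1.0 + 0.15 * p          # hypothetical linear growth of visual SD with clamp size


def power_sigma(p):
    return 1.0 + 0.02 * p ** 1.5   # hypothetical power law with increasing slope


sizes = (4, 8, 16, 32, 64, 95, 135)
for name, sigma_fn in (("linear", linear_sigma), ("power", power_sigma)):
    curve = [steady_state_adaptation(p, sigma_fn(p)) for p in sizes]
    peak = sizes[curve.index(max(curve))]
    print(f"{name:>6}: peak at {peak} deg, adaptation at 135 deg = {curve[-1]:.1f} deg")
```

      In this toy setting, either growth function yields a peak at an intermediate clamp size and a decline for the largest clamps. What drives the pattern is that visual uncertainty increases with clamp size, not its exact functional form.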

      (2) Experiment 3: the reviewer argues that the Tsay et al. (2020) data do not accurately measure proprioceptive recalibration and are thus unsuitable for showing our model’s capacity to explain proprioceptive changes during adaptation.

      Response: We agree that the data from Tsay et al. (2020) are not from passive localization, which is the widely accepted method for measuring proprioceptive recalibration, a recalibration effect in the sensory domain. Active localization, as used in Tsay et al. (2020), is hypothesized to be closely related to people’s forward prediction (where people want to go, as the reviewer put it).

      However, we emphasize that we never equated Tsay’s findings with proprioceptive recalibration: throughout the paper we call them “reported hand location”. We reserved “proprioceptive recalibration” for our own Exp3, which used a passive localization method, so we did not misuse this term.

      Second, to our knowledge, localization biases, whether measured by passive or active methods, have not been formally modeled quantitatively. We believe our model can explain both, at least in the error-clamp adaptation setting studied here. In Exp3, which used passive localization, the proprioceptive bias is caused by the biasing effect of the just-perceived hand location (X_hand_hat) from the adaptation trial. The Tsay et al. (2020) data reflect active localization, whose bias shows a characteristic change from negative to positive. This can be explained by the just-perceived hand location (X_hand_hat again) combined with a gradually adapting hand (X_p). We think this is a significant advance in the study of proprioceptive changes during adaptation.

      Of course, our idea can be further tested in other task conditions, e.g., conventional visuomotor rotation or even gain adaptation, which we leave for future studies.

      On the technical side, the Tsay et al. (2020) data set is not ideal: when reporting hand location, participants viewed the reporting wheel as well as the original target. As the reviewer correctly points out, the presence of the target might provide an anchoring cue for perceptual judgment that acts as an attractor for localization. If so, our cue combination framework predicts that this extra attractor would shrink the proprioceptive effect: the initial negative bias would be pulled closer to the target (zero), and so would the later positive bias. However, the main trend would remain, i.e., the reported hand location would still show the characteristic negative-to-positive change. The attractor effect of the target can be readily modeled by giving less weight to the just-perceived hand location (X_hand_hat). We would therefore like to keep the Tsay et al. (2020) data in our paper. We will add an explanation of the limitations of this dataset and of how the model would fare given these limitations.

      That being said, our model can explain both passive and active localization during implicit adaptation elicited by error clamps. The Tsay et al. (2021) dataset is not a good substitute for their 2020 data for modeling purposes, since that study interleaved blocks of passive localization trials with adaptation trials. This block design would lead to forgetting of both adaptation (Xp in our model) and the perceived hand (X_hand_hat in our model), and our model does not yet consider the latter. As our Exp3 (which also used passive localization) shows, the influence of the perceived hand on proprioceptive bias is short-lived, lasting up to three trials without adaptation trials. It would of course be of great interest to design future studies of how the proprioceptive bias changes over time and how its temporal changes relate to the perceptual error. Our model provides a testbed for moving forward in this direction.

      (3) The reviewer raises concerns about the study's assumption that participants ignore error feedback, questioning the model's applicability to broader contexts and real-world scenarios where ignoring errors might not be viable or common.

      Reviewer 2 raised the same question above. We moved our responses here. “We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, in which participants are explicitly told to ignore the rotated cursor. The error-clamp paradigm is one of the rare paradigms in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is processed implicitly and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (the motor prediction cue in our model) when forming procedural knowledge for action control.

      We propose that the same two principles apply to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation comes from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three all involve localizing one’s effector under perturbed sensory feedback, and all include implicit learning. We believe they can be modeled by variants of our model, or at least that their computational nature should be considered in light of the two principles laid out above. For skill learning, especially de novo learning, the field still lacks a fundamental computational model that accounts for skill acquisition at the level of the relevant movement cues. Our model suggests a promising route: repetitive movements with Bayesian combination of movement-related cues might underlie the implicit component of motor skill acquisition.”

      We also add one more important implication of our model. As stated above, our model explains how the proprioceptive changes revealed by active or passive localization methods are brought about by (mis)perceived hand location via Bayesian cue combination. This new insight, though only tested here with the error-clamp paradigm, can be extended to other domains, e.g., conventional visuomotor rotation or force field adaptation. We hope this serves as an initial step toward developing computational models for proprioception studies. Please see the extended discussion of this matter in the revision.

      Recommendations for the authors:

      Revisions:

      All three reviewers were positive about the work and have provided a set of concrete and well-aligned suggestions, which the authors should address in a revised version of the article. These are listed below.

      A few points of particular note:

      (1) There are a lot of discussions about the actual interpretation of behavioral data from this paper or past papers (Tsay et al. 2020 or 2021) and how it matches the different variables of the model.

      (2) There are some discussions on the results of the first experiment, both in terms of how it is reported (providing degrees of visual angle) and how it differs from previous results (importance of the point of fixation). We suggest also discussing a few recent papers on eye movements during motor adaptation (work of Anouk de Brouwer and Opher Donchin). Could the authors also discuss why their results are opposite to those of previous visual uncertainty studies? Here, visual uncertainty attenuates learning from large, but not small, visual errors. In Burge et al., Tsay et al. 2021, and Makino & Nozaki 2023, it is the other way around: visual uncertainty attenuates learning from small, but not large, visual errors.

      (3) It is recommended by several reviewers to discuss the applicability of the model to other areas/perturbations.

      (4) Several reviewers and I believe that the impact of the paper would be much higher if the code to reproduce all the simulations of the model were made available to readers. In addition, while I am very positive about the fact that the authors shared the data from their experiments, metadata seem to be missing. These are highly important, because the data are otherwise unusable.

      Thank you for the concise summary of the reviewers’ comments. We have addressed their concerns point by point.

      Reviewer #2 (Recommendations For The Authors):

      L142: The linear increase in visual uncertainty should be substantiated by previous research in vision science. Please cite relevant papers and discuss why the linear model is considered reasonable.

      We have cited relevant studies in vision science. These studies focus mainly on how eccentricity inflates visual uncertainty, which parallels our finding that deviation from the fixation direction inflates visual uncertainty about motion direction.

      We also want to add that our model's performance does not hinge on visual uncertainty being a strictly linear function. For example, if it were a power function with an increasing slope, our model would still predict the major findings presented in the paper. It is the increasing trend of visual uncertainty, completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Furthermore, without assuming a linear function, we fitted the large motor adaptation dataset from Exp2 to estimate visual uncertainty numerically. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p<0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this new analysis in the revision. See details in Supplementary text 2 and Figure S7.

      L300: I found it challenging to understand the basis for this conclusion. Additional explanatory support is required.

      We unpacked this concluding sentence as follows:

      “The observed proprioceptive bias is formally modeled as a result of the biasing effect of the perceived hand estimate x_hand_hat. In our mini-block of passive localization, the participants neither actively moved nor received any cursor perturbations for three trials in a row. Thus, the fact that the measured proprioceptive bias is reduced to nearly zero at the third trial suggests that the effect of perceived hand estimate x_hand_hat decays rather rapidly.”

      L331: For the general reader, a visual representation of what the blurring mask looks like would be beneficial.

      Thanks for the nice suggestion. We added pictures of a clear and a blurred cursor in Figure 5D.

      L390: This speculation is intriguing. It would be helpful if the authors explained why they consider causal inference to operate at an explicit process level, as the reasoning is not clear here, although the idea seems plausible.

      Indeed, our tentative conclusion is based only on the model comparison results. It remains possible that causal inference also operates in implicit adaptation, not only in explicit adaptation. We now draw a more modest conclusion in the revision:

      “The causal inference model is also based on Bayesian principles, so why does it fail to account for implicit adaptation? We postulate that the failure of the causal inference model is due to its neglect of visual uncertainty as a function of perturbation size, as revealed in Experiment 1. In fact, previous studies advocating Bayesian principles in motor adaptation have largely focused on experimentally manipulating sensory cue uncertainty to observe its effects on adaptation (Burge et al., 2008; He et al., 2016; Körding & Wolpert, 2004; Wei & Körding, 2010), similar to our Experiment 4. Our findings suggest that causal inference of perturbation alone, without incorporating visual uncertainty, cannot fully account for the diverse findings in implicit adaptation. The increase in visual uncertainty with perturbation size is substantial: our Experiment 1 yielded an approximately seven-fold increase from a 4° perturbation to a 64° perturbation. We have attributed this to the fact that people fixate in the desired movement direction during movements. Interestingly, even in the conventional visuomotor rotation paradigm, where people are required to “control” the perturbed cursor, their fixation is also on the desired direction, not on the cursor itself (de Brouwer, Albaghdadi, et al., 2018; de Brouwer, Gallivan, et al., 2018). Thus, we postulate that a similar increase in visual uncertainty occurs in other “free-viewing” perturbation paradigms. Future studies are warranted to extend our PEA model to account for implicit adaptation in other perturbation paradigms.”

      L789: The method of estimating Sigma_hand in the brain was unclear. Since Bayesian computation relies on the magnitude of noise, the cognitive system must have estimates of this noise. While vision and proprioception noise might be directly inferred from signals, the noise of the hand could be deduced from the integration of these observations or from an internal model estimate. This process of estimating noise magnitude is theorized in recursive Bayesian integration models (or Kalman filtering), where the size estimate of the state noise (sigma_hand) is updated concurrently with the state estimate (x_hand hat). The equation in L789 and the subsequent explanation appear to assume a static model of noise estimation. However, in practice, the noise parameters, including Sigma_hand, are likely dynamic and updated with each new observation. A more detailed explanation of how Sigma_hand is estimated, and of its role in the cognitive process, would be helpful.

      This is a great comment. If a Kalman filter were used, the learning rate and the state noise would both be updated dynamically on each trial, under the influence of the observed visual cue (x_v). However, most adaptation models, including ours, assume a constant learning rate, although a dynamic learning rate (B in our model) is worth exploring. Moreover, in our error-clamp setting x_v is constant, so this observation cannot dynamically update a Kalman filter; this is why we opted for a “static” Bayesian model to explain our datasets. Within this static model, Sigma_hand can be estimated with Bayesian principles as a function of the three available cues: the proprioceptive cue, the visual cue, and the motor prediction cue. We have added a detailed derivation of sigma_hand in Supplementary text 1.
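      The derivation itself is in Supplementary text 1 and is not reproduced here. As a rough illustration only, the sketch below assumes the textbook minimum-variance (inverse-variance weighted) fusion of three independent Gaussian cues; the function name and cue values are hypothetical and are not the fitted parameters of the PEA model. It shows the qualitative point that matters for the model: a larger sigma_v gives the visual cue less weight in x_hand_hat.

```python
import math


def combine_cues(means, sigmas):
    """Inverse-variance (reliability-weighted) fusion of independent Gaussian cues.

    Returns the fused estimate and its standard deviation. Assumes the cues are
    independent, unbiased Gaussians (an illustrative simplification).
    """
    if not means or len(means) != len(sigmas):
        raise ValueError("need one sigma per cue")
    precisions = [1.0 / s ** 2 for s in sigmas]  # reliability = 1 / variance
    total = sum(precisions)
    estimate = sum(p * m for p, m in zip(precisions, means)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical cue values in degrees: proprioception, vision, motor prediction.
x_hand_hat, sigma_hand = combine_cues([0.0, 16.0, 0.0], [8.0, 8.0, 8.0])
# A noisier visual cue (larger sigma_v) pulls the fused estimate less toward vision.
x_hand_hat_blur, _ = combine_cues([0.0, 16.0, 0.0], [8.0, 40.0, 8.0])
```

      Under these assumptions the fused uncertainty (sigma_hand) is always smaller than that of any single cue, and it shrinks less when one cue becomes unreliable.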

      Reviewer #3 (Recommendations For The Authors):

      We observed values in Fig 2C for the 64-degree perturbation that seem to be outliers, i.e., greater than 50 degrees. It is unclear how a psychometric curve could have a "slope" or JND of over 60, especially considering that the tested range was only 60. Since the data plotted in panel C is a collapse of the signed data in panel B, it is perplexing how such large data points were derived, particularly when the signed uncertainty values do not appear to exceed 30.

      Related to the previous point, we would also recommend connecting individual data points: if the uncertainty increases (linearly or otherwise), then people with low uncertainty at the middle distance should also have low uncertainty at the high distance, and people with high uncertainty at one point, should also have that at other distances. Or perhaps the best way to go about this is to use the uncertainty at the two smaller perturbations to predict uncertainty at the largest perturbation for each participant individually?

      Thank you for your suggestion to examine the consistency of individual levels of visual uncertainty across perturbation sizes. First, a sigma_v of 60 degrees is entirely possible and falls naturally out of the experimental data; it shows that some individuals do have large visual uncertainty. Given these potential outliers (which we have no principled reason to remove), we estimated the linear function of sigma_v with a robust method, i.e., a GLM with a gamma distribution, which suits right-skewed data and accommodates positive outliers well. Furthermore, in the revision we added a verification test of our sigma_v estimates: we used the adaptation data from Exp2 to estimate sigma_v without assuming a linear dependency. The model-fitted sigma_v closely matched the estimates from Exp1 (see Supplementary text 2 and Figure S7).
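      One way to see why a spread parameter larger than the tested range is not paradoxical: in a cumulative-Gaussian psychometric function, sigma describes how shallow the curve is, not a point inside the stimulus range, so a participant whose responses change only modestly across the tested offsets is best fit by a large sigma. The sketch below assumes a cumulative-Gaussian form and a hypothetical range of -30° to +30°; it is not the authors' PEST-based estimation procedure.

```python
import math


def psychometric(x, mu=0.0, sigma=1.0):
    """Cumulative-Gaussian psychometric function: P(response) at signed offset x (deg)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical tested offsets spanning 60 degrees in total (-30 to +30).
offsets = [-30, -15, 0, 15, 30]
for sigma in (10.0, 60.0):
    probs = [round(psychometric(x, sigma=sigma), 2) for x in offsets]
    print(f"sigma = {sigma:>4} deg -> {probs}")
```

      With sigma = 60°, the predicted response probabilities stay between roughly 0.31 and 0.69 over the whole hypothetical range: the curve is shallow but still well defined, which is exactly what a large sigma_v expresses.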

      We re-plotted sigma_v with each participant's data points connected. The data clearly indicate that individuals exhibit consistent levels of visual uncertainty across perturbation sizes: those with relatively low uncertainty at middle distances (in fact, angles) tend to exhibit relatively low uncertainty at larger distances too, and, similarly, those with high uncertainty at one distance maintain that level of uncertainty at other distances. A Spearman correlation analysis of the consistency of uncertainty across perturbation sizes among individuals confirmed this: we observed significant correlations between perturbation angles, indicating good individual consistency (4 and 16 degrees, rho = 0.759, p<0.001; 16 and 64 degrees, rho = 0.527, p = 0.026).

      Author response image 4.
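      For readers who want to run the same consistency check on the shared data, a minimal sketch of Spearman's rho follows: rank each variable (tied values receive the mean of their ranks) and compute the Pearson correlation of the ranks. The function names are hypothetical, and the sketch returns rho only; scipy.stats.spearmanr also provides a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation between the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

      Usage (hypothetical per-participant values): `spearman_rho(sigma_v_at_4deg, sigma_v_at_16deg)`, where each list holds one sigma_v estimate per participant in the same participant order.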

      The illustration in Fig 2A does not seem to show a stimulus that is actually used in the experiment (looks like about -30° perturbation). It would be good to show all possible endpoints with all other visual elements to scale - including the start-points of the PEST procedure.

      Thanks for the suggestion. We updated Fig 2A to show a stimulus of +16°, and added an additional panel showing all the possible endpoints.

      Finally (related to the previous point), in lines 589-591 it says the target is a blue cross. Then in lines 614-616, it says participants are to fixate the blue cross or the start position. The start position was supposed to have disappeared, so perhaps the blue plus moved to the start position (which could be the case, when looking at the bottom panel in Fig 2A, although in the illustration the plus did not move fully to the start position, just toward it to some degree). Perhaps the descriptions need to be clarified, or it should be explained why people had to make an eye movement before giving their judgments. And if people could have made either 1) no eye movement, but stayed at fixation, 2) moved to the blue plus as shown in the last panel in Fig 2A, or 3) fixated on the home position, we'd be curious to know if this affected participants' judgments.

      Thanks for pointing that out. The blue cross serves as the target in the movement task and disappears together with the cursor after the cursor has been frozen for 800 ms. The blue cross then reappears for the discrimination task at the center of the screen, i.e., the start location. Subjects were asked to fixate the blue cross during the visual discrimination task. Note that this return of fixation to the home position is exactly what happens in typical error-clamp adaptation: once the movement is over, people guide their hand back to the home position. We performed a pilot study to record the typical fixation pattern during error-clamp adaptation, and Exp1 was intentionally designed to mimic this fixation sequence. We have now updated the description of Figure 2A to emphasize the stimulus sequence.

      In Figure 4A, the label "bias" is confusing as that is used for recalibrated proprioceptive sense of hand position as well as other kinds of biases elsewhere in the paper. What seems to be meant is the integrated hand position (x-hat_hand?) where all three signals are apparently combined. The label should be changed and/or it should be clarified in the caption.

      Thanks for pointing that out. The label should be x_hand_hat, and we have corrected this in the revised Figure 4.

      In the introduction, it is claimed that larger perturbations have not been tested with "implicit adaptation" paradigms, but in the same sentence, a paper is cited (Morehead et al., 2017) that tests, with error clamps, a rotation (95°) on the same order of magnitude as the largest one tested here, as well as much larger rotations (135° and 175°). Interestingly, there is no adaptation in those conditions, which seems more in line with the sensory cue integration model. Can the PEA model explain these results as well? If so, this should be included in the paper, and if not, it should be discussed as a limitation.

      First, we double checked our manuscript and found that we never claimed that larger perturbations had not been tested.

      We agree that it is always good to test as many conditions as possible. However, the 135° and 175° conditions lead to minimal adaptation, which would not help much in terms of model testing. We postulate that this lack of adaptation is simply due to the fact that people cannot see the moving cursor, or to some other unknown factor. Our simple model is not designed to cover such extreme cases.

      Specify the size of the arc used for the proprioceptive tests in Exp 3 and describe the starting location of the indicator (controlled by the left hand). Ideally, the starting location should have varied across trials to avoid systematic bias.

      Thank you for the comments. As detailed in the Methods section, the arc used during these tests is part of a ring with a 10 cm radius centered at the start position. This setup is shown as a red arc in Figure 7B.

      After completing each proprioceptive test trial, participants were instructed to position the indicator at approximately -180° on the arc and then relax their left arm. Thus, although the nominal starting location for the subsequent trial was always -180°, the actual starting location was not identical across trials, which introduced slight variability.

      Please confirm that the proprioceptive biases plotted in Fig 4E are relative to the baseline.

      Thank you for bringing this to our attention. Yes, the proprioceptive biases illustrated in Figure 4E are indeed calculated relative to the baseline measurements. We have added this to the Methods section.

      Data availability: the data are available online, but there are some ways this can be improved. First, it would be better to use an open data format, instead of the closed, proprietary format currently used. Second, there is no explanation for what's in the data, other than the labels. (What are the units? What preprocessing was done?) Third, no code is made available, which would be useful for a computational model. Although rewriting the analyses in a non-proprietary language (to increase accessibility) is not a reasonable request at this point in the project, I'd encourage it for future projects. But perhaps Python, R, or Julia code that implements the model could be made available as a notebook of sorts so that other labs could look at (build on) the model starting with correct code - increasing the potential impact of this work.

      Great suggestions. We are also fully supportive of open data and open science. We now:

      (1) Updated our data and code repository to include the experimental data in an open data format (.csv) for broader accessibility.

      (2) Added detailed descriptions to the data to clarify their contents.

      (3) Made the original MATLAB (.m) code for data analysis, model fitting, and simulation available online.

      (4) Provided the code in Jupyter Notebook (.ipynb) format as well.

      These updates can be found in the revised “Data Availability” section of our manuscript.

      References

      Bromberg, Z., Donchin, O., & Haar, S. (2019). Eye Movements during Visuomotor Adaptation Represent Only Part of the Explicit Learning. eNeuro, 6(6). https://doi.org/10.1523/ENEURO.0308-19.2019

      Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 1–19.

      de Brouwer, A. J., Gallivan, J. P., & Flanagan, J. R. (2018). Visuomotor feedback gains are modulated by gaze position. Journal of Neurophysiology, 120(5), 2522–2531.

      Egly, R., & Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology: Human Perception and Performance, 10(6), 778–793.

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Makino, Y., Hayashi, T., & Nozaki, D. (2023). Divisively normalized neuronal processing of uncertain visual feedback for visuomotor learning. Communications Biology, 6(1), 1286.

      Owsley, C., Ball, K., & Keeton, D. M. (1995). Relationship between visual sensitivity and target localization in older adults. Vision Research, 35(4), 579–587.

      Simani, M. C., McGuire, L. M. M., & Sabes, P. N. (2007). Visual-shift adaptation is composed of separable sensory and task-dependent effects. Journal of Neurophysiology, 98(5), 2827–2841.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Chandy, A. M., Chua, R., Miall, R. C., Cole, J., Farnè, A., Ivry, R. B., & Sarlegna, F. R. (2024). Minimal impact of proprioceptive loss on implicit sensorimotor adaptation and perceived movement outcome. bioRxiv. https://doi.org/10.1101/2023.01.19.524726

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.

      Wei, K., Stevenson, I. H., & Körding, K. P. (2010). The uncertainty associated with visual flow fields and their influence on postural sway: Weber’s law suffices to explain the nonlinearity of vection. Journal of Vision, 10(14), 4.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      …I find the concept and execution of the study very interesting and elegant. The paper is also commendably clear and readable. The differences between primary and higher cortex are compelling and I am largely convinced by the authors' claim that they have found evidence that broadly supports a mixed selectivity model of neural disentanglement along the lines of Rigotti et al (2013). I think that the increasing body of evidence for these kinds of representations is a significant development in our understanding of higher sensory representations. I also think that the dDR method is likely to be useful to researchers in a variety of fields who are looking to perform similar types of neural decoding analysis.

      Thanks! We agree that questions around population coding and high-level representations are critical in the field of sensory systems.

      Reviewer #2 (Public Review):

      ... This is a well-carried out study with thoughtful analyses which in large part achieves its aims to evaluate how task-engagement changes neural activity across multiple auditory regions. As with all work, there are several caveats or areas for future study/analysis. First, the sounds used here (tones, and narrow-band noise) are relatively simple sounds; previous work suggests that exactly what activity is observed within each region (e.g., sensory only, decision-related, etc) may depend in part upon what stimuli are used. Therefore, while the current study adds importantly to the literature, future work may consider the use of more varied stimuli. Second, the animals here were engaged in a behavioral task; but apart from an initial calculation of behavioral d', the task performance (and its effect on neural activity) is largely unaddressed.

      The reviewer makes several important points, which we hope we have addressed in the specific changes detailed below. Indeed, it is important to recognize that the specific stimuli used in a task may interact with the effects of behavioral state, and that variability in task performance should be considered an important aspect of behavioral state.

      Reviewer #1 (Recommendations For The Authors):

      I have a few minor comments and criticisms:

      (1) Figure 1c. The choice of low-contrast grey text (e.g., "Target vs. target") is unfortunate, especially when printed, and should be replaced (e.g., with dark grey).

      We have edited the figure to use a higher contrast (dark grey). Thanks for catching this.

      (2) Figure 2 and Supplementary Figure 3. I think some indication of error or significance is required in all panels. Without this, it's hard to interpret any of these panels.

      Thank you for this feedback. Adding significance tests clarified these panels and strengthens our claim that state-dependent changes in neural activity were smaller and more diverse for single neurons than at the population level. We modified Figure 2b-c to indicate whether each neuron’s response to the target stimulus was significantly different from its response to the catch stimulus. The same test was performed in Supplementary Figure 3. Additionally, we added a statistical test in Figure 2d-e to indicate, for each pair of target/catch stimuli, whether discrimination (d-prime) changed significantly between active and passive conditions. Furthermore, we modified the text of the second paragraph under the results heading “Diverse effects of task engagement on single neurons in primary and non-primary auditory cortex” to reference and interpret the results of these significance tests. The new text reads as follows (L. 121):

      “Sound-evoked spiking activity was compared between active and passive states to study the impact of task engagement on sound representation. In both A1 and dPEG, responses to target and catch stimuli were significantly discriminable for a subset of single neurons (about 25% in both areas, Figure 2A-C, Supplemental Figures 3-5, bootstrap test). This supports the idea that stimulus identity can be decoded in both brain regions, regardless of task performance. However, the fact that the responses of most neurons in both brain areas could not significantly discriminate target vs. catch stimuli also highlights the diversity of sound encoding observed at the level of single neurons. The accuracy of catch vs. target discrimination for each neuron was quantified using neural d-prime, the z-scored difference in target minus catch spiking response for each neuron (Methods: Single neuron PSTHs and d-prime (Niwa et al., 2012a)). Task engagement was associated with significant changes in catch vs. target d-prime for roughly 10% of neurons in both A1 (40 / 481 neurons, bootstrap test) and dPEG (33 / 377 neurons, bootstrap test). These included neurons whose discriminability increased and neurons whose discriminability decreased (Figure 2D-E). Thus, the effects of task engagement at the level of single neurons were relatively mild and inconsistent across the population; many neurons showed no significant change and, among those that did, effects were bidirectional (Figure 2D-E).”

      We also included an additional methods paragraph in the “Statistical tests” section to describe the bootstrapping procedure used for these significance tests (L. 644):

      “The one exception to this general approach is in Figure 2, where we analyzed the sound discrimination abilities of single neurons. In this case, we computed p-values for each neuron and stimulus independently. First, for each neuron and catch vs. target stimulus pair, we measured d-prime (see Methods: Single neuron evoked activity and d-prime). We generated a null distribution of d-prime values for each neuron-stimulus pair, under each experimental condition by shuffling stimulus identity across trials before computing d-prime (100 resamples). A neuron was determined to have a significant d-prime for a given target vs. catch pair if its actual measured d-prime was greater than the 95th percentile of the null d-prime distribution. Second, for each neuron and catch vs. target stimulus pair, we tested if d-prime was significantly different between active and passive conditions. To test this, we followed a similar procedure as above, however, rather than shuffle stimulus identity, we shuffled active vs. passive trial labels. This allowed us to generate a null distribution of active vs. passive d-prime difference for each neuron and stimulus pair. A neuron was determined to have a significant change in d-prime between conditions if the actual Δ d-prime lay outside the 95% confidence interval of the null Δ d-prime distribution.”
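      The two shuffle tests described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes neural d' is the difference of mean responses divided by the pooled standard deviation, and that, for the active vs. passive test, labels are shuffled within each stimulus. The function names are hypothetical.

```python
import random
import statistics


def dprime(target, catch):
    """Neural d': mean difference divided by the pooled SD (assumed definition)."""
    pooled_sd = ((statistics.variance(target) + statistics.variance(catch)) / 2.0) ** 0.5
    return (statistics.mean(target) - statistics.mean(catch)) / pooled_sd


def dprime_significant(target, catch, n_resamples=100, seed=0):
    """One-sided test: is the observed d' above the 95th percentile of a
    null built by shuffling stimulus (target/catch) labels across trials?"""
    rng = random.Random(seed)
    observed = dprime(target, catch)
    pooled = list(target) + list(catch)
    n_t = len(target)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        null.append(dprime(pooled[:n_t], pooled[n_t:]))
    p95 = statistics.quantiles(null, n=20)[-1]  # 95th percentile cut point
    return observed, observed > p95


def delta_dprime_significant(act_t, act_c, pas_t, pas_c, n_resamples=100, seed=0):
    """Two-sided test of the active-minus-passive change in d', with a null built by
    shuffling active/passive labels within each stimulus (an assumption)."""
    rng = random.Random(seed)
    observed = dprime(act_t, act_c) - dprime(pas_t, pas_c)
    all_t, all_c = list(act_t) + list(pas_t), list(act_c) + list(pas_c)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(all_t)
        rng.shuffle(all_c)
        d_act = dprime(all_t[:len(act_t)], all_c[:len(act_c)])
        d_pas = dprime(all_t[len(act_t):], all_c[len(act_c):])
        null.append(d_act - d_pas)
    cuts = statistics.quantiles(null, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return observed, not (cuts[0] <= observed <= cuts[-1])
```

      Each function takes per-trial spike counts for one neuron and one target/catch pair; with 100 resamples, as in the text, the percentile thresholds are coarse, which is one reason a larger number of resamples is often preferred.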

      For Figure 2a, we chose not to indicate significance on the figure to avoid clutter, since the significance for all neurons in the population is shown in panels b-c. Additionally, the difference plot shown in panel a is in units of z-score, which we believe already gives a raw sense of the significance of the target vs. catch response difference for each neuron in this example dataset.

      (3) Figure 2 and Supplementary Figure 3. I would consider including some more examples as a Supplementary Figure (and perhaps combining Supp Fig 3 with Fig 2 as a main figure).

      We found no significant or apparent difference in single-neuron properties between A1 and dPEG. Therefore, we decided it would not be helpful to plot both A1 and dPEG examples in the main text. However, we agree that seeing more examples of the raw data could be useful, so we compiled two supplementary figures (Supplementary Figures 4 and 5) that replicate Figure 2a for all datasets from both A1 and dPEG.

      (4) Figure 2a and Supp Fig 3a. I was initially confused that the "delta-spk/sec (z-score)" values had themselves been z-scored, but now I think that they are simply the differences of the two left hand sub-panels. This could be made clear in the figure legend.

      The figure legends have been modified to state the procedure for computing “delta-spk/sec” more clearly. Specifically, we added the following information to the legend (L. 141):

      “Difference is computed as the z-scored response to the target minus the z-scored catch response (resulting in a difference shown in units of z-score).”

      (5) Figure 2b-e and Supp Fig 3b-e. Indicate the time window over which the responses were measured, and the number of neurons.

      Figure legends have been modified to include a sentence clearly stating the time window over which responses were measured. The number of neurons is also now included in the legend and on the figure itself. Furthermore, a brief description of the new statistical testing procedure has been added here (L. 144).

      “Responses were defined as the total number of spikes recorded during the 300 ms of sound presentation (area between dashed lines in panel A). Neurons with a significantly different response to the catch vs. target stimulus are indicated in black and quantified on the respective figure panel.”

      (6) Figure 2. "singe" should read "single"

      Typo in figure label has been fixed.

      (7) Line 144. Figure number is missing (Figure 3B-C).

      The missing figure number has been added to the text.

      (8) Figure 3. Again, the low-contrast grey should be replaced.

      The low-contrast grey has been replaced with dark grey.

      Reviewer #2 (Recommendations For The Authors):

      This study really nicely compares the activity and effects on activity in two areas of the auditory cortex in respect to task-engagement; I think it is, for the most part, very well done.

      A couple of specific recommendations:

      (1) Although I understand 'inf dB' as the SNR, including the actual dB level used in the experiments would be useful, especially for the inf dB condition.

      Thank you for this feedback. We agree that clarification about the overall sound level used here would be helpful. We have modified the methods section “Behavioral paradigm” to include the following sentence (L. 450):

      “That is, the masking noise (and distractor stimuli) were always presented with an overall sound level of 60 dB SPL. Infinite (inf) dB trials corresponded to trials where the target tone was presented at 60 dB SPL without any masking noise present, 0 dB to trials where the target was presented at 60 dB SPL, -5 dB to trials where the target was presented at 55 dB SPL, etc.”

      In addition, we have modified the main text (L. 82):

      “Animals reported the occurrence of a target tone in a sequence of narrowband noise distractors by licking a piezo spout (Figure 1A, Methods: Behavioral paradigm, distractor stimulus sound level: 60 dB SPL). … We describe SNR as the overall SPL of the target relative to distractor noise level. Thus, an SNR of –5 dB corresponds to a target level of 55 dB SPL while an Inf dB SNR corresponds to a target tone presented without any masking noise.”

      And Figure legend 1 now explicitly states the sound level used in the experiments (L. 104):

      “Variable SNR was achieved by varying overall SPL of the target relative to the fixed (60 dB SPL) distractor noise, e.g., -5 dB SNR corresponds to a 55 dB SPL target with 60 dB SPL masking noise. Infinite (inf) dB SNR corresponds to a target tone presented in isolation (60 dB SPL).”
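      The level arithmetic above is simple enough to state as code. A minimal sketch, assuming SNR is defined exactly as in the text (target level minus the fixed 60 dB SPL distractor level, with infinite SNR meaning no masker); the function and constant names are hypothetical.

```python
import math

DISTRACTOR_SPL_DB = 60.0  # fixed distractor / masking-noise level stated in the Methods


def target_spl(snr_db):
    """Return (target level in dB SPL, masking noise present?) for a given SNR in dB.

    SNR is the target level relative to the fixed 60 dB SPL distractor; an infinite
    SNR denotes a 60 dB SPL target presented without masking noise.
    """
    if math.isinf(snr_db) and snr_db > 0:
        return DISTRACTOR_SPL_DB, False
    return DISTRACTOR_SPL_DB + snr_db, True


for snr in (float("inf"), 0.0, -5.0, -10.0):
    print(snr, target_spl(snr))
```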

      (2) I very much appreciate the attempt to disentangle task engagement from generalized arousal state, and specifically, addressing this through the use of pupillometry. However, by focusing the discussion of pupil dynamics solely on the arousal-state aspects of pupil size, the paper doesn't address the increasing evidence suggesting that pupil size may fluctuate based upon a lot of other things, including perceptual events (see Kronemer et al, 2022 for a recent human paper; for auditory: Zekveld et al 2018 (review) and Montes-Lourido et al, 2021; but many many others, too). It would be nice to see either a bit more nuanced discussion of what pupil size may be indicating (easier), or analyzing the behavior in the context of pupil dynamics (a heavier lift).

      This is a good point. We agree that it is worth mentioning these more nuanced aspects of cognition that may be reflected by pupil size. Therefore, we also analyzed pupil size in the context of behavioral performance (see Supplemental Figure 6) and added the following text to the results (L. 193).

      “In addition to reflecting overall arousal level, pupil size has also been reported to reflect more nuanced cognitive variables such as listening effort (Zekveld et al., 2014). Furthermore, rodent data suggest that optimal sensory detection is associated with intermediate pupil size (McGinley et al., 2015), consistent with the hypothesis of an inverted-U relationship between arousal and behavioral performance (Zekveld et al., 2014). To determine if this pattern was true for the animals in our task, we measured the dynamics of pupil size in the context of behavioral performance. Across animals, task stimuli evoked robust pupil dilation that varied with trial outcome (Supplemental Figure 6b-c). Notably, pre-trial pupil size was significantly different between correct (hit and correct reject), hit, and miss trials (Supplemental Figure 6b-c), recapitulating the finding of an inverted-U relationship to performance in rodents (McGinley et al., 2015). Since we focused only on correct trials in our decoding analysis, these outcome-dependent differences in pupil size are unlikely to contribute to the emergent decoding selectivity in dPEG.”

      (3) I think it would make this paper shine that much more if behavioral performance were not subsumed into the overall label of task engagement. You've already established you have performance that varies as a function of SNR; I would love to see the neural d' and covariability related to the behavioral d' (in the comparisons where this is possible). I would also love to see a more direct measure of choice for those stimuli that show variable behavior (e.g., a choice probability analysis or something of the like would seem to be easily applied to the target SNRs of -5 and 0 dB); and compare task engaged activity of hits vs misses vs passive listening to those same stimuli. You discuss previous studies looking at choice-related/decision-related activity and draw parallels to this work. Given that there is the opportunity with this data set to *directly* assess choice-related activity, the absence of such an analysis seems like a missed opportunity.

      Thank you for this feedback. We agree that “task engagement” is not a unimodal state and that a more fine-grained analysis of task-engaged neural activity, according to behavioral choice, could be informative.

      First, we would like to point out that in Figure 4 we did already compare behavioral d’ to delta neural d’. We found that the two were significantly correlated in dPEG, but not in A1. This suggests that task-dependent changes in stimulus decoding in dPEG, but not A1, are predictive of behavioral performance. This is consistent with the finding that task-relevant stimulus representations were selectively enhanced in dPEG, but not in A1.

      Second, we added a choice decoding analysis to address whether auditory cortex represents the animal’s choice in our task. The results of this analysis are summarized in Supplemental Figure 8 and are discussed under the results section: “Behavioral performance is correlated with neural coding changes in non-primary auditory cortex only.” (L. 226):

      “The previous analysis suggests that the task-dependent increase in stimulus information present in dPEG population activity is predictive of overall task performance. Next, we asked whether the population activity in either brain region was directly predictive of behavioral choice on single hit vs. miss trials. To do this, we conducted a choice probability analysis (Methods). We found that in both brain regions choice could be decoded well above chance level (Supplemental Figure 8). Choice information was present throughout the entire trial and did not increase during the target stimulus presentation. This suggests that the difference in population activity primarily reflects a cognitive state associated with the probability of licking on a given trial, or “impulsivity” rather than “choice.” This interpretation is consistent with our finding that baseline pupil size on each trial is predictive of trial outcome (Supplemental Figure 6b).”

      To keep our decoding approach consistent throughout the manuscript, we followed the same approach for choice decoding as we did for stimulus decoding (perform dDR then calculate neural d-prime in the dimensionality reduced space). To make the results more interpretable, we converted choice d-prime to a choice probability (percent correctly decoded choices) using leave-one-out cross validation. (We note that d-prime and percent correct are very highly correlated statistics.) This is described in the methods as follows (L. 550):

      “We performed a choice decoding analysis on hit vs. miss trials. We followed the same procedure as described above for stimulus decoding, where instead of a pair of stimuli our two classes to be decoded were “hit trial” vs. “miss trial”. That is, for each target stimulus we computed the optimal linear discrimination axis separating hit vs. miss trials (Abbott and Dayan, 1999) in the reduced dimensionality space identified with dDR (Heller and David, 2022). For the sake of interpretability with respect to previous work we reported choice probability as the percentage of correctly decoded trial outcomes rather than d-prime. Percent correct was calculated by projecting the population activity onto the optimal discrimination axis and using leave-one-out cross validation to measure the number of correct classifications.”
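      The decoding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the hit and miss trials have already been projected into the low-dimensional dDR space (dDR itself is not reproduced), uses the pooled class covariance to define the discrimination axis, and places the decision criterion at the midpoint between the projected class means, which is our assumption since the text does not specify the criterion. The function names `neural_dprime` and `loo_percent_correct` and the synthetic data are hypothetical.

```python
import numpy as np


def _pooled_cov(x_a, x_b):
    """Average of the two class covariances (rows = trials, cols = dimensions)."""
    c_a = np.atleast_2d(np.cov(x_a, rowvar=False))
    c_b = np.atleast_2d(np.cov(x_b, rowvar=False))
    return 0.5 * (c_a + c_b)


def neural_dprime(x_a, x_b):
    """Population d' between two trial classes: sqrt(dmu^T Sigma^-1 dmu)."""
    dmu = x_a.mean(axis=0) - x_b.mean(axis=0)
    return float(np.sqrt(dmu @ np.linalg.solve(_pooled_cov(x_a, x_b), dmu)))


def loo_percent_correct(x_hit, x_miss):
    """Leave-one-out percent correct for hit vs. miss classification.

    For each held-out trial, the linear discrimination axis
    w = Sigma^-1 (mu_hit - mu_miss) is refit on the remaining trials, the
    held-out trial is projected onto w, and it is labelled 'hit' if its
    projection lies above the midpoint of the two projected class means.
    """
    x = np.vstack([x_hit, x_miss])
    is_hit = np.r_[np.ones(len(x_hit), bool), np.zeros(len(x_miss), bool)]
    n_correct = 0
    for i in range(len(x)):
        train = np.arange(len(x)) != i          # leave trial i out
        x_a, x_b = x[train & is_hit], x[train & ~is_hit]
        mu_a, mu_b = x_a.mean(axis=0), x_b.mean(axis=0)
        w = np.linalg.solve(_pooled_cov(x_a, x_b), mu_a - mu_b)
        threshold = 0.5 * (mu_a + mu_b) @ w     # midpoint criterion (assumed)
        n_correct += int((x[i] @ w > threshold) == is_hit[i])
    return 100.0 * n_correct / len(x)


# Synthetic trials in a 2-D "reduced" space (hypothetical data).
rng = np.random.default_rng(0)
hits = rng.normal([1.5, 0.0], 1.0, size=(40, 2))
misses = rng.normal([-1.5, 0.0], 1.0, size=(40, 2))
print(neural_dprime(hits, misses), loo_percent_correct(hits, misses))
```

      Because d' and cross-validated percent correct are computed on the same discrimination axis, they move together, which is consistent with the note above that the two statistics are highly correlated.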

      (4) It would also be interesting to look at population coding across sessions (although the point is taken that recording within a session allows the opportunity to assess covariability). Somewhat self-servingly, but very much related to the above point, Christison-Lagay et al., 2017 employed a similar detect-in-noise task, analyzed single-neuron and population-level activity, and looked at putative choice-related activity. The current study has the opportunity to expand on that kind of analysis considerably by looking across multiple sites vs. within a given recording site, and to compare across regions.

      Thank you for highlighting this point; we agree that it is important. When studying population coding, it is critical to consider the impact of covariability between neurons. Therefore, now that we have access to simultaneously recorded population data, it is worthwhile to revisit our interpretations of prior results, e.g., Christison-Lagay et al., 2017, which studied population coding by combining neurons recorded in different sessions.

      First, we would like to point out that this was the primary motivation for our simulation analyses presented in Figure 5. Using simulations, we found that task-dependent gain modulation (which can be observed across sessions) was sufficient to explain our primary finding – selective enhancement in decoding of behaviorally relevant sound stimuli in dPEG.

      Second, to address the question about how covariability affects choice-related information in auditory cortex and compare our findings with prior studies, we performed the same set of simulations for choice probability analysis. We found that, again, choice-dependent gain modulation was sufficient to explain our findings. That is, simulations with hit- vs. miss-dependent gain changes, but fixed covariability, closely mirrored the choice probability we observed in the raw data. An additional simulation where covariability between all neurons was set to zero also recapitulated our findings in the raw data. Collectively, this suggests that covariability does not play a significant role in shaping the choice information present in A1 and dPEG during this task. We have added the following text to the manuscript to summarize this finding (L. 293):

      “Finally, we used the same simulation approach to determine what aspects of population activity carry the “choice” related information we observed in A1 and dPEG (Figure 4 – figure supplement 1). Similar to our findings for stimulus decoding, we found that gain modulation alone was sufficient to recapitulate the choice information present in the raw data for this task. This helps frame prior work that pooled neurons across sessions to study population coding of choice in similar auditory discrimination tasks (Christison-Lagay et al., 2017).”
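      One way to build the "gain only" and "zero covariability" surrogates described above is sketched below. The exact simulation procedure is not given in this response, so this is an assumed construction: each condition (e.g., hit or miss) keeps its own per-neuron mean and standard deviation, while the noise correlation matrix is either pooled across conditions (fixed covariability) or replaced by the identity (no covariability). All function names and the synthetic data are hypothetical.

```python
import numpy as np


def shared_correlation(conditions):
    """Noise correlation matrix pooled across conditions.

    Each condition is z-scored separately, so condition-specific means and
    variances (the gain effect) do not leak into the correlation estimate.
    """
    z = [(x - x.mean(axis=0)) / x.std(axis=0, ddof=1) for x in conditions]
    return np.corrcoef(np.vstack(z), rowvar=False)


def surrogate(x_cond, corr, n_trials, rng):
    """Gaussian surrogate trials for one condition (rows = trials, cols = neurons).

    Keeps this condition's per-neuron mean and standard deviation but imposes
    the supplied correlation matrix: the shared matrix for 'gain changes,
    fixed covariability', or the identity for 'zero covariability'.
    """
    sd = x_cond.std(axis=0, ddof=1)
    cov = corr * np.outer(sd, sd)
    return rng.multivariate_normal(x_cond.mean(axis=0), cov, size=n_trials)


def dprime(x_a, x_b):
    """Population d' with the pooled covariance of the two classes."""
    dmu = x_a.mean(axis=0) - x_b.mean(axis=0)
    sigma = 0.5 * (np.cov(x_a, rowvar=False) + np.cov(x_b, rowvar=False))
    return float(np.sqrt(dmu @ np.linalg.solve(sigma, dmu)))


# Synthetic example: 5 neurons, hit vs. miss trials with correlated noise.
rng = np.random.default_rng(0)
n_neurons, n_trials = 5, 200
true_corr = np.full((n_neurons, n_neurons), 0.3) + 0.7 * np.eye(n_neurons)
hits = rng.multivariate_normal(np.full(n_neurons, 0.5), true_corr, size=n_trials)
misses = rng.multivariate_normal(np.zeros(n_neurons), true_corr, size=n_trials)

corr = shared_correlation([hits, misses])
fixed = [surrogate(x, corr, n_trials, rng) for x in (hits, misses)]
indep = [surrogate(x, np.eye(n_neurons), n_trials, rng) for x in (hits, misses)]
print(dprime(hits, misses), dprime(*fixed), dprime(*indep))
```

      If the decoding metric computed on the fixed-covariability surrogate tracks the raw data as closely as the identity-correlation surrogate does, covariability adds little beyond gain, which is the logic of the comparison described in the text.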

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors investigate the tolerance of aminoglycosides in E. coli mutants deleted in the Krebs cycle and respiratory chain enzymes. The motivation for this study is unclear. Transport of aminoglycosides is pmf-dependent, as the authors correctly note, and knocking out energy-producing components leads to tolerance of aminoglycosides, this has been well established. In S. aureus, clinically relevant "small colony" strains selected for in the course of therapy with aminoglycosides acquire null mutations in the biosynthesis of heme or ubiquinone, and have been studied in detail. In E. coli, such knockouts have not been reported in clinical isolates, probably due to severe fitness costs.

      Response: We sincerely appreciate the time and consideration the reviewer dedicated to evaluating our manuscript. It is important to highlight that while the transport of aminoglycosides is PMF-dependent, recent studies underscore the potential role of metabolic mutations in antibiotic tolerance, a facet that warrants further investigation. For instance, the study by Heinemann’s and Michiels’ groups explored genomic changes in E. coli strains (including the uropathogenic strain UTI89) subjected to daily antibiotic exposure (Van den Bergh et al., 2022). Notably, mutations predominantly occurred in genes of the nuo operon, a key component of E. coli energy metabolism, suggesting a link between metabolic adaptations and antibiotic tolerance. Furthermore, research by Collins’ group revealed previously unrecognized genes related to central metabolism (e.g., icd, gltD, sucA) that contribute to antibiotic resistance in E. coli cells exposed to multiple antibiotics, including aminoglycosides (Lopatkin et al., 2021). These findings are corroborated by the presence of similar mutations in clinical E. coli pathogens, as shown by the analysis of a library of 7243 E. coli genomes from NCBI Pathogen Detection (Lopatkin et al., 2021). The clinical relevance of metabolic mutations in antibiotic tolerance is increasingly recognized, yet their underlying mechanisms remain enigmatic. Therefore, elucidating the role of metabolic pathways in conferring antibiotic tolerance is critical. We have updated the introduction to clearly convey our motivation for this study (see page 4).

      At the same time, single-cell analysis has shown that individual cells with decreased expression of Krebs cycle enzymes are tolerant of antibiotics and have lower ATP (Manuse et al., PLoS Biol 19: e3001194). The authors of the study under review report that knocking out ICD (isocitrate dehydrogenase, which catalyzes the rate-limiting step in the Krebs cycle) has little effect on aminoglycoside tolerance and actually leads to an increase in the level of ATP over time. This observation does not seem to make much sense and contradicts previous reports, specifically that an E. coli icd mutant is tolerant of antibiotics and, not surprisingly, produces less ATP (Kabir and Shimizu, Appl Microbiol Biotechnol. 2004; 65(1):84-96; Manuse et al., PLoS Biol 19: e3001194). Mutations in other Krebs cycle enzymes, unlike ICD, do lead to a dramatic increase in tolerance of aminoglycosides according to the paper under review. This is all very confusing.

      Response: Although our data cannot be directly compared to that of Kabir and Shimizu (Mohiuddin Kabir and Shimizu, 2004), due to the utilization of entirely different experimental procedures and measurement techniques, we can draw some parallels to the study conducted by Lewis’ group (Manuse et al., 2021), despite certain differences in experimental protocols. Furthermore, the reviewer has made strong assertions regarding our manuscript based on the findings of Lewis’ group. Thus, we believe it's pertinent to expand our response regarding that study.

      In the study of Lewis’ group, bacterial cells were inoculated at a ratio of 1:100 into LB medium from an overnight culture (approximately 16 hours). Subsequently, the cultures were incubated at 37°C for approximately 2 hours, and ATP levels were measured using the BacTiter Glo kit (Promega, Madison, WI, USA). ATP levels were then normalized to cell density, determined through optical density measurements, and represented on a linear diagram. As demonstrated in Supplementary Figure S1c of their paper, there was a 10-15% reduction in normalized ATP levels in the icd mutant compared to the wild type. In our experiments, cells were grown for 24 hours in overnight cultures, diluted 100-fold in fresh media, and ATP levels were measured at 3, 4, 5, and 6 hours using the same kit. ATP levels were normalized to cell counts quantified by flow cytometry. Upon analyzing our data of the icd mutant for around 3 hours (the time point closest to that of the study of Lewis’ group), we observed a reduction of approximately 15-20% (without statistical significance) in the icd mutant compared to the wild-type (see raw data, linear plot, and logarithmic plot below; Author response image 1), which aligns with the findings of Lewis’ group.

      We further investigated the gentamicin tolerance of both wild-type and icd mutant strains of E. coli BW25113 (Author response image 2). The icd mutant of BW25113 showed increased sensitivity to gentamicin, similar to what we observed for the icd mutant of strain MG1655.

      Author response image 1.

      ATP levels in the icd mutant. ATP levels of both the mutant and wild-type strains were measured at t=3 hours of cell growth and normalized to cell counts. The figure presents the raw data (a), linear plot (b), and logarithmic plot (c) of the same dataset. This data corresponds to the first panel of Figure 3B in the manuscript.

      Author response image 2.

      Gentamicin tolerance of wild-type and icd mutant strains of E. coli BW25113. Both wild type and mutant strains were treated with gentamicin (50 µg/ml) for 5 hours at the mid-exponential phase. Cells were plated before and after treatment for CFU/ml counts. The dashed line represents the limit of detection. CFU: Colony forming units.
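      For readers reproducing survival fractions like those described in Author response image 2, a minimal sketch of the calculation follows. It assumes survival is the ratio of post- to pre-treatment CFU/ml and that counts below the limit of detection are set to the limit and flagged; that convention and the function name `survival_fraction` are our assumptions, not the authors' stated procedure.

```python
def survival_fraction(cfu_before, cfu_after, detection_limit):
    """Survival fraction = CFU/ml after treatment / CFU/ml before treatment.

    If the post-treatment count falls below the limit of detection, it is
    replaced by the limit (an assumed convention), so the returned fraction
    is an upper bound; the second return value flags that case.
    """
    if cfu_before <= 0:
        raise ValueError("pre-treatment CFU/ml must be positive")
    below_limit = cfu_after < detection_limit
    after = detection_limit if below_limit else cfu_after
    return after / cfu_before, below_limit


# Hypothetical counts, for illustration only.
print(survival_fraction(2e8, 4e6, 100))  # -> (0.02, False)
```

      Reporting the censoring flag alongside the fraction keeps values at the dashed detection line from being read as measured survival.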

      We think there are two primary reasons why our study does not contradict the findings of the Lewis group:

      Firstly, our study cannot be directly compared to theirs, as they did not comprehensively explore the impact of gene deletions on cell metabolism beyond the measurement of ATP levels at a single time point (Manuse et al., 2021). Our study encompasses various metabolic parameters such as cellular ATP, redox status, proton motive force (PMF), intracellular pH, and drug uptake throughout the exponential and/or early stationary phase. Additionally, we conducted proteomic analysis of five different strains, including mutants and the wild type. Moreover, we performed pathway enrichment analysis using the entire genome as the statistical background, covering several functional classification frameworks such as Gene Ontology annotations, KEGG pathways, and UniProt keywords. The results of these pathway enrichment analyses are now available in the Supplementary File (see Supplementary Tables 11-17 in the current manuscript). Thus, we believe it is unjust to deem our study contradictory to the Lewis group's study, which did not comprehensively analyze the metabolism of the mutant strains investigated.

      Secondly, our study cannot be compared to that specific study (Manuse et al., 2021) due to the utilization of a distinct antibiotic (ciprofloxacin). Cell tolerance is heavily reliant on the mechanism of action of the antibiotic used. Therefore, the reviewer should have focused on studies closely related to aminoglycoside tolerance. Our study is not confusing or contradictory, as Lewis’ group also demonstrated that the tolerance of the icd mutant to gentamicin was significantly reduced while the tolerance of other TCA cycle mutant strains was increased in a different study (Shan et al., 2015). However, they did not delve into the metabolism of these mutant strains, as we did. We now mention this point in our manuscript (see pages 14-15).

      Apart from the confusing data, it is not clear what useful information may be obtained from the choice of the experimental system. The authors examine exponentially growing cells of E. coli for tolerance of aminoglycosides. The population at this stage of growth is highly susceptible to aminoglycosides, and only some rare persister cells can survive. However, the authors do not study persisters. A stationary population of E. coli is tolerant of aminoglycosides, and this is clinically relevant, but this is not the subject of the study.

      Response: Respectfully, we must express our disagreement with the reviewer's comments. Our experimental system is meticulously organized and logically structured. Mutant strains such as gltA, sucA, and nuoI deletions exhibit increased tolerance to all aminoglycosides tested, with their fractions clearly increasing around the mid-exponential phase between 3-4 hours (refer to Figure 2B in our manuscript). This surge in tolerance is evident at the population level as well (as depicted in Figure 1A in our manuscript, where certain mutant strains demonstrate complete survival to streptomycin, with survival fractions nearing 1). Given the pronounced increase observed around the mid-exponential phase, we primarily characterize the metabolism of these cells during this growth phase.

      It's essential to note that any investigation into antibiotic tolerance and/or resistance holds immense significance, regardless of the growth phase under scrutiny, as antibiotic tolerance/resistance poses a substantial healthcare challenge. Additionally, metabolic mutant strains do not necessarily entail severe fitness costs, as evidenced by Figure S2A published by the Lewis group (Manuse et al., 2021), a finding consistent with our study (see Figure 2B in our manuscript). This phenomenon could confer a survival advantage to bacterial cells, as they may acquire metabolic mutations to bolster their tolerance without incurring significant fitness costs. Furthermore, numerous studies suggest that bacterial cells may follow an evolutionary pathway leading to increased tolerance before acquiring resistance mechanisms (Levin-Reisman et al., 2017; Santi et al., 2021). The presence of metabolic mutations in clinical E. coli pathogens has also been confirmed by Collins’ group through the analysis of a library of 7243 E. coli genomes from NCBI Pathogen Detection (Lopatkin et al., 2021). Consequently, understanding the tolerance mechanisms conferred by metabolic mutations is of paramount importance.

      References

      Levin-Reisman I, Ronin I, Gefen O, Braniss I, Shoresh N, Balaban NQ. 2017. Antibiotic tolerance facilitates the evolution of resistance. Science 355:826–830. doi:10.1126/science.aaj2191

      Lopatkin AJ, Bening SC, Manson AL, Stokes JM, Kohanski MA, Badran AH, Earl AM, Cheney NJ, Yang JH, Collins JJ. 2021. Clinically relevant mutations in core metabolic genes confer antibiotic resistance. Science 371. doi:10.1126/science.aba0862

      Manuse S, Shan Y, Canas-Duarte SJ, Bakshi S, Sun WS, Mori H, Paulsson J, Lewis K. 2021. Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biol 19. doi:10.1371/journal.pbio.3001194

      Mohiuddin Kabir M, Shimizu K. 2004. Metabolic regulation analysis of icd-gene knockout Escherichia coli based on 2D electrophoresis with MALDI-TOF mass spectrometry and enzyme activity measurements. Appl Microbiol Biotechnol 65:84–96. doi:10.1007/s00253-004-1627-1

      Santi I, Manfredi P, Maffei E, Egli A, Jenal U. 2021. Evolution of Antibiotic Tolerance Shapes Resistance Development in Chronic Pseudomonas aeruginosa Infections. mBio. doi:10.1128/mBio.03482-20

      Shan Y, Lazinski D, Rowe S, Camilli A, Lewis K. 2015. Genetic basis of persister tolerance to aminoglycosides in Escherichia coli. mBio 6. doi:10.1128/mBio.00078-15

      Van den Bergh B, Schramke H, Michiels JE, Kimkes TEP, Radzikowski JL, Schimpf J, Vedelaar SR, Burschel S, Dewachter L, Lončar N, Schmidt A, Meijer T, Fauvart M, Friedrich T, Michiels J, Heinemann M. 2022. Mutations in respiratory complex I promote antibiotic persistence through alterations in intracellular acidity and protein synthesis. Nat Commun 13:546. doi:10.1038/s41467-022-28141-x

      Reviewer #2 (Public Review):

      Summary:

      This interesting study challenges a dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.

      Strengths:

      This is a complete study. These results are surprising and are based on various read-outs, such as ATP levels, pH measurement, membrane potential, and the uptake of fluorophore-labeled gentamicin. Utilizing a proteomic approach, the authors show instead that in tolerant mutants, there is a decrease in the levels of proteins associated with ribosomes (targets of AG), causing tolerance.

      Response: We sincerely appreciate the reviewer for taking the time to read our manuscript and offer valuable suggestions.

      Weaknesses:

      The use of a single high concentration of aminoglycoside: my main comment on this study concerns the use of an AG concentration well above the MIC (50 µg/ml, or 25 µg/ml for uptake experiments), which is 10 times higher than the concentrations used in previous studies showing a link with PMF (Kohanski, Taber). This significant difference may explain the discrepancies in results. Indeed, a high concentration of AG can mask the effects of a metabolic disruption and lead to less specific uptake. However, this concentration highlights a second molecular level of tolerance. Adding experiments using lower concentrations (we propose 5 µg/ml, to compare with the literature) would provide a more comprehensive understanding of AG tolerance mechanisms during a decrease in metabolism.

      Another suggestion would be to test iron limitation (using an iron chelator such as DIP), which has been shown to induce AG tolerance. Can the authors demonstrate whether this iron limitation leads to a decrease in ribosomal proteins? A positive result would validate their hypothesis. Otherwise, it would help distinguish two types of molecular mechanisms for AG tolerance during a metabolic disruption: (i) PMF and uptake at low concentrations, (ii) ribosomal proteins at high concentrations.

      Response: While we acknowledge the intriguing possibility of exploring whether iron limitation results in a reduction of ribosomal proteins, we believe that this topic falls slightly outside the scope of our current study. This area warrants independent investigation, since our current research did not specifically focus on iron-limited environments (LB medium is iron-rich; Abdul-Tehrani et al., 1999; Rodríguez-Rojas et al., 2015). However, we fully concur that experimental outcomes may depend on the concentration of aminoglycosides (AG). Hence, we repeated the critical experiments using a lower concentration of gentamicin (5 µg/mL), as suggested by the reviewer. Before discussing these results, we wish to emphasize two key points. Firstly, the majority of our metabolic measurements, including ATP levels, redox activities, intracellular pH, and metabolomics, were conducted in mutant and wild-type cells in the absence of drugs. Our objective was to elucidate the impact of genetic perturbations of the TCA cycle on cell metabolism. Secondly, our study does not invalidate the hypothesis that AG uptake is proton motive force (PMF)-dependent. We observed similar drug uptake across the strains tested, which is reasonable considering that their energy metabolism and PMF are not significantly altered compared to the wild type (at least, we did not observe a consistent trend in their metabolic levels). Consequently, our study does not necessarily contradict previous claims (Taber et al., 1987). We have now clarified this point in the manuscript (see pages 1 and 13).

      When we employed a lower gentamicin concentration, we still noted a significant elevation in tolerance among the gltA, sucA, and nuoI mutant strains compared to the wild type. Also, it remained evident that the observed tolerance in the mutant strains cannot be ascribed to differences in drug uptake or impaired PMF, as the levels of drug uptake and the disruption of PMF by gentamicin (at lower concentrations) in the mutant strains were comparable to those of the wild type. Moreover, since our metabolic measurements and proteomics analyses failed to reveal any notable alterations in energy metabolism in these strains, the consistency in drug uptake levels across both mutant and wild-type strains, even at lower concentrations, further bolsters the validity of our findings obtained at higher gentamicin concentrations. The new results have been incorporated into the Supplementary file (see Supplementary Figures S1, S5, S7, and S9) and discussed throughout the manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Line 120: Luria-Bertani (LB); use Lysogeny Broth.

      Line 180: "RSG dye can be reduced by bacterial reductases of PMF" to be reformulated.

      Response: The suggested corrections have been incorporated into the manuscript.

      References

      Abdul-Tehrani H, Hudson AJ, Chang Y, Timms AR, Hawkins C, Williams JM, Harrison PM, Guest JR, Andrews SC. 1999. Ferritin Mutants of Escherichia coli Are Iron Deficient and Growth Impaired, and fur Mutants are Iron Deficient. J Bacteriol.

      Rodríguez-Rojas A, Makarova O, Müller U, Rolff J. 2015. Cationic Peptides Facilitate Iron-induced Mutagenesis in Bacteria. PLoS Genet 11. doi:10.1371/journal.pgen.1005546

      Taber HW, Mueller JP, Miller PF, Arrow AS. 1987. Bacterial Uptake of Aminoglycoside Antibiotics. Microbiol Rev 51:439–457. doi:10.1128/mr.51.4.439-457.1987

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a solid and generally convincing set of experiments to address the question of whether the lateral parafacial area (pFL) is active in controlling active expiration, which is particularly important in patient populations that rely on active exhalation to maintain breathing (e.g., COPD, ALS, muscular dystrophy). This study presents a valuable finding by pharmacologically mapping the core medullary region that contributes to active expiration and addresses the question of where these regions lie anatomically. Results from these experiments will be of value to those interested in the neural control of breathing and to other neuroscientists as a framework for how to perform pharmacological mapping experiments in the future.

      Thanks for the positive feedback on our study, as well as the assessment of the novelty of our investigation and the advancements to the field that these results will bring in the future.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      The main focus of the current study is to identify the anatomical core of an expiratory oscillator in the medulla using pharmacological disinhibition. Although expiration is passive in normal eupneic conditions, activation of the parafacial (pFL) region is believed to evoke active expiration in conditions of elevated ventilatory demands. The authors and others in the field have previously attempted to map this region using pharmacological, optogenetic, and chemogenetic approaches, which present their own challenges.

      In the present study, the authors take a systematic approach to determine the precise location along the rostrocaudal axis of the ventral medulla where the expiratory oscillator resides. The authors injected a solution of bicuculline (a GABA-A receptor antagonist) and fluorobeads at 5 distinct anatomical locations to study the effects on neuronal excitability and functional circuitry in the pFL. The effects of bicuculline on different phases of the respiratory cycle were characterized using a multidimensional cycle-by-cycle analysis. This analysis involved measuring changes in airflow, diaphragm electromyography (EMG), and abdominal EMG signals, and using a phase-plane analysis to assess the combined changes in these respiratory signals. Anatomical immunostaining techniques were also used to complement the functional mapping of the pFL.
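      As an illustration of what a cycle-by-cycle measurement involves, the sketch below derives inspiratory time (Ti), expiratory time (Te), tidal volume (VT), frequency (f), and minute ventilation (VE = VT × f) from an airflow trace. It is a simplified, hypothetical example, not the authors' pipeline: it assumes inspiratory flow is positive, detects phases from airflow zero crossings rather than from EMG, and omits the diaphragm/abdominal EMG and phase-plane analyses. The function name `breath_cycles` is hypothetical.

```python
import numpy as np


def breath_cycles(flow, fs):
    """Cycle-by-cycle timing and volume from an airflow trace sampled at fs Hz.

    Assumes inspiratory flow is positive and expiratory flow is negative, and
    treats each negative-to-positive zero crossing as an inspiratory onset.
    Returns one dict per complete cycle (onset to next onset).
    """
    flow = np.asarray(flow, dtype=float)
    insp = flow > 0
    onsets = np.flatnonzero(~insp[:-1] & insp[1:]) + 1
    cycles = []
    for start, stop in zip(onsets[:-1], onsets[1:]):
        seg = flow[start:stop]
        ti = np.count_nonzero(seg > 0) / fs    # inspiratory time (s)
        te = np.count_nonzero(seg <= 0) / fs   # expiratory time (s)
        vt = seg[seg > 0].sum() / fs           # tidal volume: integral of inspiratory flow
        f = 60.0 / (ti + te)                   # breathing frequency (breaths/min)
        cycles.append({"Ti": ti, "Te": te, "VT": vt, "f": f, "VE": vt * f})
    return cycles


# Synthetic 60 breaths/min trace; the half-sample offset avoids exact zeros.
fs = 1000
t = (np.arange(5 * fs) + 0.5) / fs
flow = np.sin(2 * np.pi * t)
for c in breath_cycles(flow, fs):
    print(c)
```

      With flow in ml/s, VT comes out in ml and VE in ml/min; real recordings would typically need filtering and a minimum-duration criterion before zero crossings are trusted.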

      Major strengths of this work include a robust study design, complementary neurophysiological and immunohistochemical methods, and the use of a novel phase-plane analysis. The authors construct a comprehensive functional map revealing functional nuances in respiratory responses to bicuculline along the rostrocaudal axis of the parafacial region. They convincingly show that although bicuculline injections at all coordinates of the pFL generated an expiratory response, the most rostral locations in the lateral parafacial region play the strongest role in generating active expiration. These were characterized by a strong impact on the duration and strength of ABD activation and a robust change in tidal volume and minute ventilation. The authors also confirmed histologically that none of the injection sites overlapped grossly with PHOX2B+ neurons, thus confirming the specificity of the injections in the pFL and not the neighboring RTN.

      Collectively, these findings advance our understanding of the presumed expiratory oscillator, the pFL, and highlight the functional heterogeneity in the functional response of this anatomical structure.

      Thanks for the positive feedback on the results presented in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Pisanski and colleagues map regions of the brainstem that produce the rhythm for active expiratory breathing movements and influence their motor patterns. While the neural origins of inspiration are very well understood, the neural bases for expiration lag considerably. The problem is important and new knowledge pertaining to the neural origins of expiration is welcome.

      The authors perturb the parafacial lateral (pFL) respiratory group of the brainstem with microinjection of bicuculline, to elucidate how disinhibition in specific locations of the pFL influences active expiration (and breathing in general) in anesthetized rats. They provide valuable, if not definitive, evidence that the borders of the pFL appear to extend more rostrally than previously appreciated. Prior research suggests that the expiratory pFL exists at the caudal pole of the facial cranial nucleus (VIIc). Here, the authors show that its borders probably extend as much as 1 mm rostral to VIIc. The evidence is convincing albeit with caveats.

      Strengths:

      The authors achieve their aim in terms of showing that the borders of the expiratory pFL are not well understood at present and that it (the pFL) extends more rostrally. The results support that point. The data are strong enough to cause many respiratory neurobiologists to look at the sites rostral to the VIIc for expiratory rhythmogenic neurons and characterize their properties and mechanisms. At present my view is that most respiratory neurobiologists overlook the regions rostral to VIIc in their studies of expiratory rhythm and pattern.

      Weaknesses:

      The injection of bicuculline has indiscriminate effects on excitatory and inhibitory neurons, and the parafacial region is populated by excitatory neurons that are expiratory rhythmogenic and GABA and glycinergic neurons whose roles in producing active expiration are contradictory (Flor et al. J Physiol, 2020, DOI: 10.1113/JP280243). It remains unclear how the microinjections of bicuculline differentially affect all three populations. A more selective approach would be able to disinhibit the populations separately. Nevertheless, for the main point at hand, the data do suggest that we should reconsider the borders of the expiratory pFL nucleus and begin to examine its physiology up to 1 mm rostral to VIIc.

      The control experiment showed that bicuculline microinjections induced cFos expression in the pFL, which is good, but again we don't know which neurons were disinhibited: glutamatergic, GABAergic, or glycinergic.

      Thanks for sharing your excitement about the results of our study, and for appreciating the thorough investigation performed with bicuculline, an approach originally used by Pagliardini et al. (2011; PMID: 21414911) and subsequently adopted by many other groups to generate and study active expiration in vivo.

      In the current study we used the well-known effect of bicuculline to systematically test which area is most sensitive to this pharmacological effect, and hence may be the core for generating active expiration. While GABA receptor antagonists may act indiscriminately on GABA receptor-expressing neurons of various phenotypes, anatomical assessment of inhibitory cells has shown that GABAergic and glycinergic cells are sparsely distributed in the parafacial area (Tanaka et al., 2003, PMID: 14512139), and it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017, PMID: 28004411; Silva et al., 2016, PMID: 26900003) and demonstrated recently (Magalhaes et al., 2021, PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype. We cannot exclude that a small fraction of neurons in the pFL area are inhibitory and that they could influence recruitment of adjacent late-E expiratory neurons. A more selective activation of neuronal populations with different phenotypes would indeed be interesting; nonetheless, if local inhibitory neurons have a role in the generation of active expiration, then their disinhibition could either inhibit late-E activity or stimulate expiration in a more indirect fashion.

      While the effect of bicuculline on active expiration has been reported and replicated in multiple manuscripts, the source of inhibition across the different phases of the respiratory cycle is still under investigation. Some studies suggest that GABAergic and glycinergic inhibition does not originate in the pFL but rather in the BötC and preBötC areas (Flor et al., 2020, PMID: 32621515; Magalhaes et al., 2021; PMID: 34510468), and the effects of this inhibition across the respiratory cycle are debated. Future studies will be key to identifying the source of pFL inhibition.

      The manuscript characterizes how bicuculline microinjections affect breathing parameters such as tidal volume, frequency, ventilation, inspiratory and expiratory time, as well as oxygen consumption. Those aspects of the manuscript are a bit tedious and sometimes overanalyzed. Plus, there was no predictive framework established at the outset for how one should expect disinhibition to affect breathing parameters. In other words, if the authors are seeking to map the pFL borders, then why analyze the breathing patterns so much? Does doing so provide more insight into the borders of pFL? I did not think it was compellingly argued.

      We have edited the introduction to address this comment and emphasize the rationale for the study. We also edited the results section to summarize our findings.

      We continue to report our in-depth analysis of the perturbations induced by bicuculline injection on the various respiratory parameters, as this is essential to determine the effects of our experiment not only on the activation of the pFL and active expiration, but also on the respiratory network in general. To be fair and open about our findings, we have reported the results of our analysis in detail. Of note, all sites generated active expiration; since the objective of the study was to determine the sites with the most significant changes, a finer, multilevel analysis was used.

      Further, lines 382-386 make a point about decreasing inspiratory time even though the data do not meet the statistical threshold. In lines 386-395, the reporting appears to reach significance (line 388) but not reach significance (line 389). I had trouble making sense of that disparity.

      The statistics were confirmed, and the lines edited as follows: “Interestingly, the duration of inspiration during the response was found to decrease in all groups relative to baseline respiration (Ti response = 0.279 ± 0.034s, Ti baseline = 0.318 ± 0.043s, Wilcoxon rank sum: Z = 3.24, p = 0.001). Contrary to this decrease in inspiratory duration, the total expiratory time was observed to increase in all groups and remained elevated compared to baseline (TE response = 1.313 ± 0.188s, TE baseline = 1.029 ± 0.161s, Wilcoxon rank sum: Z = 4.49, p = 0.001).”

      The other statistical hiccups include "tended towards significance" (line 454), "were found to only reach significance for a short portion of the response" (line 486-7), "did not reach the level of significance" (line 506), which gives one the sense of cherry picking or over-analysis. Frankly, this reviewer finds the paper much more compelling when just asking whether the microinjections evoke active expiration. If yes, then the site is probably part of the pFL.

      Statistical “tendencies” have been eliminated throughout the manuscript.

      We analyzed our results in detail to determine changes in respiration and differential effects across the five injection sites. Although the presentation of the results may seem tedious, this analysis allowed us to highlight some interesting effects. The first concerns respiratory frequency. Optogenetic stimulation of this area has previously been shown to increase respiratory frequency (Pagliardini et al., 2011, PMID: 21414911), whereas disinhibition with the same approach used here or stimulation of AMPA receptors in the pFL produced either a reduction in frequency or no significant change (Pagliardini et al., 2011, PMID: 21414911; Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151). Here, we suggest that the reduction in respiratory frequency occurs only at the caudal sites and could be attributed to effects on the BötC rather than to stimulation of the pFL core, since no change in frequency was observed where the effect was most potent (the rostral sites). The second concerns O2 consumption: although it is difficult to interpret at this point, we found it very interesting that hyperventilation occurred only at the most rostral injection sites.

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size.

      Thank you for the feedback on our description of the statistical results and the suggestion of incorporating effect size. We have now included measurements of effect size in the results section. Specifically, we calculated the effect size within each ANOVA using the value of eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Fig. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that effect sizes were not computed for the non-parametric analyses.
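      For readers less familiar with these two measures, a minimal sketch of how each is defined may help. It assumes a one-way between-groups design, where eta squared is the between-group sum of squares divided by the total sum of squares, and a shared covariance matrix for the Mahalanobis distance between two mean vectors. The function names are hypothetical, and the authors' Matlab scripts may differ in detail (for example, by using partial eta squared in a repeated-measures ANOVA).

```python
import numpy as np


def eta_squared(groups):
    """Eta squared for a one-way ANOVA: SS_between / SS_total."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(arrays)
    grand_mean = all_values.mean()
    # Total variability of every observation around the grand mean.
    ss_total = np.sum((all_values - grand_mean) ** 2)
    # Variability explained by differences between group means.
    ss_between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)
    return float(ss_between / ss_total)


def mahalanobis_distance(x, y, cov):
    """Distance between two mean vectors, scaled by a shared covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of explicitly inverting the covariance matrix.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

      With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; correlated or high-variance dimensions are down-weighted otherwise, which is why it can serve as a multidimensional effect size.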

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in controlling active expiration. Stereotactic injections of bicuculline were utilized to map various pFL sites and their impact on respiration. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study indicates that the rostrocaudal organization of the pFL and its influence on breathing is not simple and uniform.

      Strengths:

      The data provide novel insights into the importance of rostral locations in controlling active expiration. The authors use innovative analytic methods to characterize the respiratory effects of bicuculline injections into various areas of the pFL.

      Weaknesses:

      Bicuculline injections increase the excitability of neurons. Aside from blocking GABA receptors, bicuculline also inhibits calcium-activated potassium currents and potentiates NMDA current, thus insights into the role of GABAergic inhibition are limited.

      Increasing the excitability of neurons provides little insights into the activity pattern and function of the activated neurons. Without recording from the activated neurons, it is impossible to know whether an effect on active expiration or any other respiratory phase is caused by bicuculline acting on rhythmogenic neurons or tonic neurons that modulate respiration. While this approach is inappropriate to study the functional extent of the conditional "oscillator" for active expiration, it provides valuable insights into this region's complex role in controlling breathing.

      We have included a discussion of the limitations of our study in the Technical considerations section to address the possibility that bicuculline may induce active expiration through other mechanisms. Please note that bicuculline was not used to gain further insight into GABAergic inhibition of the pFL but as a tool to generate active expiration that has been extensively validated by our group and others.

      Multiple studies have shown recruitment of excitatory late expiratory neurons with bicuculline injections. Although we did not record from late-E neurons in this study, we infer from the body of literature that disinhibition of neurons in this area activates late-E neurons (as previously demonstrated) and generates active expiration. Although we see value in recording the activity of single neurons (especially to study mechanisms of rhythmogenesis), we opted to measure the physiological response of respiratory muscles as an indication of active expiration recruitment in vivo. Recording from single neurons after bicuculline injections at each site would likely confirm the presence of expiratory neurons along the parafacial area, which would not be surprising, since every site tested promoted active expiration. The focus of this study, however, was to determine the site with the strongest physiological response to disinhibition. Future studies will be key to determine whether all neurons along this column have rhythmic electrophysiological properties similar to those recently reported (Magalhaes et al., 2021; PMID: 34510468), or whether some of them simply provide tonic drive to late-E neurons located elsewhere.

      We have discussed the issue as follows:

      “Our experiments focused on determining the area in the pFL that is most effective in generating active expiration as measured by ABD EMG activity and expiratory flow. We did not attempt to record single-cell neuronal activity at various locations as previously shown in other studies (Pagliardini et al., 2011; Magalhaes et al., 2021), as this approach would most likely find some late-E neurons across the pFL and thus not effectively discriminate between areas of the pFL. Future studies involving multi-unit recordings or imaging of cell population activities will help to determine the firing pattern and population density of bicuculline-activated cells and further determine differences in distribution and function of late-E neurons across the region of the pFL.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, the manuscript addresses an important question in the field, the anatomical location of the expiratory oscillator. I commend the authors for a well-thought-out and clearly presented study. However, a few small concerns deserve attention to improve the clarity of the report.

      (1) The figures would benefit from a rostral-to-caudal representation of results instead of a caudal-to-rostral orientation. Example, Figure 2.

      We opted for a caudal to rostral representation to progressively move away from the inspiratory oscillator (preBötC) and the anatomical reference point (the caudal tip of the facial nucleus) with our series of injections. 

      (2) A discussion about how expiratory responses generated by these pharmacological approaches would compare to endogenous baseline conditions. The authors mention that bicuculline injections elicited a late-E downward inflection that was absent in baseline conditions. Thus, this raises the point of how these findings compare to awake freely moving animals or during different conditions of increased ventilatory demand.

      This is an interesting question that has not yet been addressed in the field. As far as we know, there are no recordings of pFL neurons in freely behaving animals, although recordings of pFL late-E neurons under elevated PaCO2 have shown late-E activity in in situ preparations (Britto & Moraes, 2017; PMID: 28004411; Magalhaes et al., 2021; PMID: 34510468).

      We have clarified this in the discussion as follows:

      “At rest, respiratory activity does not present with active expiration (i.e., expiratory flow below functional residual capacity in conjunction with expiratory-related ABD muscle recruitment), and expiratory flow occurs due to passive recoil of the chest wall with no contribution of abdominal activity. Active expiration and abdominal recruitment can be spontaneously observed during sleep (in particular REM sleep; Andrews and Pagliardini, 2015; Pisanski et al., 2019) and can be triggered during increased respiratory drive (e.g., hypercapnia, RTN stimulation; Abbott et al., 2011). Although never assessed in freely moving, unanesthetized rodents, bicuculline has been extensively used to generate active expiration and late-E neuron activity in both juvenile and adult anesthetized rats (Pagliardini et al., 2011; Huckstepp et al., 2015; Huckstepp et al., 2016; Huckstepp et al., 2018; De Britto and Moraes, 2017; Magalhaes et al., 2021).”

      (3) In Figure 2A, there appears to be an injection site in the top right quadrant of the image, very distant from the intended site. Could the authors confirm if this is an artifact?

      Yes, it is an artifact of image acquisition; we should have marked it in the figure. To avoid confusion, and following other reviewers’ suggestions, we have edited the figure.

      (4) A stylistic suggestion would be to include the subpanel of Figure 2C saline control injection as a graph of its own and also include the control anatomical location in 2B.

      Thank you for the suggestion. Because of the complex organization of the figure, we opted to leave it as a subpanel so as not to distract the reader from the five injection sites, while still providing information about vehicle injections and their lack of effect on the respiratory response.

      (5) The authors note that DIAm Area (norm.) during the inspiratory phase is increased in the +6 and +8mm groups. However, Figure 5E shows that the +8mm group is significantly reduced as compared to the +6mm group. Please clarify.

      During the inspiratory phase we did not observe any significant change in the DIA Area (norm.). We realize that the description of this part of the results was confusing and therefore we have eliminated that section.

      Reviewer #2 (Recommendations For The Authors):

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size. There is a valuable editorial in this week's J Physiology (https://doi.org/10.1113/JP285575) that may provide helpful guidance.

      Thank you for these comments and the general assessment. We realized that the results section was dense and contained a lot of information. We have significantly slimmed the description of the results to make them easier to appreciate and to avoid confusing statements about significant versus non-significant results.

      We have now included measurements of effect size in the results section. Specifically, we calculated the effect size within each ANOVA using the value of eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Fig. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that effect sizes were not computed for the non-parametric analyses.

      The equipment and resources should be clearly identified and use RRIDs whenever possible. Resources like antibodies and other reagents (e.g., cryoprotectants) should be identified, not just by manufacturer, but also by specific part or product numbers or identifiers.

      The manuscript has been edited to add these details.

      The manuscript makes reference to ImageJ and Matlab routines, which must be public through GitHub or another stable repository.

      Thank you for pointing this out. ImageJ analysis was performed using scripts already available to users (no custom scripts). The Matlab scripts used for the multivariate analysis are now available at: https://github.com/mprosteb/Pisanski2024

      The way that ABD-DIA coupling was assessed was unclear from the Methods.

      The following text has been added to the methods: “The coupling between ABD and DIA signals was measured as a ratio and analyzed by quantifying the number of bursts of activity observed for the ABD and DIA EMG signals during the first 10 minutes of the response, excluding time bins at the end of the response (due to fading and waning of the ABD response in those instances).”
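      The burst-count ratio described in the added methods text can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes burst onsets have already been detected upstream and are given in seconds from response onset, that the ratio is expressed as ABD bursts per DIA burst (so 1.0 means one ABD burst for every inspiration), and that the window is the first 600 s. The function name is hypothetical.

```python
def abd_dia_coupling_ratio(abd_onsets_s, dia_onsets_s, window_s=600.0):
    """Hypothetical helper: ABD bursts per DIA burst within the first window_s seconds.

    Burst detection is assumed to have been done upstream; inputs are onset
    times in seconds measured from the start of the bicuculline response.
    """
    n_abd = sum(1 for t in abd_onsets_s if 0.0 <= t < window_s)
    n_dia = sum(1 for t in dia_onsets_s if 0.0 <= t < window_s)
    if n_dia == 0:
        raise ValueError("no DIA bursts in the analysis window; ratio undefined")
    return n_abd / n_dia
```

      For example, ABD bursts on every second breath within the window would give a ratio of 0.5. Bursts after the window (where the ABD response fades) are ignored by construction.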

      Fig. 1A was never cited in the text.

      It has been cited now.

      Fig. 1A-C appears to be exactly the same as Fig. 5A-C.

      The reviewer is correct. We have used figure 1 to describe and explain our analytical methods with sample data and Figure 5 describes our results. We have clarified that in: “Figure 5: Rostral injections elicit more prominent changes to respiration in each signal and sub-period. A-C: Is the same as Method Figure 1, has been included here for further clarity when analyzing the results.”

      Late Expiratory airflow is given in units of volts (V) in lines 358-363 (Fig. 4C) but then in units of volts-seconds (V•s) in lines 363-367. Both units are problematic because the voltage is neither an air volume nor an air volume per unit time. Is there some conversion factor left out?

      In this section of the results we describe the changes in expiratory peak amplitude (V) and expiratory peak flow (V•s). Since calibration of airflow was performed on the positive flow and for larger volumes, we prefer to use the original units to guarantee precise assessment of the change and avoid introducing potential errors. Since the analysis considers changes from baseline readings, converting to ml or ml*s would not affect our analysis.
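      The argument that calibration would not affect the analysis can be checked with a small worked example. It assumes a linear airflow calibration with zero offset (flow = gain × voltage); the gain and signal values below are illustrative only and are not taken from the study. Under that assumption the gain cancels in any change normalized to baseline.

```python
def percent_change_from_baseline(response, baseline):
    """Percent change of a response value relative to its baseline value."""
    return 100.0 * (response - baseline) / baseline


# Hypothetical linear calibration with zero offset: flow = GAIN * voltage.
GAIN = 2.7  # illustrative value only, not from the study

baseline_v, response_v = 0.30, 0.45  # illustrative signal values in volts
raw_change = percent_change_from_baseline(response_v, baseline_v)
calibrated_change = percent_change_from_baseline(GAIN * response_v, GAIN * baseline_v)
```

      Both values equal 50% (up to floating-point rounding). If the calibration had a non-zero offset, normalized changes would no longer be strictly invariant, while absolute differences from baseline would simply be scaled by the gain.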

      Reviewer #3 (Recommendations For The Authors):

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in respiratory control, specifically in modulating active expiration. The precise location of this expiratory oscillator within the ventral medulla remains uncertain, with some studies indicating that the caudal tip of the facial nucleus (VIIc) forms the core while others propose more rostral areas. Bicuculline injections were utilized at various pFL sites to explore the impact of these injections on respiration. The authors use innovative and impressive analytic methods to characterize the effect on respiratory activity. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study will contribute to an enhanced understanding of the neural mechanisms controlling active expiration. The main message of the study is that the rostro-caudal organization of the pFL is not simple and uniform. The data provides novel insights into the importance of rostral locations in controlling active expiration (see e.g. lines 738-740).

      The data and results of the paper are intriguing, and it appears that the experiments are well-managed and executed. However, there are several major and minor comments and suggestions that should be addressed by the authors:

      (1) The study relies heavily on local injections into specific areas that are confirmed histologically. One potential concern is the injection volume of 200 nL in such a tiny area. The authors suggest that the drug did not spread to rostral/caudal areas outside the specified coordinate partly based on their cFOS staining. For example, the lack of cFOS activation in TH+ cells and Phox2B cells is interpreted as proof that bicuculline did not spread to these somas (Figure 2). The authors seem to use a similar argument as evidence that the pFL does not include Phox2B neurons in the RTN as discussed in the Discussion section (lines 830-847). However, it is very surprising that bicuculline injections into an area that is known to contain Phox2B and Th+ neurons do not activate these neurons as assessed by the cFOS staining. It seems puzzling to me that none of their injections shown in Figure 2 activated Phox2B or Th neurons. I assume that in targeting the pFL the authors must have sometimes hit areas that included neurons that define the RTN, which would have activated Phox2B or Th+ neurons. Did the authors find that these activations did not activate active expiration? Such negative "controls" would strengthen their argument that pFL is a separate and distinct region that selectively controls active expiration.

      Thank you for the positive feedback on the manuscript. As demonstrated and discussed in several previous publications, PHOX2B-expressing neurons in this area of the brain include the Neuromedin B-positive RTN neurons (more densely located in the ventral parafacial region than in the lateral parafacial region, our site of injection), the TH+ C1 neurons (located somewhat more caudally and medially than our injection sites, around the BötC/preBötC area) and the large facial motor neurons (easily identifiable by their large size and compact location). Given this differential spatial distribution and the controls described below, we believe we have reduced the likelihood of directly activating these neurons, although we cannot fully exclude it.

      There is now strong evidence for the lack of PHOX2B expression in late-E neurons in juvenile and adult rats (Magalhaes et al., 2021; PMID: 34510468). We realize that the microinjected solution could potentially diffuse and reach other areas, but we combined two strategies to verify that our injections were focal and activated only a restricted area of the brain (i.e., the pFL): i) localization of fluorobeads diluted in the bicuculline solution; and ii) cFos expression combined with anatomical markers, to identify activated cells. Fluorobeads have a very limited spread in the brain and therefore indicated the site of each injection, allowing us to differentiate between the five injection locations. Although we cannot assume that bicuculline has a similar spread (it is also quickly degraded in the tissue), combining this analysis with the localized expression of cFos helped us to differentiate between injection sites. Because of the proximity of PHOX2B-expressing RTN neurons and C1 neurons, we also combined cFos detection with immunohistochemistry to determine whether bicuculline activation was also visible in these two neuronal populations. Our results indicate that there is baseline cFos expression in RTN neurons (see vehicle injection), but the fraction of activated PHOX2B cells did not increase with bicuculline injections, suggesting that these neurons were not the target of our injections. Please note that cFos expression has been extensively used to determine RTN neuron activation, especially following chemoreflex responses.

      (2) The authors refer to "the expiratory oscillator" throughout the manuscript (e.g. lines 58, 62, 65) as if there is only one expiratory oscillator i.e. "the expiratory oscillator". For some reason, the authors avoided citing and mentioning PiCo (Anderson et al. 2016), which is considered the oscillator for postinspiration. Since the present study focuses on the role of expiration, and since the authors describe convincing effects on postinspiration, considering this oscillator which is located dorsomedial to the VRC seems relevant for the present study.

      Due to the limited and controversial literature currently describing PiCo as a third oscillator, and because our study did not directly assess post-inspiratory activity (as measured by the V nerve or laryngeal muscles) or PiCo activity and location (which would be even more distant than the RTN, for example), we prefer to avoid commenting on the effects of these injections on PiCo or on the connectivity between PiCo and the pFL.

      We have added this to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could reasonably be involved in this process.”

      (3) The authors do not specify what type of bicuculline they injected. Bicuculline is known to have significant effects on potassium channels. Thus, the effects reported here could be due to a non-specific change in excitability, rather than caused by a specific GABAergic blockade.

      The authors also do not know what effects these injections cause in the neurons in vivo, since the injections are not accompanied by recordings from the respiratory neurons that they activate. This together with the non-specific bicuculline effects will affect the interpretation of the results. Thus, the authors need to be more careful when interpreting their effects as "GABAergic". The use of more specific blockers like gabazine could partly address this concern. The authors have to discuss this in a "limitation section".

      Thank you for pointing that out; we have now clarified in the methods section that we used bicuculline methochloride. We cannot exclude that some side effects could be present due to the use of this drug. For the purpose of this study, however, we focused on bicuculline as a tool to consistently generate active expiration, since it has been extensively used by multiple laboratories to induce abdominal muscle recruitment and active expiration, as well as to directly record late-E neurons in this same area.

      We have included in the discussion the following statement:

      “Technical considerations

      Bicuculline methiodide has previously been shown to inhibit Ca2+-activated K+ currents, leading to non-specific potentiation of NMDA currents (Johnson and Seutin, 1997). Consequently, caution is warranted in attributing our findings solely to the GABAA antagonist properties of bicuculline. Previous work has demonstrated a temporal correlation between the onset of late-E neuron activity in the caudal parafacial region and ABD activity in response to bicuculline (Pagliardini et al., 2011; de Britto and Moraes, 2017; Magalhaes et al., 2021) as well as GABAergic sIPSCs in late-E neurons (Magalhaes et al., 2012). However, it is essential to note that the current study lacks single-unit recordings, preventing us from definitively confirming whether the observed activity stems from GABAergic disinhibition of late-E neurons or from excitation through non-GABAergic mechanisms.”

      (4) I also caution the authors when stating that the bicuculline injections will reveal the precise location and functional boundaries of "the" expiratory oscillation within the pFL. Increasing the excitability with bicuculline is inappropriate to study the functional boundaries of an oscillator. It is particularly inappropriate to identify the boundaries of the pFL, a network that is normally inactive and activated only under certain behavioral and metabolic conditions. Because the injections are increasing the neuronal excitability unspecifically, and because the authors are not recording the activity of the neurons in the pFL region it is unclear what kind of neurons are activated. The cFOS staining may help to define whether these neurons are Phox2B or Th positive or negative, but they will not provide insights into the activity patterns of the activated neurons. Thus, it is fair to assume that these injections will likely include also tonic neurons that might indirectly control the activity of pFL neurons under certain metabolic or behavioral conditions without actually being involved in the rhythmogenesis of active expiration. Many of the effects peak after several minutes, and different regions cause differential effects with different time courses, which is difficult to interpret functionally. Thus, the "core" identified in the present study could consist of tonic neurons as opposed to rhythmic neurons generating active expiration.

      We agree with the reviewer that our local injections may have activated a heterogeneous population of neurons. We do not claim that we activated only late-E rhythmogenic neurons, but rather that our multiple injection sites revealed the area that generates the strongest excitation of ABD muscles and active expiration.

      While GABA receptor antagonists may act indiscriminately on GABA receptor-expressing neurons of various phenotypes, anatomical assessment of inhibitory cells has shown a very sparse distribution of GABAergic and glycinergic cells in the parafacial area (Tanaka et al., 2003; PMID: 14512139). Furthermore, it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017; PMID: 28004411; Silva et al., 2016; PMID: 26900003) and recently demonstrated (Magalhaes et al., 2021; PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype.

      As suggested by the reviewer, it is possible that the bicuculline injections activated some tonic, non-rhythmogenic neurons, which could in turn activate an expiratory oscillator located elsewhere.

      We have edited the introduction as follows:

      “By strategically administering localized volumes of bicuculline at multiple rostrocaudal levels of the ventral brainstem, we aimed to selectively enhance the excitability of neurons driving active expiration, thereby revealing the extent of the pharmacological response and the site most efficient in generating active expiration.”

      We have edited the results as follows:

      “Importantly, the group with injection sites at +0.6 mm from VIIc exhibited the swiftest response onset, suggesting that this area is the most critical for the generation of active expiration, either through direct activation of the expiratory oscillator or, alternatively, for providing a strong tonic drive to late-E neurons located elsewhere.”

      In the introduction, it should also be emphasized that the pharmacological approach used in the present study complements the existing elegant chemogenetic studies, rather than emphasizing primarily the limitations of the chemogenetic inhibitions. The conclusion should be that these studies together provide different, yet complementary insights: The chemogenetic approach by inhibiting neurons, the present study by exciting neurons, and all studies come with their own limitations.

      Thank you for the suggestion; we have updated the manuscript as follows:

      “Although both of these elegant chemogenetic studies have contributed extensively to our understanding of the pFL, the existing evidence suggests that the expiratory oscillator may extend beyond the limits of the viral expression achieved in those studies, as proposed by Huckstepp et al. (2015).”

      Throughout the manuscript, the authors have to be cautious when implying that an excitatory effect relates to the activity of rhythmogenic pFL neurons. For example, on line 710 the authors state that "it is conceivable to infer that the rostral pFL is in the closest proximity to the cells responsible for the generation of active expiration". While it may indeed be "conceivable", the bicuculline injections themselves provide no insights into the location of neurons responsible for rhythmogenesis. It is equally "conceivable" that the excited neurons provide a tonic drive to the neurons without being involved in the generation of active expiration. These tonic neurons could be located at a distance from the presumed rhythmogenic core.

      We have included the possibility of tonic excitation in the technical considerations section:

      “However, our study did not include recording from late-E neurons following bicuculline injections, preventing us from definitively confirming whether the observed activity stems from late-E neuronal excitation or the potentiation of a tonic drive, particularly in the rostral areas.”

      (5) It is intriguing that some of their injections (Fig.2D) evoked postinspiratory activity. This interesting finding should be discussed as it could provide important insights into the coordination of the different phases of expiration.

      Thanks for the suggestion. We have included the following to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I ABD activity is generated is unclear. This late-E/post-I pattern of activity is similar to what has been observed in in vitro preparations and in vivo recordings in juvenile rats (Janczewski et al., 2002; Janczewski et al., 2006).

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could reasonably be involved in this process.”

      (6) The authors conducted bilateral disinhibition of the pFL, but only a unilateral photomicrograph was shown. Figure 2 should include a representative bilateral photomicrograph along with a scatter plot for clarity and completeness.

      We have edited Figure 2 to include representative images of bilateral injections.

      (7) Regarding the bicuculline injections in the Methods section: aside from specifying exactly what type of bicuculline was used, the authors should provide more information about the pFL location and landmarks used, including the missing medial-lateral coordinate. The fluorobead spread of approximately 300 µm, as observed in Figure 2C, is crucial for the interpretation of the results and should be detailed. An alternative approach could involve, for example, calculating the area covered by fluorobeads in each group.

      We have included the following in the text:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5)”

      “These findings strongly suggest that bicuculline specifically activated cells within the vicinity of the injection sites which spread ~300 µm (Figure 2C, horizontal lines) and did not activate PHOX2B+ cells in the RTN area, beyond their baseline level of activity.”

      (8) In the Experimental Protocol, the authors should provide more details on how the parameters were determined. For example, specify the number of cycles included for Dia frequency/amplitude, Abd frequency/amplitude, and with regards to the averaging process, the authors should specify over how many cycles they obtained an average for Dia/Abd activity time and AUC. The authors should also provide information on the number of bicuculline injections that they repeated to average these values and they should report the coefficient of variation for repeated injections. Please clarify the method used to calculate AUC, considering the non-linear nature of the activity.

      Only one bicuculline injection per rat was performed, and the number of rats used for each injection site is indicated in the methods as follows:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5), and CTRL (n=7). We recorded the physiological responses to the injection for 20-25 min.”

      We have clarified in the methods section the following:

      “Respiratory data were tracked in 2-minute time bins, from the baseline period prior to injection through 20 min of recording post-injection. Mean-cycle measurements for each signal were computed by averaging values across all cycles within a given time bin.”

      Additional clarifications have been added:

      “Mean-cycle measurements for each signal were computed by averaging across all cycles within a given time bin (~300 cycles in baseline, ~100 cycles per response time bin). We then used the average values of respiratory rate (RR), tidal volume (VT), minute ventilation (VE), expiratory ABD amplitude, expiratory ABD area, VO2 and VE/VO2 to obtain values relative to the baseline period. Peak responses were identified as the time bin that produced the strongest change relative to baseline.”
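      The binning and baseline normalization described above can be sketched as follows. This is an illustration only: it assumes per-cycle values with cycle onset times in seconds, expresses "relative to baseline" as a ratio, and takes "strongest change" to mean the largest absolute deviation from a ratio of 1; the function names are hypothetical and not part of the authors' analysis code.

```python
import numpy as np


def bin_means(cycle_times_s, cycle_values, bin_s=120.0):
    """Mean of per-cycle values within consecutive bins (2 min by default)."""
    bins = (np.asarray(cycle_times_s, dtype=float) // bin_s).astype(int)
    values = np.asarray(cycle_values, dtype=float)
    return {int(b): float(values[bins == b].mean()) for b in np.unique(bins)}


def relative_to_baseline(means, baseline_mean):
    """Express each bin mean as a ratio of the baseline mean (assumed form)."""
    return {b: m / baseline_mean for b, m in means.items()}


def peak_bin(relative):
    """Bin whose ratio deviates most from 1.0 (i.e. from no change)."""
    return max(relative, key=lambda b: abs(relative[b] - 1.0))


# Toy example: cycle onsets (s) after injection and a per-cycle measure
times = [10, 50, 130, 170, 250]
values = [1.0, 3.0, 2.0, 2.0, 6.0]
rel = relative_to_baseline(bin_means(times, values), baseline_mean=2.0)
print(rel)            # {0: 1.0, 1: 1.0, 2: 3.0}
print(peak_bin(rel))  # 2
```

      A ratio was chosen here only for simplicity; a percent change from baseline would identify the same peak bin.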

      “The area under the curve (AUC) was measured during baseline and subtracted from the corresponding AUC of the response for each time bin (Figure 1C). Because all signals were sampled at the same rate, this AUC was computed as the sum of the signal samples within a given respiratory phase. Note that areas below the zero line, as expected for negative airflow during expiration, yield negative AUC values.”
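      A minimal sketch of the AUC computation described above, assuming each signal is a NumPy array and that a boolean mask marking the samples of one respiratory phase is already available (phase detection is not shown, and the function names are hypothetical). The plain sum equals the time integral divided by the sampling interval, so it is comparable across recordings only because all signals share one sampling rate.

```python
import numpy as np


def phase_auc(signal, phase_mask):
    """AUC of one respiratory phase as the plain sum of its samples.

    With a fixed sampling rate the sum is proportional to the time integral;
    samples below zero (e.g. expiratory airflow) contribute negatively.
    """
    signal = np.asarray(signal, dtype=float)
    return float(signal[np.asarray(phase_mask, dtype=bool)].sum())


def auc_change(baseline_auc, response_auc):
    """Change in AUC for a time bin: response AUC minus baseline AUC."""
    return response_auc - baseline_auc


# Toy airflow trace: positive during inspiration, negative during expiration
airflow = np.array([0.5, 1.0, 0.5, -0.5, -1.0, -0.5])
inspiration = airflow > 0  # hypothetical phase mask
print(phase_auc(airflow, inspiration))   # 2.0
print(phase_auc(airflow, ~inspiration))  # -2.0
```

      Multiplying the sum by the sampling interval would convert it to physical units, but this scaling does not change comparisons made at a common sampling rate.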

      (9) The authors should explain how oxygen consumption was calculated: did it involve the Depocas & Hart (1957) formula? Please provide information on expiratory CO2, whether ventilation was adjusted to achieve consistent CO2 levels across animals, and ideally specify the end-tidal CO2 range for the experiments. Discuss the rationale behind the chosen CO2 levels and whether CO2-dependent pFL activity could have influenced results.

      We have clarified in the measurement in the methods as follows:

      “The gas analyzer measured fractional concentration of O2. Based on this and the flow rate at the level of the trachea (minute ventilation), we calculated O2 consumption according to Depocas and Hart (1957).”

      We have also added to the methods section:

      “During the entire experimental procedure, rats breathed spontaneously and end-tidal CO2 was not adjusted.”

      Regarding whether CO2-dependent pFL activity could have influenced the results: inducing active expiration when there is no physiological demand for it (i.e., no hypoxia or hypercapnia) likely reduces pCO2, thereby decreasing the overall drive for ABD activity. Our results are therefore likely an underestimation of the response that would have been produced had CO2 levels been held constant.

      (10) The authors should address the discrepancy in fos-activated neurons between the control (44 neurons) and experimental animals (90-120 neurons per hemisection). Please explain the activation in the control group. Please also provide insights into how the authors interpret this difference in cfos-activated neurons between control and experimental groups.

      The following paragraph has been added to the discussion:

      “The assessment of cellular activity, quantified through cFos staining, revealed basal activity in control rats. This baseline activity likely emanates from subthreshold physiological processes within the parafacial area that do not culminate in ABD activity. Analysis of the cFos staining confirmed focal activation of neurons in the pFL of rats injected with bicuculline and minimal cFos expression in PHOX2B+ cells in all groups compared to the control group. These results confirm the very limited mediolateral spread of the drug from the core site of injection and are consistent with previous findings supporting the hypothesis that the majority of PHOX2B+ cells are located more ventrally in the parafacial area (pFV, Huckstepp et al., 2015) and that PHOX2B+ cell recruitment is not necessary for active expiration (de Britto & Moraes, 2017; Magalhães et al., 2021).”

      (11) In Figure 8, the authors plotted the relationship of each cycle correlated to the normalized area. Have you also calculated the same late-E, inspiratory, and post-I to fR or VT separately?

      No, we performed the phase-separated (late-E, I, post-I) analysis only for the DIA, airflow and ABD areas and for the Euclidean and Mahalanobis distances.

      Minor comments:

      Is there any specific reason for conducting these experiments exclusively in males?

      No, we usually use male rats for this type of experiment. We use both male and female rats in other studies that concern the effects of sex hormones, but in this case we performed experiments only in male rats.

      Page 13, Line 320: What is the duration of the bicuculline-induced effects?

      This information is included in the results section as follows:

      “Similarly, the ABD response duration was longer at the two most rostral locations (+0.6 mm = 17.6 ± 2.7 min; +0.8 mm = 17.1 ± 3.3 min) compared to the most caudal group (-0.2 mm = 2.4 ± 1.1 min; One-Way ANOVA p = 0.043; Tukey -0.2 mm vs +0.6 mm: p = 0.048; -0.2 mm vs +0.8 mm: p = 0.041; Figure 3E).”

      Page 16, Line 400: Is there a rationale for the high tidal volume (VT) observed in these animals? A baseline VT of 7 ml/kg appears notably elevated.

      Please note that rats were vagotomised and spontaneously breathing, hence the tidal volume is increased compared to non-vagotomised rats as seen in previous studies (Ouahchi et al., 2011).

      Figure 2D: Could you provide longer recordings? Additionally, incorporating diaphragm (Dia) recordings would enhance the interpretation of abdominal (Abd) recordings.

      Figure 3A shows a representative example of the 20-minute recordings for each location.

      Page 18, Line 458: Please rectify "Dunn: p , 0.001" to the appropriate format, perhaps "Dunn: p < 0.001."

      Thank you, edited.

    1. Author response:

      eLife assessment

      “…The evidence however is incomplete, since the tai loss-of-clone phenotype is based on one allele and the mechanism involved in cell competition through Dlp and Wg lacks adequate supporting data.”

      We agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      We also agree that additional functional data would help demonstrate that differences in Dlp levels are required for the mechanism of Tai cell competition. Experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      Reviewer #1:

      Overall Statements:

      “There is some data in the supplementary materials suggesting that Tai promotes dlp mRNA expression, but this was not compelling.”

      We are currently testing the effects of Tai on dlp and dally transcription using qPCR and reporter transgenes. As noted below, the effects of Tai on Dlp trafficking are ‘strong’, so resolving effects on dlp transcription will complement these localization data.

      “The authors don't further examine Dlp protein in tai clones.”

      As noted by the Reviewer, we do examine Dlp levels and localization in tai-low clones (see Figure 9), but these experiments are challenging due to their very small size and the hypomorphic nature of the tai allele (tai[k15101]) that was used. Experiments are in progress to examine the effect of our Crispr null allele of tai on Dlp levels and localization in wing clones.

      “In sum, the authors have uncovered some interesting results, but the story has some unresolved issues that, if addressed, could boost its impact. Additionally, the preprint seems to have 2 stories, one about tai and cell competition and the other about tai and Wg distribution. It would be helpful to reorder the figures and improve the narrative so that these are better integrated with each other.”

      We agree. The results of our modifier screen required that we first understand how Tai regulates the Wg pathway before we could apply this to understanding the competitive mechanism. Thus, the paper is composed of three sections: 1. the screen, 2. the Tai-Dlp-Wg connection in the absence of competition, and 3. the contribution of Dlp-Wg to the tai[low] ‘loser’ phenotype. These sections use different techniques (e.g., clonal mosaics with genomic alleles, Gal4/UAS and RNAi to define the effect of Tai loss on Wg and Dlp). Ongoing experiments return to clonal mosaics to test whether elevating Dlp can rescue tai lof clones in the same manner as Apc/Apc2 alleles (see Figs. 2-3), which elevate Wg pathway activity.

      Specifics:

      “It would be good to know whether the authors can rescue tai-low clones by over-expressing UAS-Dlp.”

      As noted above, experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      “The data on Wg distribution seems disjointed from the data about cell competition. The authors could refocus the paper to emphasize the cell competition story. The role of Dlp in Wg distribution is well established, so the authors could remove or condense these results. The story really could be Figs 1, 2, 3 and 7 and keep the paper focused on cell competition. The authors could then discuss Dlp as needed for Wg signaling transduction, which is already established in the literature.”

      We appreciate the suggestion to reorganize the figures to focus the first part of the story on competition, and then follow with the role of Tai in controlling Dlp. We will consider this approach pending the results of ongoing experiments.  

      “The model of tai controlling dlp mRNA and Dlp protein distribution is confusing. In fact, the data for the former is weak, while the data for the latter is strong. I suggest that the authors focus on the altered Dlp protein distribution on tai-low clones. It would also be helpful to prove the Wg signaling is impeded in tai clones (see #5 below).”

      We agree, but we are currently testing how dlp reporters and mRNA respond to Tai in order to rigorously test a Dlp transcriptional mechanism. To complement the ‘strong’ evidence that Tai regulates Dlp distribution, we are testing Dlp in clones of our Tai Crispr null. Since submission, we have also assessed the effect of blocking the endocytic factor shibire/dynamin on Dlp distribution in Tai-deficient cells to complement the data on Pentagone that are already in the paper (see Fig. S3).

      “I don't know if the Fz3-RFP reporter for Wg signaling works in imaginal discs, but if it does then the authors could make clones in this background to prove that cell-autonomous Wg signaling is reduced in tai-low clones.”

      We thank the reviewer for this suggestion, which we are now testing.

      Reviewer #2

      Overall Comments:

      “While the authors present good evidence in support of most of their conclusions, there are alternative explanations in many cases that have not been excluded.”

      We appreciate this point and are conducting experiments for a revised submission that will help test alternative mechanisms and clarify our conclusions.

      Specifics:

      “However, the experiments have been done with a single allele, and these experiments do not exclude the possibility that there is another mutation on the same chromosome arm that is responsible for the observed phenotype. Since the authors have a UAS-tai stock, they could strengthen their results using a MARCM experiment where they could test whether the expression of UAS-tai rescues the elimination of tai mutant clones. Alternatively, they could use a second (independent) allele to demonstrate that the phenotype can be attributed to a reduction in tai activity.”

      As noted above, we agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      The tai[k15101] allele acts as a tai hypomorph and has been shown to produce weaker phenotypes than the 61G1 strong lof in a number of papers (Bai et al, 2000; König et al, 2011, Luo et al, 2019, and Zhang et al, 2015). We agree that rescue of tai[k15101] with a UAS-Tai transgene would help rule out effects of second-site mutations. We are currently pursuing the reviewer’s second suggestion of phenocopy with a different allele, our new tai Crispr lof.

      “The authors have screened a total of 21 chromosomes for modification and have not really explained which alleles are nulls and which are hypomorphs. The nature of each of the alleles screened needs to be explained better.”

      We will update the text to better reflect what type of alleles were chosen. In most cases we preferred amorphs or null alleles over hypomorphs, however when the amorph option was not available, we used hypomorphs.

      “Also, the absence of a dominant modification does not necessarily exclude a function of that gene or pathway in the process. This is especially relevant for the Spz/Toll pathway which the authors have previously implicated in the ability of tai-overexpressing cells to kill wild-type cells.”

      We thank the reviewer for this completely accurate point. The dominant screen does not rule out effects of other pathways such as Spz/Toll. Indeed, we were surprised by the lack of dominant effects of Spz/Toll alleles on tai[low] competition given our prior work. By contrast, the clear dominant effect of Apc/Apc2 led us to consider that Wg signaling plays a role in this phenomenon, which then became the starting point of this study.

      “The most important discovery from this screen is the modification by the Apc alleles. This part of the paper would be strengthened by testing for modification by other components of the Wingless pathway. The authors show modification by Apc[MI01007] and the double mutant Apc[Q8] Apc2[N175A]. Without showing the Apc[Q8] and Apc2[N175A] alleles separately, it is hard to know if the effect of the double mutant is due to Apc, Apc2, or the combination.”

      We agree that testing for modification with other components of the Wg pathway would be helpful to strengthen the connection between Tai-low clonal elimination and Wg pathway biology. We also agree that separating Apc [Q8] and Apc2 [N175A] would be a good idea to check whether both Apc proteins are equally important for rescuing Tai-low cell death; future experiments in the lab could investigate this distinction.

      “RNAi of tai seems to block the formation of the Wg gradient. If so, one might expect a reduction in wing size. Indeed, this could explain why the wings of tai/Df flies are smaller. The authors mention briefly that the posterior compartment size is reduced when tai-RNAi is expressed in that compartment. However, this observation merits more emphasis since it could explain why tai/Df flies are smaller (Are their wings smaller?).”

      We agree that this is an exciting possibility. Growth effects of Tai linked to interactions with Yorkie and EcR could be due to a distinct role in promoting Wg activity. Alternatively, Tai may cooperate with Yorkie or EcR to control the Wg pathway. We are pursuing these possibilities in future work.

      With regard to the “small size” effect of reducing Tai, we have previously shown that RNAi of Tai using engrailed-Gal4 causes the posterior compartment to shrink (Zhang et al. 2015, Figure 1C-F, H). In that paper, we also showed that tai[k15101]/Df animals are proportionally smaller than wildtype animals and quantified this by measuring 2D wing size (Zhang et al. 2015, Figure 1A and 1B).

      “In Figure 7, the authors show the effect of manipulating Tai levels alone or in combination with increasing Dlp levels. However, they do not include images of Wg protein distribution upon increasing Dlp levels alone.”

      We thank the reviewer for this reminder and have already generated these control images for inclusion in a revised submission.

      “In Figure 8, there is more Wg protein both at the DV boundary and spreading when tai is overexpressed in the source cells using bbg-Gal4. However, in an earlier experiment (Figure 5C) they show that the wg-lacZ reporter is downregulated at the DV boundary when tai is overexpressed using en-Gal4. They therefore conclude that wg is not transcriptionally upregulated but is, instead, secreted at higher levels when tai is expressed in the source cells. Wg protein is reduced in the DV stripe when tai is overexpressed using the en-Gal4 driver (Figure 6B') and is increased at the same location when tai is overexpressed with the bbg-Gal4 driver (Figure 8). I don't know how to reconcile these observations.”

      We thank the reviewer for pressing us to develop an overall model explaining our results and how we envision Tai regulating Dlp and Wg. We are preparing a graphic abstract that illustrates this model and will be included in our revision.

      Briefly, we favor a model in which Tai controls the rate of Wg spread via Dlp, without a significant effect on wg transcription. For example, the induction of Dlp across the ‘engrailed’ domain of en>Tai discs (Fig 7B-B”) allows Wg to spread rapidly across the flanks and moderately depletes it from the DV margin (Fig 6B-B”), as noted by the reviewer. Adding a UAS-Dlp transgene in the en>Tai background dramatically accelerates Wg spread, causing Wg to be depleted from the DV margin and to build up at the far end of the gradient adjacent to the dorsal and ventral hinge. Significantly, blocking endocytosis of Wg in en>Tai discs with a dominant-negative shibire transgene also causes Wg to build up in the same location (new data to be added in a revision), consistent with enhanced spreading. The difference in the bbg-Gal4 experiment is that Tai is only overexpressed in DV margin cells, which constrains and concentrates Wg within this restricted domain; we are in the process of testing whether this effect on Wg is blocked by RNAi of Dlp in bbg>Tai discs.

      “In Figure 9, the tai-low clones have elevated levels of Dlp. How can this be reconciled with the tai-RNAi knockdown shown in Figure 7C' where reducing tai levels causes a strong reduction in Dlp levels?”

      We apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al, 2015 paper), whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and is thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C), while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking, while its loss reduces overall Dlp levels; they can be explained by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in trafficking and one in Dlp expression. As noted, we are investigating both possibilities using dlp reporter lines and our new tai null Crispr allele.

      Reviewer #3:

      Overall Weaknesses:

      “The study has relatively weak evidence for the mechanism of cell competition mediated by Dlp and Wg.”

      The screen and middle section of the paper provide genetic evidence that elevating Wg pathway activity rescues tai[low] loser cells and that Tai controls levels/localization of Dlp and distribution of Wg in the developing wing disc. Our current work is focused on linking these two findings in Tai “loser” clones.

      “More evidence is required to support the claim that dlp transcription or endocytosis is affected in tai clones.”

      As noted above, we are testing whether normalizing Dlp levels across clonal boundaries rescues tai[low] loser clones and assessing effects of Tai on dlp transcription and Dlp trafficking.

      Specifics:

      “Most of the rest of the study is not in the clonal context, and mainly relies on RNAi KD of tai in the posterior compartment, which is a relatively large group of cells. I understand why the authors chose a different approach to investigate the role of tai in cell competition. However because ubiquitous loss of tai results in smaller organs, it is important to determine to what extent reducing levels of tai in the entire posterior compartment compares with clonal elimination i.e. cell competition. This is important in order to determine to what extent the paradigm of Tai-mediated regulation of Dlp levels and by extension, Wg availability, can be extended as a general mechanism underlying competitive elimination of tai-low clones. If the authors want to make a case for mechanisms involved in the competitive elimination of tai clones, then they need to show that the KD of tai in the posterior compartment shows hallmarks of cell competition. Is there cell death along the A/P boundary? Or is the compartment smaller because those cells are growing slower?”

      Based on data that cell competition does not occur over compartment boundaries (e.g., see review by L.A. Johnston, Science, 2009), we chose not to use UAS-Gal4 to assess competition, but rather to investigate underlying biology occurring between Tai, Wg, and Dlp.

      “Are the levels of Myc/DIAP1, proteins required for fitness, affected in en>tai RNAi cells?”

      This is, of course, an interesting question given that Myc is a well-studied competition factor and is proposed to be downstream of the Tai-interacting protein Yki. We are not currently focused on Myc, but plan to test its role in the Tai-Dlp-Wg pathway in future work.

      “The authors do not have direct/strong evidence of changes in dlp mRNA levels or intracellular trafficking. To back these claims, the authors should look for dlp mRNA levels and provide more evidence for Dlp endocytosis like an antibody uptake assay or at the very least, a higher resolution image analysis showing a change in the number of intracellular Dlp positive punctae. Also, do the authors think that loss of tai increases Dlp endocytosis, making it less available on the cell surface for maintaining adequate extracellular Wg levels?”

      As noted above, we have added experiments using a dominant-negative shibire/dynamin allele to test whether Tai controls Dlp endocytosis. These data will be added to a revised manuscript. We have also gathered reagents to test the effects of Tai gain/loss on Dlp secretion.

      “The data shown in the last figure is at odds with the model (I think) the authors are trying to establish: When cells have lower Tai levels, this reduces Dlp levels (S2) presumably either by reducing dlp transcription and/or increasing (?) Dlp endocytosis. This in turn reduces Wg (availability) in cells away from source cells (Figure 6). The reduced Wg availability makes them less fit, targeting them for competitive elimination. But in tai clones, I do not see any change in cell-surface Dlp (9B) (I would have expected them to be down based on the proposed model). The authors also see more total Dlp (9A) (which is at odds with S2 assuming data in S2 were done under permeabilizing conditions.).”

      As noted above (under Rev #2 comments), we apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al, 2015 paper), whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and is thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C), while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking, while its loss reduces overall Dlp levels; they can be explained by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in trafficking and one in Dlp expression. We are investigating both possibilities using dlp reporter lines and our new tai null Crispr allele.

      “As a side note, because Dlp is GPI-anchored, the authors should consider the possibility that the 'total' Dlp staining observed in 9A may not actually be total Dlp (and is possibly mostly intracellular Dlp), since permeabilizing membranes with detergent will cause some (most?) Dlp molecules to be lost, and how this might affect the interpretation of the data. I think one way to address this would be to process the permeabilized and non-permeabilized samples simultaneously, then image them at the same settings and compare what membrane staining looks like in these two conditions. If membrane staining in the permeabilized condition is decreased compared to non-permeabilized conditions, and the signal intensity of Dlp in permeabilized conditions remains high, then the authors will have evidence to support increased endocytosis in tai clones. Of course, these data will still need to be reconciled with what is shown in S2.”

      We thank the reviewer for this excellent suggestion and are generating mosaic discs to test the proposed approach of synchronous analysis of total vs. intracellular Dlp.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors performed a metatranscriptomic analysis of publicly available whole-blood datasets from 3 places in Indonesia. Their goal was to explore which pathogens were present in the blood of those 117 healthy individuals. It was interesting that reads from Flaviviridae and Plasmodium were detected in asymptomatic subjects.

      Major comments: 1) How did the authors assess and correct batch-effects between different datasets?

      Our response: We have sequencing batch information for the Indonesian dataset and saw no clear clustering by batch in the first 8 PCs. We recognize that sampling variation may exist between islands; however, the taxa matrix we obtained from the unmapped reads is so sparse that such variation did not have a strong enough effect to introduce batch effects in our microbiome analyses, and the signals were driven by pathogenic reads. For our comparative analyses between datasets, we ensured that all three datasets shared similar processing (samples were collected using Tempus Blood RNA Tubes and underwent globin depletion), and we trimmed both the Indonesian and Malian reads to match the length of the UK reads (75 bp).

      2) Did the RNA-seq capture poly-A mRNAs? If so, these reads that did not map to the human genome were captured because of internal priming. Can they find internal poly-A sequences in the genomes of Flaviviridae and Plasmodium pathogens? I would like to know that to understand the source of the reads and which other pathogens may be missing (due to the lack of internal priming).

      Our response: No, our dataset did not capture poly-A mRNAs. We performed ribosomal RNA (rRNA) and globin mRNA depletion.

      3) Principal coordinates analysis (PCoA) is often utilized in metagenomics analysis. Although they are equivalent, is there a reason for using PCA?

      Our response: Because we used a CLR transformation, the resulting matrix lies in Euclidean space, and PCA is equivalent to PCoA performed on Euclidean distances.
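      To illustrate the equivalence invoked in this response, the sketch below applies a CLR transform to a toy count matrix and shows that PCA scores match classical PCoA (metric multidimensional scaling on Euclidean distances) up to the sign of each axis. The pseudocount of 0.5 for zero counts, the random data and the function names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples x taxa count matrix.

    A pseudocount (assumed value) is added because log(0) is undefined.
    """
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def pca_scores(x, k):
    """PCA sample scores from the SVD of the column-centred matrix."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :k] * s[:k]


def pcoa_scores(x, k):
    """Classical PCoA on Euclidean distances between samples."""
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=2)              # squared Euclidean distances
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ d2 @ j                     # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # keep the k largest
    return vecs[:, order] * np.sqrt(vals[order])


rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 12)).astype(float)  # toy samples x taxa
z = clr(counts)
pca = pca_scores(z, 2)
pcoa = pcoa_scores(z, 2)
# The two ordinations agree up to an arbitrary sign flip of each axis
print(np.allclose(np.abs(pca), np.abs(pcoa)))
```

      The agreement follows because the double-centred Gram matrix of Euclidean distances equals the centred data times its transpose, whose eigenvectors scaled by the square roots of the eigenvalues are exactly the PCA scores.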

      Minor comments: 1) "Indonesia is a country with large numbers of endemic and emerging infectious diseases [16], making it a crucially important location to monitor and understand the effects of pathogens on human hosts." Is there any epidemiological data that shows differences in infectious diseases across these 3 places? Can the authors provide a map and better explanation about the importance in comparing these 3 areas?

      Our response: In the discussion section, we have added references showing that malaria infection is more prevalent in the eastern part of Indonesia.

      2) Why is it so hard to try to identify (only for Flaviviridae reads) reads that map to very relevant viruses, such as Zika, Dengue, and Yellow Fever? Why did the authors state that they "were unable to refine this assignment further" if this is one of the most interesting findings?

      Our response: Our reanalysis assigned a small percentage of the Flaviviridae reads to the Pegivirus genus. As more diverse microbial genomes are added to reference databases and identical regions become more common between them, it becomes harder for the classifier to resolve reads to species level (https://link.springer.com/article/10.1186/s13059-018-1554-6). Flaviviridae comprises distinct species spread across six different genera (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=11050). In comparison, although more species are recorded for Plasmodiidae than for Flaviviridae, the overwhelming majority belong to the Plasmodium genus, so we were able to refine those assignments down to species level (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1639119).
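      The difficulty of resolving Flaviviridae reads below family level can be illustrated with the lowest-common-ancestor (LCA) rule used by k-mer classifiers such as Kraken; the classifier is not named in this reply, so treating it as LCA-based is an assumption. A read whose sequence matches reference genomes in two different genera can only be placed at their shared family. The toy taxonomy and function names below are hypothetical.

```python
# Hypothetical toy taxonomy, stored as child -> parent links
PARENT = {
    "species_A": "genus_1",
    "species_B": "genus_2",
    "species_C": "genus_1",
    "genus_1": "family_F",
    "genus_2": "family_F",
    "family_F": "root",
}


def lineage(taxon):
    """Path from a taxon up to the root, most specific first."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa):
    """Most specific taxon shared by the lineages of all hit taxa."""
    paths = [lineage(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walk up the first lineage; the first shared node is the deepest one
    return next(node for node in paths[0] if node in shared)


# Hits within one genus resolve to that genus; hits across genera stop at family
print(lowest_common_ancestor(["species_A", "species_C"]))  # genus_1
print(lowest_common_ancestor(["species_A", "species_B"]))  # family_F
```

      In this toy picture, a family whose species are spread across several genera (as for Flaviviridae) produces more cross-genus matches than one dominated by a single genus (as for Plasmodiidae).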

      3) Is the script available at https://gitlab.unimelb.edu.au/igr-lab/Epi_Study ? This reviewer could not access it.

      Our response: We thank Reviewer 1 for pointing this out and have amended the link, which is now accessible here: https://gitlab.svi.edu.au/muhamad.fachrul/indo_blood_microbiome

      Reviewer #1 (Significance (Required)):

      Interesting paper that makes it possible to extract additional knowledge from whole blood RNA-seq data. There are already several papers that do this, and I think the authors could go one step further (for instance, PCR validation in additional individuals). I don't think this can be used for surveillance if it cannot identify species: it is more expensive than running targeted assays, and there may be many false-negative pathogens in the samples.

      Our response: We thank Reviewer 1 for their comments. We have updated our manuscript to reflect our revised analyses, which minimize false-positive taxa, and to frame the project's significance as a retrospective tool rather than a mainline surveillance tool.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Bobowik and colleagues perform a computational analysis of whole blood RNA-seq datasets from healthy individuals from three different regions of Indonesia. Their goal is to identify infecting pathogens and other microbes and correlate their abundances to host gene expression patterns or health characteristics in these populations. They find a broad range of bacterial, viral and microeukaryote taxa. When comparing the three Indonesian populations, they find that the Korowai population is the most diverse and the most different from the other two, possibly driven by the higher prevalence and abundance of Plasmodium (Apicomplexa) in this population.

      Then, the authors conduct a statistical decomposition of human gene expression in these samples in independent factors using ICA, and correlate each of these factors to the abundances of the microbial taxa detected. This analysis allows researchers to associate specific patterns of gene expression, such as immune-related pathways, to the presence of members of the Apicomplexa and Kitrinoviricota phyla.

      Lastly, the authors use previously published data from two other cohorts (from Mali and the UK) to contextualize their blood microbiome findings. They find microbial reads in all datasets. The Mali cohort is characterized by a large abundance of archaea, not found in the other two populations, while the UK cohort has the lowest diversity. Altogether, the authors propose the use of RNA-seq data from human whole blood as a way to study the blood microbiome and establish potential associations between blood-resident microbes and host gene expression.

      Major comments:

      1) The methodology to filter and remove reads from potential contaminants needs to be more stringent to ensure the results do not contain spurious contaminants and that the conclusions are correct. It has been described that genomic databases are heavily contaminated with human sequences (Steinegger and Salzberg, 2020), and in this manuscript, even after a two-pass alignment with STAR, reads mapping to helminths also corresponded to the human genome. Additionally, ad-hoc removal of specific taxa (Metazoa and Viridiplantae) was only performed after contamination was suspected. However, this ad-hoc removal cannot be performed with microbial (bacterial, viral, etc.) contaminants, as there is a risk of removing actual bacteria from the samples. But it has been confirmed that many microbial assemblies also suffer from human contamination. Possible actions to take are the following:

      a. Perform the human mapping with more lenient parameters to prevent human reads from mapping to other (likely contaminated) genomes in genome databases.

      b. Remove common contaminants that have been documented, for instance in blood (Chrisman et al., 2022).

      c. Run a tool to detect contaminated contigs in the database used to map reads to microbes and remove these problematic contigs from further analysis.

      Our response: We thank Reviewer 2 for the suggestions, especially those addressing contaminants. We have reanalyzed our data, which resulted in far fewer taxa while still retaining the main pathogen findings.

      2) In line with the above, removing singletons (as I have understood these are taxa that are represented by a single read), is a way to minimize the risk of contamination. To take advantage of the functional profiling of RNA-seq, a measure to ensure that microbes found in blood are active would be to include in the analysis only taxa for which expression of more than a few genes is detected. This type of filtering has been previously applied in studies where very low microbial loads are expected (Lloréns-Rico et al., 2021). In this study, it has only been applied to the specific case of the archaeal taxon Methanocaldococcaceae. However, I would expect cleaner results if applied consistently to all taxa detected.

      Our response: We have reanalyzed the data and applied this filter to all detected taxa.
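      The reviewer's suggested filter (keep only taxa with reads on more than a few genes) can be sketched as follows. This is an illustration, not the study's pipeline. The input layout (taxon mapped to per-gene read counts), the function name `filter_active_taxa`, and both thresholds are hypothetical placeholders.

```python
def filter_active_taxa(gene_hits, min_genes=3, min_reads_per_gene=1):
    """Keep taxa with reads on at least `min_genes` distinct genes.

    gene_hits maps taxon -> {gene_id: read_count}. Returns the surviving
    taxa with their total read counts. Thresholds are illustrative only.
    """
    kept = {}
    for taxon, genes in gene_hits.items():
        expressed = [g for g, n in genes.items() if n >= min_reads_per_gene]
        if len(expressed) >= min_genes:
            kept[taxon] = sum(genes.values())
    return kept

# Hypothetical per-gene read counts for three taxa.
example = {
    "taxon_A": {"g1": 12, "g2": 4, "g3": 7, "g4": 1},  # reads on four genes
    "taxon_B": {"g1": 1},                              # a singleton
    "taxon_C": {"g1": 30, "g2": 2},                    # many reads, two genes
}
```

      With the default thresholds, only `taxon_A` survives. Note that `taxon_C` is dropped despite its high read count, because its reads are concentrated on too few genes to indicate an actively transcribing organism.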

      3) The specificity of Methanocaldococcaceae to the samples from Mali is very striking. I strongly suspect that this occurs only because of a batch effect, even though the authors were highly selective in their cohorts to avoid these. In fact, I extracted the genes spanning the regions highlighted in Supplementary Figure 9 of the Methanocaldococcus jannaschii genome. A BLAST search of these sequences returned, among Methanocaldococcus hits, hits from the ERCC synthetic spike-in sequences, which are used as internal controls in many RNA-seq experiments. ERCC synthetic spike-in hits appeared for all 4 regions in the genome of M. jannaschii highlighted in this figure. The original publications of this dataset make no reference to the use of these ERCC controls, but given the observed matches, I suggest that the authors add an extra step to their filtering pipeline to remove all reads mapping to these ERCC standards in all three cohorts, to prevent this sort of batch effect.

      Our response: We thank Reviewer 2 for pointing this out. Our reanalysis, which now uses proper two-pass mapping and downstream classification with both reads of each pair, no longer detects any archaea.

      4) I am puzzled by the inconsistencies shown between forward and reverse reads when mapping paired-end data. I expect these inconsistencies at lower taxonomic ranks (species or genus level) due to incomplete genomes, but not at higher taxonomic ranks. I wonder if, by performing more stringent filtering of contaminants as suggested above, the consistency between forward and reverse reads increases and both mates can be used, making the mapping more reliable.

      Our response: We have reanalyzed the data using both reads of each pair for classification, which resulted in fewer detected taxa. We believe the new results are more robust because they no longer include taxa that are not typically found in humans (such as the archaeon Methanocaldococcus and other environmental bacteria).

      In summary, my main concerns regarding this manuscript involve the possibility that contaminants in the sequencing data may be the cause of some of the results presented, and I tried to propose ways of dealing with these contaminants. While some of the results may not be affected by detection of contaminants (i.e. the association between Apicomplexa and some ICs), others such as the diversity measures or the comparison across cohorts may be severely affected. I will consider these results highly preliminary until a more thorough and stringent approach for contaminant removal is applied.

      Our response: We thank Reviewer 2 for the suggestions and have updated our manuscript with results from revised analyses that are more stringent towards contaminants, as reflected in our updated findings.

      Minor comments:

      1) I would appreciate some of the analyses done at lower taxonomic levels if the sparsity of the data allows it, after removing contaminants. Given that the CLR transformation does not allow for zeros, other alternatives such as GMPR (Chen et al., 2018) or adding a pseudocount would allow these analyses?

      Our response: After our reanalysis, we ended up with even sparser data and therefore could not perform the analyses at lower taxonomic levels.
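      For reference, the pseudocount alternative raised by the reviewer amounts to shifting every count before taking logs, so that zeros no longer make the CLR undefined. In the sketch below, the pseudocount of 0.5 and the function name are assumptions, not choices made in the study. GMPR is a different normalisation and is not shown here.

```python
import math

def clr_with_pseudocount(counts, pseudocount=0.5):
    """CLR transform after adding a pseudocount so zero counts are defined."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

sample = [0, 3, 0, 12]  # hypothetical lower-rank counts containing zeros
transformed = clr_with_pseudocount(sample)
```

      When most entries are zero, as the authors report for their sparser reanalysed data, every zero is mapped to the same value, and the pseudocount then largely determines the result. This is one reason to be cautious about such analyses at low taxonomic ranks.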

      2) In the PCA shown in figure 1, does the number of microbial reads detected correlate with any of the first two components?

      Our response: Yes. Plasmodiidae correlates well with PCs 1 and 2 (0.66 and 0.73), and Flaviviridae correlates very strongly with PC1 (0.917). We have added this detail to the Results section.

      3) In Figure 1C, the x axis is wrongly named PC2.

      Our response: We thank Reviewer 2 for pointing this out and have amended this detail.

      4) There is a typo in the legend of Figure 1A ("showeing")

      Our response: We thank Reviewer 2 for pointing this out and have amended this detail.

      5) In the alpha diversity estimates comparison across the three different cohorts, after subsampling each population to achieve similar sample size in each cohort, it is stated that "after subsampling, each population had similar diversity estimates". However, the numbers shown afterwards corresponding to the mean values of alpha diversity, without confidence intervals or a boxplot/violin plot together with an accompanying statistical test, are not enough to assess similarity. I would appreciate a figure (similar to Figure 3E and F) or a test accompanying these mean values.

      Our response: We thank Reviewer 2 for pointing this out and have amended this detail.

      6) In the volcano plots (Figure 3A, B and others throughout the manuscript) it would help the reader to add lines for the thresholds chosen for the effect size and -log10(p-value) to separate significant results.

      Our response: We thank Reviewer 2 for pointing this out and have amended this detail.

      7) In Figure 3E and F, I would appreciate having bars for the statistically significant comparisons.

      Our response: We thank Reviewer 2 for pointing this out and have amended this detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we have prepared a revised manuscript, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”.

      We acknowledge that of course it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”.

      The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript.  Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we have amended our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”.

      We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such added a cautionary note to our paper.  We also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we have promoted this validation, which was in the supplementary figures, into the main text in the revised version).   We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”.

      We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD for path generation, and find this improvement again for PepT2 in this study. We address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”.

      In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised.  We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We have now made our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.

      As for the additional suggested changes in presentation, we provide the requested details on the CpHMD analysis. Furthermore, we use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D reprojections of the 2D PMFs in figures 3, 4 and 5. (Note that we have opted not to do so in figures S10 and S15, which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions, respectively, in single reference plots, to avoid overcrowding those necessarily busy figures.) We have also changed the colour schemes of these plots in our revision to improve accessibility. In addition to addressing the reviewers' requests, we have taken the opportunity to fix some typos and clarify some other statements throughout the manuscript.

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extracellular (H87 and D342) and intracellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not the substrate–protonation link, as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, this would have been particularly interesting and potentially informative with regard to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion of the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)).  However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.” 

      Since even sampling the proton-transfer step itself in an essentially static protein conformation would push the boundaries of what has been achieved in the field, we believe that, given the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a realistic or practical time frame. We already note this limitation when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”. 

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we have expanded on our discussion of the reasoning behind employing a non-reactive approach and the limitations that imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we now make this clear in the appropriate figure captions.
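      To make the caption clarification concrete: superimposing PMFs means subtracting a per-curve constant chosen by convention. The sketch below aligns each curve so that its minimum inside an assumed OF-basin window on the CV is zero. The function name, basin bounds and free energy values are all hypothetical. After alignment, the profile shapes (for example, barrier heights above the aligned basin) can be compared, but the vertical offset between curves carries no information about relative stability.

```python
def align_pmfs(pmfs, basin):
    """Shift each PMF so that its minimum inside `basin` (lo, hi) is zero.

    pmfs maps condition -> list of (cv_value, free_energy) pairs. The shift
    is an arbitrary per-curve constant, not a relative free energy.
    """
    lo, hi = basin
    aligned = {}
    for label, curve in pmfs.items():
        ref = min(g for x, g in curve if lo <= x <= hi)
        aligned[label] = [(x, g - ref) for x, g in curve]
    return aligned

# Toy profiles for two protonation conditions (values are made up).
toy = {
    "condition_A": [(0.0, 5.0), (0.5, 9.0), (1.0, 7.0)],
    "condition_B": [(0.0, 2.0), (0.5, 3.5), (1.0, -1.0)],
}
aligned = align_pmfs(toy, basin=(0.9, 1.1))
```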

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the previous version indicate explicitly that this may involve the substrate. We make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We now make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).” 

      We do not consider it our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way.

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity is different at the E56 deprotonated vs protonated protein. This was figure S20 before, though in the revised version we have moved this piece of validation into a new panel of figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
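      The thermodynamic cycle behind this orthogonal check can be written out explicitly. The notation here is introduced for illustration: $\Delta G_{\mathrm{b}}^{\mathrm{H}}$ and $\Delta G_{\mathrm{b}}^{-}$ are the substrate binding free energies to the E56-protonated and E56-deprotonated protein, and $\Delta G_{\mathrm{deprot}}$ is the E56 deprotonation free energy in the apo or holo state. The two routes from protonated apo protein plus free substrate to the deprotonated complex must have equal free energy:

$$
\Delta G_{\mathrm{b}}^{\mathrm{H}} + \Delta G_{\mathrm{deprot}}^{\mathrm{holo}} = \Delta G_{\mathrm{deprot}}^{\mathrm{apo}} + \Delta G_{\mathrm{b}}^{-}
$$

      With $\Delta G_{\mathrm{deprot}} = RT\ln 10\,(\mathrm{p}K_a - \mathrm{pH})$ at fixed pH, this gives

$$
\Delta \mathrm{p}K_a = \mathrm{p}K_a^{\mathrm{holo}} - \mathrm{p}K_a^{\mathrm{apo}} = \frac{\Delta G_{\mathrm{b}}^{-} - \Delta G_{\mathrm{b}}^{\mathrm{H}}}{RT\ln 10}
$$

      The difference between the two ABFE values therefore maps directly onto the substrate-induced pKa shift. For example, at 310 K (an assumed temperature), $RT\ln 10 \approx 1.42$ kcal/mol, so a difference of about 1.42 kcal/mol corresponds to one pKa unit. If the substrate binds more weakly to the deprotonated protein ($\Delta G_{\mathrm{b}}^{-} > \Delta G_{\mathrm{b}}^{\mathrm{H}}$), binding raises the pKa.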

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we observed few protonation-state transitions, and the system often became trapped in minima in which protonation state and local conformation are coupled (interconverting between them requires rearrangements of the salt-bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state; Author response image 2 shows how noisy the resulting pKa estimates are, which necessitated the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1.
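      For readers unfamiliar with how a pKa is extracted from CpHMD runs, the usual route is to fit the Hill form of the Henderson–Hasselbalch equation to the deprotonated fraction $S$ observed at several pH values. The sketch below uses a linearised least-squares fit, which is a simplification chosen here. It is not the analysis used for the figures above, and the function name and synthetic data are hypothetical. Windows in which no transitions occur ($S$ = 0 or 1), as in the all-atom runs described above, carry no information in this form and are skipped.

```python
import math

def fit_pka(ph_values, frac_deprot):
    """Estimate (pKa, Hill coefficient n) from per-pH deprotonated fractions.

    Linearised Hill equation: log10((1 - S) / S) = n * (pKa - pH).
    """
    xs, ys = [], []
    for ph, s in zip(ph_values, frac_deprot):
        if 0.0 < s < 1.0:  # S = 0 or 1: no transitions sampled, skip
            xs.append(ph)
            ys.append(math.log10((1.0 - s) / s))
    if len(xs) < 2:
        raise ValueError("need at least two pH windows with 0 < S < 1")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    n = -slope                  # slope = -n
    return intercept / n, n     # intercept = n * pKa

# Synthetic, noise-free fractions for pKa = 4.5 and n = 1.
phs = [3.5, 4.0, 4.5, 5.0, 5.5]
fractions = [1.0 / (1.0 + 10 ** (4.5 - p)) for p in phs]
```

      With real, poorly converged data, the per-window fractions scatter strongly, and the fitted pKa inherits that noise.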

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate forms ion pairs with hsPepT2 through its N- and C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we now acknowledge explicitly. This potential limitation on the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt-bridge network in an apo-like state. The consequence would be that while the direction of the shift in the E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable to obtain better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protonatable protein residue, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes tens to hundreds of nanoseconds in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We discuss such considerations in the revised paper.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect of protonating various charged amino acids within the protein on the transport cycle. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport and testing them in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and the likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths: 

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses: 

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this, denote it with question marks in the mechanistic overview we give in Figure 8 and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? Is it truly the occluded state as shown on the diagram, or does it tend toward inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state as lying at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD: in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and have added details to the latter sentence to better clarify the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data, and if so on what basis this selection was made. Secondly, they dismiss the pKa of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is affected, when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with the mean from triplicates ± standard deviation), where we found that only the effect on E56 protonation was distinct, as lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10 ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). From these we recover the effect of substrate binding on the E56 protonation state in both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 in the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we added more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.
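
      The difference between the replicate-based and the pooled chunk-based analyses can be made concrete with a minimal Python sketch. It assumes, as an illustration only, that each pH window yields a per-frame 0/1 deprotonation indicator and that the pKa of each time chunk is obtained from a least-squares fit of the linearised Hill equation log10(S/(1 − S)) = n(pH − pKa); the function names, the chunk length and the synthetic traces are hypothetical and do not reproduce the authors' actual CpHMD pipeline, which may use a nonlinear fit.

```python
import math


def hill_pka(phs, deprot_fracs):
    """Estimate (pKa, n) from the deprotonated fraction S at several pH values.

    Fits the linearised Hill equation log10(S / (1 - S)) = n * (pH - pKa)
    by ordinary least squares. Windows with S == 0 or S == 1 are skipped
    because the logarithm is undefined there.
    """
    pts = [(ph, math.log10(s / (1.0 - s)))
           for ph, s in zip(phs, deprot_fracs) if 0.0 < s < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two pH windows with 0 < S < 1")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                    # slope = Hill coefficient
    intercept = y_mean - n * x_mean  # intercept = -n * pKa
    return -intercept / n, n


def chunk_pkas(traces, phs, chunk_len):
    """Split per-window 0/1 deprotonation traces into equal time chunks and
    return one pKa estimate per chunk (chunks that cannot be fitted are skipped)."""
    n_chunks = min(len(t) for t in traces) // chunk_len
    estimates = []
    for c in range(n_chunks):
        fracs = [sum(t[c * chunk_len:(c + 1) * chunk_len]) / chunk_len for t in traces]
        try:
            estimates.append(hill_pka(phs, fracs)[0])
        except (ValueError, ZeroDivisionError):
            continue
    return estimates


# Synthetic example: three pH windows, two chunks of 10 frames each (1 = deprotonated).
phs = [4.0, 5.0, 6.0]
pattern = {4.0: [1] * 2 + [0] * 8, 5.0: [1] * 5 + [0] * 5, 6.0: [1] * 8 + [0] * 2}
traces = [pattern[ph] * 2 for ph in phs]
estimates = chunk_pkas(traces, phs, chunk_len=10)
```

      In this sketch, pooling the per-chunk `estimates` from all replicates into a histogram corresponds in spirit to the chunk-based analysis, while fitting each replicate's full trajectory once corresponds to the replicate-based bar plots; the spread of chunk estimates is naturally wider because each fit sees less sampling.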

      Reviewer #3 (Public Review):

      Summary: 

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters and provides a good model with enough insight and structural information to be studied computationally using the advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family. 

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down. 

      (3) E622 is not only essential for peptide recognition; its protonation also facilitates substrate release and contributes to intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity, as expected from the simulations.

      Strengths: 

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses: 

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge. 

      The reviewer is right to point out that the statement and Figure S3 as they were do not adequately support our decision to exclude the K64-D317 salt bridge from our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, did indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we include the appropriate time series of the salt-bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We have also regenerated the data for this plot, as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are, however, almost identical, and our conclusions remain.
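
      The reviewer's point, that a pooled occupancy cannot distinguish a salt bridge that forms late and then persists from one that breaks early, can be illustrated with a minimal Python sketch on synthetic traces. The 0.4 nm distance cutoff, the number of time blocks and the traces themselves are illustrative assumptions, not values or data from this study.

```python
CUTOFF_NM = 0.4  # assumed salt-bridge distance cutoff (hypothetical, not from the manuscript)


def occupancy(distances, cutoff=CUTOFF_NM):
    """Fraction of frames in which the salt bridge counts as formed (distance below cutoff)."""
    if not distances:
        raise ValueError("empty distance trace")
    return sum(d < cutoff for d in distances) / len(distances)


def block_occupancy(distances, n_blocks=4, cutoff=CUTOFF_NM):
    """Occupancy in consecutive, equal-length time blocks (trailing frames are dropped)."""
    size = len(distances) // n_blocks
    return [occupancy(distances[i * size:(i + 1) * size], cutoff) for i in range(n_blocks)]


# Synthetic distance traces in nm: bridge formed early and then lost in replicate A,
# the reverse in replicate B.
rep_a = [0.3] * 50 + [0.8] * 50
rep_b = [0.8] * 50 + [0.3] * 50

pooled = occupancy(rep_a + rep_b)  # identical for both scenarios
```

      Both synthetic replicates give a pooled occupancy of 0.5, whereas the block-wise view separates them ([1.0, 1.0, 0.0, 0.0] for A versus [0.0, 0.0, 1.0, 1.0] for B), which is the information the added time-series plots provide.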

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when the D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 μs. When the D342 residue is not mutated, we do not see the opening in any of the 3 simulations within 1 μs. It is quite likely that if rather than 3 we have 10 simulations, or rather than 1 μs we have 10 μs simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations, which we go on to test and explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening takes place in unbiased MD, whereas when D342 is mutated to alanine in addition to H87 protonation, we see a spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do, however, stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We have revised the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis to be investigated in more detail in the following sections. In doing so, we also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on page 5 with regard to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even made in the original MEMENTO paper. What is stated there is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses linear interpolation as an initial step but modifies the intermediates significantly later, so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. Hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will necessarily be hysteresis-free once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work. 

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling. 

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D umbrella sampling (Figure 3). Here, we think, the reviewer’s point has merit. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regard to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and the combination of several MEMENTO paths in 2D PMFs come in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Figure S1: it would be useful to label the panels.

      We have now done this.

      At the bottom of page 4, it is written that "the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." But it is hard to interpret that from the figure.  

      See also our response to reviewer #3. We have revised the wording of this statement, and also highlight in Figure S5 the crucial runs we are referring to, in order to make them easier to discern.

      At the bottom of page 5, and top of page 6, there is a lot of "other" information shown, which is inserted for the record - this is a bit glossed over and hard to follow.

      The “other” information refers to further conditions for which we had calculated PMFs and that gave some insight, but which were secondary for drawing our key conclusions. We thank the reviewer for their feedback that this section needs clarification. We have revised this paragraph to make it easier to follow and to better highlight the conclusions we draw from the data.

      In Figure 7 it looks as though the asterisks have shifted.

      We are indebted to the reviewer for spotting this error, the asterisks are indeed shifted one bar to the right of their intended position. The revised version fixes this issue.

      Reviewer #3 (Recommendations For The Authors):

      Minor points: In Figure 1a, The 7PMY label and arrow are slightly misplaced.

      Figure 1a is a schematic diagram to show the available structures of PepT2 homologues (see also the response to reviewer #2 above). The 7PMY label placement is intentional to indicate a partially occluded inwards-facing state. As we write in the figure caption: “Intermediate positions between states indicate partial gate opening”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TMC7 knockout mice were generated by the authors and the phenotype was analyzed. They found that Tmc7 is localized to Golgi and is needed for acrosome biogenesis.

      Strengths:

      The phenotype of infertility is clear, and the results of TMC7 localization and the failed acrosome formation are highly reliable. In this respect, they made a significant discovery regarding spermatogenesis.

      Weaknesses:

      There are also some concerns, which are mainly related to the molecular function of TMC7 and Figure 5.

      (1) It is understandable that TMC7 exhibits some channel activity in the Golgi and somehow affects luminal pH or Ca2+, leading to the failure of acrosome formation. On the other hand, since they are conducting the pH and calcium imaging from the cytoplasm, I do not think that the effect of TMC7 channel function in Golgi is detectable with their methods.

      We agree with the reviewer that there is no direct evidence showing the effect of TMC7 channel function in the Golgi. We have changed the description in the revised manuscript.

      (2) Rather, it is more likely that they are detecting apoptotic cells that have no longer normal ion homeostasis.

      We thank the reviewer for raising this concern. We apologize for not labeling the postnatal stage in the original Figure 5. We measured intracellular Ca2+, pH and ROS in PD30 testes (revised Fig. S6a-c); no apoptotic cells were observed at this stage (revised Fig. S6e, f). Apoptotic cells were found in the seminiferous tubules and cauda epididymis of 9-week-old Tmc7–/– mice (revised Fig. 5e-f). We have included TUNEL data from testes of PD21, PD30 and 9-week-old mice (revised Fig. 5e, f and Fig. S6e, f). In accordance with our findings, Tmc1 mutation has also been shown to result in reduced Ca2+ permeability, thus triggering hair cell apoptosis (Fettiplace, R, PNAS. 2022) [1].

      (3) Another concern is that n is only 3 for these imaging experiments.

      As suggested by the reviewer, more replicates were included in the imaging experiments.

      Reviewer #2 (Public Review):

      Summary:

      This study presents a significant finding that enhances our understanding of spermatogenesis. TMC7 belongs to a family of transmembrane channel-like proteins (TMC1-8), primarily known for their role in the ear. Mutations to TMC1/2 are linked to deafness in humans and mice and were originally characterized as auditory mechanosensitive ion channels. However, the function of the other TMC family members remains poorly characterized. In this study, the authors begin to elucidate the function of TMC7 in acrosome biogenesis during spermatogenesis. Through analysis of transcriptomics datasets, they identify TMC7 as a transmembrane channel-like protein with elevated transcript levels in round spermatids in both mouse and human testis. They then generate Tmc7-/- mice and find that male mice exhibit smaller testes and complete infertility. Examination of different developmental stages reveals spermatogenesis defects, including reduced sperm count, elongated spermatids, and large vacuoles. Additionally, abnormal acrosome morphology is observed beginning at the early-stage Golgi phase, indicating TMC7's involvement in proacrosomal vesicle trafficking and fusion. They observed localization of TMC7 in the cis-Golgi and suggest that its presence is required for maintaining Golgi integrity, with Tmc7-/- leading to reduced intracellular Ca2+, elevated pH, and increased ROS levels, likely resulting in spermatid apoptosis. Overall, the work delineates a new function of TMC7 in spermatogenesis and the authors suggest that its ion channel activity is likely important for Golgi homeostasis. This work is of significant interest to the community and is of high quality.

      Strengths:

      The biggest strength of the paper is the phenotypic characterization of the TMC7-/- mouse model, which has clear acrosome biogenesis/spermatogenesis defects. This is the main claim of the paper and it is supported by the data that are presented.

      Weaknesses:

      The claim is that TMC7 functions as an ion channel. It is reasonable to assume this given what has been previously published on the more well-characterized TMCs (TMC1/2), but the data supporting this is preliminary here, and more needs to be done to solidify this hypothesis. The authors are careful in their interpretation and present this merely as a hypothesis supporting this idea.

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack strong evidence to support that TMC7 functions as an ion channel. We had planned to perform cellular electrophysiology in GC-1 cells heterologously expressing TMC7. However, TMC7 was trapped in the endoplasmic reticulum, like TMC1 and TMC2 (Yu X, PNAS. 2020)[2], and failed to localize to the Golgi. Following the reviewer’s suggestion, we have interpreted the molecular function of TMC7 more carefully and in more detail in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Wang et al. have demonstrated that TMC7, a testis-enriched multipass transmembrane protein, is essential for male reproduction in mice. Tmc7 KO male mice are sterile due to reduced sperm count and abnormal sperm morphology. TMC7 co-localizes with GM130, a cis-Golgi marker, in round spermatids. The absence of TMC7 results in reduced levels of Golgi proteins, elevated abundance of ER stress markers, as well as changes of Ca2+ and pH levels in the KO testis. However, further confirmation is required because the analyses were performed with whole testis samples in spite of the differences in the germ cell composition in WT and KO testis. In addition, the causal relationships between the reported anomalies await thorough interrogation.

      Strengths:

      The microscopic images are of great quality, all figures are properly arranged, and the entire manuscript is very easy to follow.

      Weaknesses:

      (1) Tmc7 KO male mice show multiple anomalies in sperm production and morphogenesis, such as reduced sperm count, abnormal sperm head, and deformed midpiece. Thus, it is confusing that the authors focused solely on impaired acrosome biogenesis.

      We are grateful for your comments and suggestions. We agree and have added these spermiogenesis defects of Tmc7–/– mice to the abstract and discussion sections of the revised manuscript.

      (2) Further investigations are warranted to determine whether the abnormalities reported in this manuscript (e.g., changes in protein, Ca2+, and pH levels) are directly associated with the molecular function of TMC7 or are the byproducts of partially arrested spermiogenesis. Please find additional comments in "Recommendations for the authors".

      Thank you for raising this concern. Per your comments, we have included data on intracellular Ca2+, pH and ROS in PD21 testes. Intracellular homeostasis was impaired as early as PD21, indicating that TMC7 depletion impairs cellular homeostasis, which in turn results in arrested spermiogenesis.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As noted by all three reviewers, current flow cytometry data does not necessarily support the 'ion channel' hypothesis, thus the phenotypic analysis is compelling but the molecular mechanism of how TMC7 facilitates acrosome biogenesis remains incomplete. It is highly recommended for the authors to at least discuss or test alternative hypotheses (as reviewer #2 suggested) such as the possibility of acting as 'lipid scramblase'. Also, the authors need to provide further explanation for other morphological defects if TMC7 is truly a functional ion channel in Golgi (and thus later at acrosome), which is also related to the key question of whether TMC7 is a functional ion channel.

      We thank the reviewing editor for the comments and suggestions. We agree that our study lacks strong evidence to support that TMC7 functions as an ion channel. We have discussed the possibility of TMC7 acting as a 'lipid scramblase' as suggested. We have also included data on intracellular Ca2+, pH and ROS in PD21 and PD30 testes.

      Indeed, Tmc7–/– mice exhibit other defects, including abnormal head morphology and disorganized mitochondrial sheaths. TMC7 is localized to the cis-Golgi apparatus and is required for maintaining Golgi integrity. Mice lacking other Golgi-localized proteins, including GOPC (Yao R, PNAS. 2002)[2], HRB (Kang-Decker N. Science. 2001)[3] and PICK1 (Xiao N, JCI. 2009)[4], exhibit spermiogenesis defects similar to those of Tmc7–/– mice. It is therefore possible that the morphological defects in Tmc7–/– mice are due to impaired Golgi function.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide more details about the imaging experiments using FACS. Since they only describe catalog numbers (Beyotime, S1056, S1006, S0033S) for imaging reagents, it is not immediately clear what reagents they actually used. Since they used Fluo3, BCECF, and DCFH, it would be better to mention their names.

      Thanks. We have provided more detailed information on the fluorescent probes as suggested.

      (2) I am also concerned that in the FACS there is no information at all about laser wavelength and filter properties. This is especially important for BCECF because the wavelength spectrum changes with pH. Also, if there are any positive controls for these imaging reagents, such as ionophores, it would be more convincing to include them.

      Thank you for your comment. The excitation wavelength was 488 nm for detecting Ca2+, pH and ROS by FACS. BCECF is a widely used probe for monitoring cellular pH, and the reagent from Beyotime (S1006) has been used in other studies (Chen S, Blood. 2016)[5], (Liu H, Cell Death Dis. 2022)[6]. To make the results more reliable, we have repeated these experiments in PD21 testes (revised Figure 5a-c). No positive controls for these reagents were used in our experiments.

      (3) As noted above, it is better to avoid directly linking the cell's abnormal ion homeostasis to TMC7 ion channel function in the text. The discussion should be changed to emphasize that the TMC7-deficient cells are apoptotic and that these physiological phenomena are occurring as a side effect of this apoptosis.

      Thank you for raising this concern. We agree with the reviewer that there is no direct evidence showing the effect of TMC7 channel function in the Golgi, and we have changed the description in the revised manuscript.

      We performed new experiments to measure apoptosis and intracellular Ca2+, pH and ROS in PD21 testes. No apoptotic cells were observed at this stage. However, impaired cellular homeostasis was still found in testes of PD21 Tmc7-/- mice. These data suggest that TMC7 depletion impairs cellular homeostasis and hence induces spermatid apoptosis.

      (4) While I understand that it appears to be difficult to experimentally verify the ion channel function of TMC7, it may be supportive to compare its amino acid sequence and/or 3D predicted structure with that of TMC1/2. Including a supplemental figure for this purpose would emphasize the possibility that TMC7 functions as an ion channel.

      We thank the reviewer for making this great suggestion. We compared the amino acid sequences and predicted structures of TMC1 and TMC2 with TMC7. TMC1 had 81% sequence similarity with TMC7, and the RMSD (root mean square deviation) was 3.079. TMC2 had 82% sequence similarity with TMC7, and the RMSD was 2.176. These data suggest that TMC7 has an amino acid sequence and predicted structure similar to those of TMC1/2 and might function as an ion channel. We have included the predicted structures in revised Fig. S7.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      I do not have any experimental comments or concerns to address, but I do ask that the authors consider an alternative hypothesis. Based on prior data demonstrating that TMC1 is a mechanosensitive ion channel, the authors reasonably assume that TMC7 may also function as an ion channel. Although the authors observe alterations in cytosolic Ca2+ and pH upon loss of TMC7 by flow cytometry, which begins to support this hypothesis, these data do not directly demonstrate ion channel activity.

      I was wondering if the authors had considered whether TMC7 could also function as a lipid scramblase. TMC1 has also been proposed to function as a Ca2+-inhibited scramblase, where knockout of TMC1 leads to a loss of phosphatidylserine (PS) exposure and membrane blebbing at the apical region of hair cells (Ballesteros, A. and Swartz, K., Science Advances, 2022). Furthermore, TMC proteins are structurally related to the Anoctamin/TMEM16 family of chloride channels and lipid scramblases, where TMEM16A-B are bona fide Ca2+-activated chloride channels, and TMEM16C-H are characterized as Ca2+-dependent scramblases. Based on their structural similarity and the observation that TMC1 may also exhibit lipid scrambling properties based on the PS exposure, I wonder if the authors may have data that support a TMC7 scramblase hypothesis. I was intrigued by this idea, especially given the authors' observations of large vacuoles in the seminiferous tubules and cauda epididymis and the vesicle accumulation phenotype in their TEM data. Incorporating this hypothesis into the discussion section, at minimum, could provide a valuable perspective, and this line of thought may lead to interesting data interpretation throughout the paper.

      We thank the reviewer for the valuable suggestion. We now discuss the possibility that TMC7 acts as a lipid scramblase, as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) Gene symbols should be italicized, and protein symbols should be capitalized.

      Thank you. We have revised the manuscript as recommended.

      (2) Tmc7 KO males show reduced sperm count, which alters the germ cell composition in the testis (Figure 2g). Thus, it is inappropriate to compare protein levels using whole testis lysates (Figure 3e, 4h, 5d, 5f). Instead, the same immunoblotting analyses could be done with purified round spermatids or 3-wk-old testis. Likewise, the significance of the intracellular Ca2+ and pH measurements is potentially diminished by the differences in the germ cell composition in WT and KO mice.

      We appreciate this constructive suggestion. We agree with the reviewer that using whole-testis lysates may obscure differences between WT and Tmc7–/– mice. However, we are unable to purify round spermatids because specific markers are lacking.

      (3) Figures 2i, 2j: How sperm motility was measured should be specified in the Methods.

      Thank you for this reminder. We have added a description of the sperm motility assessment to the Methods section.

      (4) Figure 4g: It does not make sense to compare the fluorescence intensity of these proteins without making sure that the seminiferous tubules are in the same stage. As shown in Figures S5a and S5b, TMC7 exhibits varied abundance in spermatids at different steps.

      We thank the reviewer for the insightful comment. We have replaced the images with images of seminiferous tubules at the same stage and compared the fluorescence intensity in the new images, as suggested.

      (5) Figure 4h: How were the band intensities measured? The third band from the left is visually stronger than the first one, but it does not seem to be so according to the column graph. The reviewer measured the intensity of GRASP65 bands relative to alpha-tubulin by ImageJ and obtained relative intensities of 0.35, 0.87, 0.6, and 0.08 for the bands from left to right. Additional replicates of the western blots should be included in the supplementary figures.

      Thank you for this insightful comment. Band density and size were quantified with ImageJ. On re-examining the leftmost GRASP65 band, it appears that the protein was not fully transferred onto the PVDF membrane. We have performed new experiments and replaced the original bands (revised Fig. 4h). Additional replicates of the western blots have been included in revised Fig. S8.

      (6) Figures 5a, 5b: Based on the observation of abnormal intracellular Ca2+ and pH levels in the KO germ cells, the authors concluded that TMC7 maintains the homeostasis of Golgi pH and ion (Lines 223-224, 263-264). However, intracellular Ca2+ and pH levels do not directly reflect those in the Golgi apparatus.

      We thank the reviewer for this important comment. We agree and have changed “Golgi” to “intracellular” as suggested.

      (7) Figure 5c: ROS is produced during apoptosis. Thus, it is not appropriate to conclude that the increased ROS levels in Tmc7 KO germ cells lead to apoptosis.

      Following the reviewer’s comment, we measured ROS and apoptosis in the testes of PD21 and PD30 mice. ROS levels were increased, but no apoptotic cells were observed in the testes of PD21 and PD30 Tmc7–/– mice. Apoptotic cells were observed in the testes of 9-week-old Tmc7–/– mice (revised Fig. 5e-f). These data suggest that TMC7 depletion results in the accumulation of ROS, which in turn leads to apoptosis.

      (1) Fettiplace, R., D.N. Furness, and M. Beurg, The conductance and organization of the TMC1-containing mechanotransducer channel complex in auditory hair cells. Proc Natl Acad Sci U S A, 2022. 119(41): p. e2210849119.

      (2) Yu, X., et al., Deafness mutation D572N of TMC1 destabilizes TMC1 expression by disrupting LHFPL5 binding. Proc Natl Acad Sci U S A, 2020. 117(47): p. 29894-29903.

      (3) Kang-Decker, N., et al., Lack of acrosome formation in Hrb-deficient mice. Science, 2001. 294(5546): p. 1531-3.

      (4) Xiao, N., et al., PICK1 deficiency causes male infertility in mice by disrupting acrosome formation. J Clin Invest, 2009. 119(4): p. 802-12.

      (5) Chen, S., et al., Sympathetic stimulation facilitates thrombopoiesis by promoting megakaryocyte adhesion, migration, and proplatelet formation. Blood, 2016. 127(8): p. 1024-35.

      (6) Liu, H., et al., PRMT5 critically mediates TMAO-induced inflammatory response in vascular smooth muscle cells. Cell Death Dis, 2022. 13(4): p. 299.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study makes an interesting finding: a polyunsaturated fatty acid, Lin-Glycine, increases the conductance of KCNQ1/KCNE1 channels by stabilizing a state of the selectivity filter that allows K+ conduction. The stabilization of a conducting state appears well supported by single-channel analysis, though some method details are missing. The linkage to PUFA action through the selectivity filter is supported by the disruption of PUFA effects by mutation of residues which change conformation in two KCNQ1 structures from the literature. Claims about differences in Lin-Glycine binding to these two structural conformations seem to lack clear support, thus the claim seems speculative that PUFAs increase Gmax by binding to a crevice in the pore domain. A potentially definitive functional experiment is conducted by single-channel recordings with selectivity filter domain mutation Y315F which ablates the Lin-Glycine effect on Gmax. However, this appears to be an n=1 experiment. Overall, the major claim of the abstract is supported: "... that the selectivity filter in KCNQ1 is normally unstable ... and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state." However, the claim in the abstract that selectivity filter instability "explains the low open probability" seems too general.

      We thank the reviewer for the comments, and we would like to address the main concern regarding the single-channel recordings. We now state the number of experiments used for the single-channel analysis. We agree that the claim in the abstract seems too general, and we have now made it more specific to our findings.

      Reviewer #2 (Public Review):

      Golluscio et al. address one of the mechanisms of IKs (KCNQ1/KCNE1) channel upregulation by polyunsaturated fatty acids (PUFA). PUFA is known to upregulate KCNQ1 and KCNQ1/KCNE1 channels by two mechanisms: one shifts the voltage dependence to the negative direction, and the other increases the maximum conductance (Gmax). While the first mechanism is known to affect the voltage sensor equilibrium by charge effect, the second mechanism is less known. By applying the single-channel recordings and mutagenesis on the putative binding sites (most of them related to the selectivity filter), they concluded that the selectivity filter is stabilized to a conductive state by PUFA binding.

      Strengths:

      They mainly used single-channel recordings and directly assessed the behavior of the selectivity filter. The method is straightforward and convincing enough to support their claims.

      Weaknesses:

      The structural model they used is the KCNQ1 channel without KCNE1 because KCNQ1/KCNE1 channel complex is not available yet. As the binding site of PUFAs might overlap with KCNE1, it is not very clear how PUFA binds to the KCNQ1 channel in the presence of KCNE1.

      Using other previous PUFA-related KCNQ1 mutants will strengthen their conclusions. For example, the Gmax of the K326E mutant is reduced by PUFA binding. Examining whether K326E shows reduced numbers of non-empty sweeps in the single-channel recordings will be a good addition.

      We thank the reviewer for the public review and address the main weaknesses raised. Because a structure of the KCNQ1/KCNE1 complex is not yet available, we used KCNQ1 alone. We believe that the PUFA and KCNE1 binding sites do not overlap, as we previously presented data in agreement with the idea that KCNE1 rotates the VSD relative to the PD (Wu et al., 2021). This would leave enough space for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301) without competing with KCNE1. We appreciate the suggestion of adding single-channel recordings of the K326E mutant, and we agree it would be a valuable addition to strengthen our conclusions. However, single-channel recordings of KCNQ1 are very challenging and time-consuming to obtain, so we will consider this for future studies.

      Reviewer #3 (Public Review):

      This manuscript reveals an important mechanism of KCNQ1/IKs channel gating such that the open state of the pore is unstable and undergoes intermittent closed and open conformations. PUFA enhances the maximum open probability of IKs by binding to a crevice adjacent to the pore and stabilizing the open conformation. This mechanism is supported by convincing single-channel recordings that show empty and open channel traces and the ratio of such traces is affected by PUFA. In addition, mutations of the pore residues alter PUFA effects, convincingly supporting that PUFA alters the interactions among these pore residues.

      Strengths:

      The data are of high quality and the description is clear.

      Weaknesses:

      Some comments about the presentation.

      (1) The structural illustrations in this manuscript in general need to be more clarified.

      (2) The manuscript heavily relies on the comparison between the S4-down and S4-up structures (Figures 3, 4, and 7) to illustrate the difference between the extracellular side of the pore and to lead to the hypothesis of open-state stability being affected by PUFA. This may mislead the readers to think that the closed conformation of the channel in the up-state is the same as that in the down-state.

      We thank the reviewer for the public review, and we would like to address the comments about the presentation. We agree that the structural illustrations need to be more detailed, and we have revised them. We now include a new Figure 3 with a more detailed legend and a new Figure 4 that includes more information, such as the main chain of the whole selectivity filter and surrounding peptide.

      We have now clarified, for the KCNQ1 structures with S4-down and S4-up, that the closed conformation of the channel in the up-state differs from that in the down-state. We also emphasize this difference in the Discussion.

      Recommendations for the authors:

      Reviewer #1:

      (1) Explain more thoroughly how the single-channel recordings were done:

      - How was Lin-Glycine applied in these experiments? The patch configuration is unclear. Was Lin-Glycine added to the patch pipette? If not, why is Lin-Glycine expected to reach the proposed binding site in the outer leaflet? Were controls time-matched applications of vehicles with ethanol?

      Data were collected using the cell-attached patch configuration to minimize disruption to the patch and to avoid rundown due to the loss of PIP2. Lin-Glycine was solubilized in DMSO, and the desired concentration was added directly to the bath. We had no a priori way of knowing whether the PUFA would reach the proposed binding site, but the consistent increase in channel activity 5-10 minutes after addition to the bath convinced us that it was indeed reaching the binding site. This time frame fits with our prior experience with mefenamic acid effects on single channels (Wang et al., 2020). The mefenamic acid binding site is external to the membrane, so the drug must enter the cell and cross the patch membrane to affect channel activity. In addition, shown below is a previous recording from our lab in which nothing was added to the bath during 55 minutes of consecutive recordings. It shows the typical behavior of IKs, with activity clustering in a few active sweeps separated by many blank sweeps. The behavior in this patch contrasts with that seen in the presence of Lin-Glycine, where the clusters of activity spread over an increasing number of sweeps.

      In addition, we have previously shown that 0.1% DMSO (the concentration used in the present study) does not affect the GV of KCNQ1, although there is a non-significant decrease in tail current amplitudes of about 14% (Eldstrom et al., 2021). We therefore do not think that the increase in activity seen with Lin-Glycine can be explained by vehicle effects alone.

      Author response image 1.

       

      We have added more details to the Materials and Methods section.

      - How well the replicates match the representative data in Figures 1, S1, and 6 is unclear (except for average current and Po in the last second of the traces from Figure 1). Are the results in Fig 6 n=1? 

      We now show in a data supplement that 3 replicates were used to assess the change in channel activity upon addition of Lin-Glycine.

      - Diary plots (as in Werry et al. 2013) and additional descriptions of the timeline of Lin-Glycine application and analyses could add credibility to interpretations. 

      We have added a diary plot of the first latency to open in Supplementary Figure S1.

      - Amounts of plasmids and lipofectamine that were used in transfections are missing. 

      We have added this information to the Materials and Methods section as follows:

      “Single channel currents were recorded from transiently transfected mouse ltk- fibroblast cells (LM cells) using 1.5 µL Lipofectamine 2000 (Thermo Fisher Scientific). Cells were transfected with 1.5 µg of pcDNA3 containing a linked KCNE1-KCNQ1 construct (ref. 20), to ensure fully KCNE1-saturated complexes, in addition to a plasmid containing green fluorescent protein (GFP) to identify transfected cells.”

      - Inclusion/exclusion criteria for patches analyzed are missing. 

      We have added this information to the Materials and Methods section as follows:

      “Only patches that were largely free of endogenous currents and had few channels, such that there were several blank sweeps to average for use for leak subtraction, were analyzed.”

      - Whether blinding, randomization, or pre-determined n values were employed is not mentioned. 

      No blinding, randomization or pre-determined n values were employed.

      - Analysis methods are sometimes unclear: How was Po calculated? Representative sweeps appear to have been leak and capacitance subtracted. How was that done? 

      Po was estimated from the all-points amplitude histogram as follows: $P_o = \frac{\sum_i i N_i}{i_{\mathrm{estimate}} N_{\mathrm{total}}}$, where $N_i$ is the number of points at current $i$ in the histogram, $i_{\mathrm{estimate}}$ = 0.4 pA is the current estimated from the peak of the histogram, and $N_{\mathrm{total}}$ = 10,000 is the total number of points in the last second of the trace. p = 0.75 ± 0.12 (n = 8) and p = 0.87 ± 0.04 (n = 3) for Control and Lin-Glycine, respectively.

      Leak and capacitive currents were subtracted using averaged empty sweeps.
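      The all-points histogram estimate of Po described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name open_probability, the 0.01 pA bin width, and the synthetic trace are assumptions; only i_estimate = 0.4 pA and the 10,000-point window come from the response. Because the histogram-weighted sum approximately equals the mean current divided by the unitary current, a patch containing more than one channel would yield N·Po rather than Po.

```python
from collections import Counter
from typing import Sequence


def open_probability(trace_pA: Sequence[float],
                     i_estimate_pA: float = 0.4,
                     bin_width_pA: float = 0.01) -> float:
    """Estimate Po = sum_i(i * N_i) / (i_estimate * N_total).

    trace_pA: leak-subtracted current samples (e.g. the last second of a sweep).
    i_estimate_pA: current read from the histogram peak (0.4 pA in the text).
    bin_width_pA: histogram bin width (hypothetical choice for this sketch).
    """
    if not trace_pA:
        raise ValueError("trace is empty")
    # All-points amplitude histogram: bin index -> number of points N_i
    histogram = Counter(round(i / bin_width_pA) for i in trace_pA)
    n_total = sum(histogram.values())
    # Weight each bin's current level (bin index * width) by its point count
    weighted_sum = sum(k * bin_width_pA * n for k, n in histogram.items())
    return weighted_sum / (i_estimate_pA * n_total)


if __name__ == "__main__":
    # Synthetic 10,000-point trace: 0 pA for 75% of points, 0.4 pA for 25%
    trace = [0.0] * 7500 + [0.4] * 2500
    print(round(open_probability(trace), 3))  # prints 0.25
```

      Binning before summing mirrors the histogram-based description; summing the raw samples directly would give essentially the same value.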

      (2) The change of cells used for whole cell vs single channel (oocytes vs mouse ltk- fibroblast cells) could be discussed. These cells likely have different lipids in their membranes. Is there any other evidence that PUFAs have the same effects on KCNE1-KCNQ1 in these cells? Does the V0.5 shift? 

      Figures 1 and 2 show a similar effect on Gmax in both oocytes and mouse ltk- fibroblast cells. In Figure 2, the shift in latency suggests a shift in V0.5, consistent with PUFA binding to Site I.

      (3) The manuscript associates selectivity filter changes with S4 being up or down. It would help to clarify whether there was a change in [K+] in the two KCNQ1 structures used for modeling, as Mandala and MacKinnon (2023) state: "We note that one interesting difference between the two up structures regards the occupancy of K+ ions in the selectivity filter (SI Appendix, Fig. S5 C and D). In the polarized sample, due to the low extravesicular concentration of K+, density is only visible at the first and third positions in the selectivity filter, while density is present at all four positions in the unpolarized sample. Similar differences were observed in our previous study on Eag (20) and are qualitatively consistent with crystal structures of KcsA solved under symmetrical high and low K+ concentrations (45)." 

      Our study states that there are some differences between the two structures with S4 in the up-state and S4 in the down-state, including a reorganization of the pore. As for differences in K+ occupancy between the two structures, we are not sure, as our knowledge comes only from what is stated in Mandala and MacKinnon (2023). Mandala and MacKinnon did not discuss the selectivity filter in the down-state structure in their paper, and there are no K+ ions in any of their PDB files. Therefore, we do not know how many K+ ions occupy the filter in the down state.

      (4) The manuscript states " PUFAs increase Gmax by binding to a crevice in the pore domain" and "we elucidated that Lin-Glycine binds to a crevice between K326 and D301", this seems speculative without any actual binding studies or concrete structural evidence. A quantitative structural modeling analysis of whether changes in the crevice change the theoretical binding of Lin-Glycine might provide a stronger basis for speculation. 

      We toned down these statements in Results and Discussion to:

      “Crevice residues affect PUFA ability to increase Gmax”

      And

      Discussion: “We tested the hypothesis that the effect of Lin-Glycine involved conformational changes in the selectivity filter following PUFA binding to two residues K326 and D301 at the pore domain. Those residues delimit a small crevice that seems to change in size in different structures with S4 up or S4 down (Figure 3, D-F).”

      (5) The several figures detailing differences in selectivity filter conformation in the KCNQ1 structures are interesting and relevant in that they identify the movement of residues such as Y315 that, when mutated, ablate Lin-Glycine effect on Gmax. It would help to clarify whether T312 and I313 also move between the two selectivity filter conformations. 

      From the morph of the selectivity filter between the two conformations, it is noticeable that the changes and residue movements involve only residues in the upper part of the selectivity filter (including Y315 and D317). T312 and I313 are in the lower part of the selectivity filter and do not seem to move or rotate between the two conformations of the selectivity filter.

      We now include Supplementary Figures S3 and S4, which show the extent of movement of each residue in the pore region, and a short description of this in the Results section.

      (6) The claim in the abstract that selectivity filter instability "explains the low open probability" seems too general. Lin-Glycine seems to increase the likelihood of conduction by 2.5-fold, but it was not clear whether open probability ceases to be low or whether other mechanisms also keep Po low. 

      We have reworded this sentence to: “Our results suggest that the selectivity filter in KCNQ1 is normally unstable, contributing to the low open probability, and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state.”

      Reviewer #2:

      (1) While all the electrophysiological recordings used KCNQ1/KCNE1 channels, all the structural models they used are KCNQ1 channels (without KCNE1). I know it is because the KCNQ1/KCNE1 complex structure is unavailable. However, according to their previous results, KCNQ1 alone is also upregulated by PUFAs. I am curious about what the single-channel recordings of KCNQ1 alone look like in the presence and absence of PUFAs. 

      We would love to include single-channel recordings of KCNQ1, but they are extremely hard to measure due to the small size and flickering nature of the channel.

      (2) As mentioned above, we do not have the KCNQ1/KCNE1 structure yet have the KCNQ1/KCNE3 structures (Sun and MacKinnon, Cell, 2020). According to the PDBs (6V00 or 6V01), the clevis (K326 and D301) looks covered by KCNE3. Is it true that PUFAs do not upregulate KCNQ1/KCNE3? If true, KCNE1 may not cover the clevis, so the binding mode should differ from the KCNQ1/KCNE3 structures. Please discuss the possible blocking of the clevis by KCNE proteins. 

      We previously presented data consistent with KCNE1 rotating the VSD towards the PD (Wu et al., 2021). This mechanism would leave room for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301). We therefore think that this rotation prevents PUFA and KCNE1 from competing for the same space. As for KCNQ1/KCNE3, we currently do not have any evidence for or against upregulation by PUFA.

      (3) In the cryoEM structure with S4 resting (Figure 3F), the clevis looks too narrow for PUFA to bind. Is there any (either previous or current) evidence supporting that PUFA binding is state-dependent? 

      Because PUFAs first integrate into the bilayer and then diffuse towards their binding site on the channel, it would be hard to test the state dependence of binding. In addition, once PUFAs are in the bilayer, the rate of binding/unbinding is quite fast (in the ns range, according to our previous MD simulations), whereas the opening/closing rate is very slow (100 ms to s). The combination of slow wash-in/washout, fast binding/unbinding, and slow opening/closing would therefore make it very difficult to test the state dependence of binding using fast perfusion or different voltage protocols.

      (4) In the previous report (Liin et al. Cell Reports, 2018), K326 is the most critical site for PUFA binding. Why the K326 mutants are not included in the current study? I also would like to see the single-channel recordings of the K326E mutant, which showed a smaller Gmax. Does the PUFA application reduce the probability of non-empty traces in this mutant? 

      As Liin et al. reported, mutations of K326 reduce the ability of PUFA to increase Gmax. In this work, we wanted to gain further biophysical insight into the mechanism that leads to an increase in Gmax, building on previous work from our lab. We therefore focused here on residues downstream of K326 that we think are important for inducing the conformational changes in the selectivity filter. We agree that single-channel experiments on K326E would be very interesting, but these will have to await a future study.

      Minor points 

      (1) Liin et al. used S209F (Po of 0.4) and I204F (Po of 0.04) mutants. Their single-channel recordings would be a good addition. 

      We thank the reviewer for the suggestion. However, single-channel analyses of S209F and I204F have been reported previously (Eldstrom et al., 2010).

      (2) I would like to see how the Site I mutations (R2Q/Q3R) affect (or do not affect) the single-channel recordings (open probability and latency). 

      Thank you for the excellent suggestion. It would be interesting to assess the behavior of the channel with mutations at Site I. However, we think this information would not add further detail to this study, as we focus here on the mechanism of the Gmax increase. Single-channel recordings are extremely hard to obtain, so we chose to include only mutations at Site II in this study.

      (3) I would like the G-V curves for all the mutations at 0 and 20 uM of Lin-Glycine (Figure 3C and Figures 5A and B). 

      We have now added the G-V curves in Supplementary Figure S7.

      (4) I assume all the PUFAs have a similar effect on the selectivity filter, but a few other examples of PUFAs would be nice to see. 

      We anticipate that PUFAs and analogues with properties similar to Lin-Glycine would increase Gmax by a similar mechanism, because other PUFAs have previously been shown to increase Gmax (Bohannon et al., 2020).

      (5) Although the probabilities of non-empty sweeps are written in the manuscript, bar graph presentations would be a nice addition to Figures 2 and 6. 

      We have added bar graphs of non-empty sweeps for Figures 2 and 6.

      (6) Is there no statistical significance for D317E and T309S in Figure 5A? 

      No, the differences for D317E and T309S were not statistically significant.

      (7) There is no reference to Figure 7 in the manuscript. 

      A reference to Figure 7 has been added to the following paragraph of the manuscript:

      “Taken together, our results suggest that the binding of PUFA to Site II increases Gmax by promoting a series of interactions that stabilize the channel pore in the conductive state. For instance, we speculate that in the conductive state, hydrogen bonds between W304-D317 and W305-Y315, which are likely absent in the non-conductive conformation of KCNQ1, are created and that PUFA binding to Site II favors the transition towards the conductive state of the channel (Figure 7)”

      Reviewer #3:

      (1) Clarify the structural figures. Figures 3 D, E, and F - explain what the colors indicate. 

      A more detailed description of Figure 3 has been added to the legend.

      “D, E and F) Structure of crevice between S5 and S6 in KCNQ1 with S4 up (D and E) and S4 down (F). Residues that surround the crevice from S6 shown in blue (K326, T327, S330, V334) and from S5 in red (D301, A300, L303, F270). Remaining KCNQ1 residues shown in purple…, linoleic acid (LIN: gold color)”

      Fig 4. Only side chains of the residues are shown, making it hard to relate the figure to the familiar K channel selectivity filter. The main chain of the entire selectivity filter should be shown to orient readers to the familiar view of the K channel selectivity filter. In addition, the structures shown are only part of the selectivity filter; it should be specified which part of the selectivity filter is shown. These will also help the discussion at the bottom of page 10 and subsequent text.

      We now provide a new Figure 4 with more details such as the main chain of the whole selectivity filter and surrounding peptide.

      (2) Cautions should be stated clearly when the structural comparison between the S4-up and S4-down is made that the structure of the pore when it is closed with S4-up may differ from the structure of the pore with S4-down. 

      We now state in addition “Clearly, there will be other differences in the pore domain between structures with activated and resting VSDs, for example the state of the activation gate.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript constitutes an important contribution to antimalarial drug discovery, employing diverse systems biology methodologies; with a focus on an improved M1 metalloprotease inhibitor, the study provides convincing evidence of the utility of chemoproteomics in elucidating the preferential targeting of PfA-M1. Additionally, metabolomic analysis effectively documents specific alterations in the final steps of hemoglobin breakdown. These findings underscore the potential of the developed methodology, not only in understanding PfA-M1 targeting but also in its broader applicability to diverse malarial proteins or pathways. Revisions are needed to further enhance overall clarity and detail the scope of these implications.

      We thank the editor and reviewers for recognising the contribution our work makes to understanding the selective targeting of aminopeptidase inhibitors in malaria parasites and the wider impact this multi-omic strategy can have for anti-parasitic drug discovery efforts. The reviewers have provided constructive feedback and raised important points that we have taken on-board to improve our manuscript. In particular, we have revised aspects of the text and figures to enhance clarity, performed additional analysis on the other possible MIPS2673 interacting proteins and more comprehensively analysed the effect of MIPS2673 on parasite morphology. NB: Specific responses to comments in the public reviews are provided within responses to the specific recommendations to authors.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The article "Chemoproteomics validates selective targeting of Plasmodium M1 alanyl aminopeptidase as a cross-species strategy to treat malaria" presents a series of biochemical methods based on proteomics and metabolomics, as a means to:

      (1) validate the specific targeting of biologically active molecules (MIPS2673) towards a defined (unique) protein target within a parasite and (2) explore whether, by quantifying the perturbations generated at the level of the parasite metabolome, it is possible to infer which metabolic pathway has been disrupted by this biologically active molecule and whether this further confirms selective targeting in parasites of the expected (or in vitro targeted) enzyme (here PfA-M1).

      The inhibitor used in this work by the authors (MIPS2673) is to my knowledge a novel one, although it belongs to a chemical series previously explored by the authors, which recently enabled them to discover a specific PfA-M17 inhibitor, MIPS2571 (Edgard et al., 2022, ref 11 of this current work). Indeed, inhibitors specifically targeting either PfA-M1 or PfA-M17 (and not both, as was the case in the past) are scarce today, and are much needed to functionally characterize these two zinc aminopeptidases. MIPS2673 blocks the development of erythrocytic stages of Plasmodium falciparum with an EC50 of 324 nM, blocks parasite development at the young trophozoite stage at 5x EC50 (but at ring stages at 10x EC50, Figure 1E), and inhibits the enzymatic activity of PfA-M1 (and its ortholog Pv-M1) but not of the related malarial metallo-aminopeptidases (M17 and M18 families) nor of human metalloenzymes from closely related enzymatic families, supporting its selective targeting of PfA-M1 (and Pv-M1).

      All experiments are carried out in vitro (e.g. biochemical studies such as enzymology, proteomics, metabolomics) and on cultured parasites (erythrocyte stages of Plasmodium falciparum and several gametocytes stages obtained in vitro); there are no in vivo manipulations. The work related to Plasmodium vivax, which justifies the "cross-species" indication in the title of the article, is restricted to using a recombinant form of the M1-family aminopeptidase in enzymatic assays. The rest of the work concerns only Plasmodium falciparum. While I found globally that this work is original and brings new data and above all proposes chemical validation approaches that could be used for other target validations under similar limiting conditions (impossibility of KO of the gene), I have some specific questions to address to the authors.

      Strengths and weaknesses:

      - The chemoproteomic approach, that explores the ability of MIPS2673 to more significantly "protect" the putative target (PfA-M1) against thermal degradation or enzymatic attack (by proteinase K), to document its selective targeting towards PfA-M1 (the inhibitor, once associated with its target, is expected to stabilize its structure or prevent the action of end proteases), uses several concentrations of MIPS2673 and provides convincing results. My main criticism is that these tests are carried out with parasite extracts enriched in 30-38 hours old forms, and restricted to the fraction of soluble proteins isolated from these parasitic forms, which still limits the scope of the analysis. It is clear that this methodological approach is a choice that can be argued both biologically (PfA-M1 is well expressed in these stages of the parasite development) and biochemically (it is difficult to do proteomic analyses on insoluble proteins) but I regret that the authors do not discuss these limitations further, notably, I would have expected (from Figure 1E) some targets to be also present at ring stages.

      - The metabolomic approach, which documents the ability of MIPS2673 to selectively increase the abundance of non-hydrolyzed dipeptides in treated versus untreated parasites, is another argument in favor of the selective targeting of PfA-M1 by MIPS2673, particularly given the broad-spectrum aminopeptidase activity of PfA-M1, which preferentially acts on peptides resulting from the parasite's degradation of hemoglobin. The relative contribution of peptides derived from host hemoglobin versus other parasite proteins is, however, little discussed.

      The work as a whole remains highly interesting, both for the specific topic of PfA-M1's role in parasite biology and for the method, applicable to other malarial drug contexts.

      Reviewer #2 (Public Review):

      In this manuscript, the authors first developed a new small-molecule inhibitor that specifically targets the M1 metalloproteases of the two most important malaria parasite species, Plasmodium falciparum and P. vivax. This was done by chemical modification of a previously developed molecule that targets PfM1 as well as PfM17 and possibly other Plasmodial metalloproteases. After successful chemical synthesis, the authors showed that the derived inhibitor, named MIPS2673, has strong antiparasitic activity (EC50 of 324 nM) and is highly specific for M1. With this in mind, the authors carried out two large-scale proteomic analyses to confirm the MIPS2673 interaction with PfM1 in the context of the total P. falciparum protein lysate, first by thermal shift profiling and subsequently by limited proteolysis. While the former demonstrated an overall interaction, the latter (limited proteolysis) could map more specifically the site of MIPS2673-PfM1 interaction, presumably the active site. Subsequent metabolomic analysis showed that the cytotoxic inhibitory effect of MIPS2673 leads to the accumulation of short peptides, many of which originate from hemoglobin. Based on this, the authors argue that the MIPS2673 mode of action (MOA) involves inhibition of hemoglobin digestion, which in turn inhibits parasite growth and development.

      Reviewer #3 (Public Review):

      This is a manuscript that attempts to validate Plasmodium M1 alanyl aminopeptidase as a target for antimalarial drug development. The authors provide evidence that MIPS2673 inhibits recombinant enzymes from both Pf and Pv and is selective over other proteases. There is in vitro antimalarial activity. Chemoproteomic experiments demonstrate selective targeting of the PfA-M1 protease.

      This is a continuation of previous work by a subset of these authors focused on designing aminopeptidase inhibitors. Medicinal chemistry explorations resulted in the synthesis of MIPS2673, which has improved properties including potent inhibition of PfA-M1 and PvA-M1 with selectivity over a closely related peptidase. The compound also demonstrated selectivity over several human aminopeptidases and was not toxic to HEK293 cells at 40 µM. The activity against P. falciparum blood-stage parasites was about 300 nM.

      Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      In the thermal stability and LiP-MS analyses, other proteins were consistently identified in addition to PfA-M1, and yet no additional analysis was undertaken to explore these as potential targets.

      The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Finally, the potency of this compound on parasites grown in vitro is 300 nM - this would need improvements in potency and demonstration of in vivo efficacy in the SCID mouse model to consider this a candidate for a drug.

      Summary:

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of the Plasmodium M1 alanyl aminopeptidases, PfA-M1 and PvA-M1.

      Strengths:

      The main strengths include the synthesis of MIPS2673 which is selectively active against the enzymes and in whole-cell assay.

      Weaknesses:

      The weaknesses include the lack of additional analysis of additional targets identified in the chemoproteomic approaches.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Question 1. Line 737 (and elsewhere). Why are the Plasmodium vivax orthologs of PfA-M1 and PfA-M17 called Pv-M1 and Pv-M17 and not PvA-M1 and PvA-M17, where A stands for Aminopeptidase? I would recommend changing the names if possible, although the names Pv-M1 and Pv-M17 are now common in the literature (which is somewhat regrettable). See also Supplemental Table S1, where PfA-M1 is named Pf-M1.

      Supplemental Table S1 was updated to PfA-M1. Nomenclature for the Plasmodium vivax aminopeptidase orthologs was amended to PvA-M1 and PvA-M17 as suggested by the reviewer.

      Question 2. Figure 1. Observation of parasite culture slide smears in Figure 1E strongly suggests that an important target of MIPS2673 is expressed at the ring stage or in very young trophozoites, whereas the authors, in their proteomic and metabolomic analyses, performed studies focused on late trophozoite stages (30-38 h post-invasion). This difference in the targeted Plasmodium stages puzzles me and deserves some explanation from the authors; it is related to my question 3.

      As the reviewer indicates, ring-stage parasite growth appears to be affected at high concentrations (5x and 10x EC50) of MIPS2673. Under these conditions, parasite growth appears to stall during late rings/early trophs at ~16-22 h post invasion when haemoglobin digestion is increasing and when one presumes PfA-M1 (the primary target of MIPS2673) is increasing in both expression and activity (see references 26 and 28 of this manuscript). Thus, whilst it is unsurprising that MIPS2673 has some activity against ring-stage parasites, we focused on the trophozoite stage for our proteomics studies as we showed this to be the stage most susceptible to MIPS2673 (Fig. 1D) and reasoned that we would most likely identify the primary MIPS2673 target, and other interacting proteins, from a complex biological mixture at this stage. The same reasoning underpinned our decision to perform metabolomics on drug-treated trophozoites, as we reasoned we would see a greater functional effect on this stage. Furthermore, performing these experiments on trophozoites rather than rings minimises the interference from the host red blood cell. While we cannot rule out additional targets in rings, repeating all experiments during this parasite stage is beyond the scope of this study.

      Question 3. Figure 2. Although Figure 2 is insightful and somewhat self-explanatory, I think it misses two specific pieces of information. First, line 618 (M&M) indicates that the parasite material for the thermal stability and limited proteolysis studies corresponds to synchronized parasites (30-38 h post-invasion), but this information is not given in Figure 2. Second, if I understand the protocol for obtaining parasite extracts correctly, they correspond strictly to the soluble protein fraction of the erythrocytic stages of Plasmodium at the late trophozoite stage, and not to all parasite proteins, as the scheme in Figure 2 might suggest. I would appreciate it very much if these two points (parasite stage and soluble proteins) were clearly indicated in the scheme since, indeed, not the whole blood-stage proteome of the parasite is investigated in the study but just a part of it (~47%, as the authors indicate in line 406). Please also edit the figure legend accordingly.

      This is correct, the soluble protein fraction from synchronised trophozoites was used in our proteomics studies. These details have been included in an updated Figure 2 and in the corresponding figure legend.

      Question 4. Thermal stabilization. Figure 3B. Could the authors explain how they calculated or measured "absolute" protein abundances, and how these relate to the number of parasites in the initial assays, as this is not clear to me. Notably, the abundance of PfA-M1 is much higher than that of PF3D7_0604300, which are interesting "absolute" values.

      Protein abundance was calculated using the mean peptide quantity of the stripped peptide sequence, with only precursors passing the Q-value threshold (0.01) considered for relative quantification. Within independent experiments, normalisation was based on total protein amount (determined by the BCA assay) rather than the initial number of parasites.

      PfA-M1 is known to be a highly abundant protein and PF3D7_0604300 (as well as the other protein hits identified by thermal stability proteomics) are likely less abundant. It is noted that abundance is also dependent on ionisation efficiency and trypsin digestion efficiency. Therefore, we avoid comparing absolute abundances across proteins and use relative differences across conditions instead.

      NB: the word “absolute” in the text (“absolute fold-change”) refers to the absolute value of the fold-change (i.e. positive or negative), and not to absolute quantification of proteins. The preceding text in each case clarifies that these are based on “relative peptide abundance”.
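      The relative quantification described above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes each precursor is supplied as a hypothetical (quantity, q-value) pair, treats the 0.01 cut-off as inclusive, averages passing precursors per stripped peptide sequence, and reports the absolute value of the log2 fold-change between treated and control conditions. The function names are illustrative only.

```python
from math import log2
from statistics import mean

Q_VALUE_CUTOFF = 0.01  # precursor q-value threshold quoted in the response (inclusive here by assumption)


def peptide_quantity(precursors):
    """Mean quantity of the precursors of one stripped peptide sequence.

    `precursors` is a list of (quantity, q_value) tuples (hypothetical format).
    Only precursors with q_value <= Q_VALUE_CUTOFF are kept.
    Returns None if no precursor passes the threshold.
    """
    passing = [quantity for quantity, q_value in precursors if q_value <= Q_VALUE_CUTOFF]
    if not passing:
        return None
    return mean(passing)


def absolute_log2_fold_change(treated, control):
    """Magnitude of the log2 fold-change between two conditions, ignoring its sign."""
    return abs(log2(treated / control))


# Example: the precursor with q = 0.2 is discarded before averaging.
treated = peptide_quantity([(100.0, 0.001), (300.0, 0.005), (1000.0, 0.2)])
control = peptide_quantity([(50.0, 0.002)])
print(treated, absolute_log2_fold_change(treated, control))  # prints 200.0 2.0
```

      Because only the sign is dropped, a four-fold increase and a four-fold decrease both give an absolute log2 fold-change of 2, which is why the text pairs this value with the direction of the change.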

      Question 5. Figure 5A. How do the authors explain peptides whose abundances decrease instead of increase? Figure 5C. Could the authors provide numerical cues (amino acid numbers or positions) on the ribbon representation of the PfA-M1 sequence? It is difficult to correlate the position of the 3D domains with the primary structure of the protein. Also, the yellow colour used to show the "drug ligand" is barely visible.

      LiP-MS is based on the principle that ligand binding alters the local proteolytic susceptibility of a protein to a non-specific protease (in this case proteinase K, PK). In this sense, in LiP-MS we are not looking at variations in the stability of whole proteins (as is the case with thermal stability proteomics, where proteins detected with significantly higher abundance in treated relative to control samples reflects thermal stabilisation of the target due to ligand binding), but differences in peptide patterns between treated and control samples that reflect a change in the ability of PK to cleave the target. Thus, in the bound state, the ligand prevents proteolysis with PK. This results in decreased abundance of peptides with non-tryptic ends (as PK cannot access the region around where the ligand is bound) and increased abundance of the corresponding fully tryptic peptide, when compared to the free target. This concept is demonstrated in Fig. 4A and is explained in the text (lines 279-282) and Fig. 4 figure legend.
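      Whether a LiP-MS peptide is fully tryptic or carries a non-tryptic (proteinase K-generated) end can be decided from its flanking residues. The sketch below is illustrative only: it models trypsin with the common "cleave after K or R, but not before P" rule, counts protein termini as tryptic ends, and uses a hypothetical toy sequence rather than PfA-M1.

```python
TOY_PROTEIN = "MAGKLLSPTRVEEK"  # hypothetical sequence, for illustration only


def is_tryptic_boundary(protein: str, pos: int) -> bool:
    """True if a cut between protein[pos - 1] and protein[pos] matches trypsin specificity."""
    if pos == 0 or pos == len(protein):
        return True  # protein N- or C-terminus counts as a tryptic end
    return protein[pos - 1] in "KR" and protein[pos] != "P"


def classify_peptide(protein: str, start: int, peptide: str) -> str:
    """Label a peptide beginning at 0-based `start` as fully, half or non-tryptic."""
    end = start + len(peptide)
    if protein[start:end] != peptide:
        raise ValueError("peptide does not match the protein at this position")
    n_term_ok = is_tryptic_boundary(protein, start)
    c_term_ok = is_tryptic_boundary(protein, end)
    if n_term_ok and c_term_ok:
        return "fully tryptic"
    if n_term_ok or c_term_ok:
        return "half tryptic"
    return "non-tryptic"


for start, peptide in [(4, "LLSPTR"), (5, "LSPTR"), (5, "LSPT")]:
    print(peptide, classify_peptide(TOY_PROTEIN, start, peptide))
```

      Under the principle described above, ligand binding near the proteinase K cleavage site would be expected to lower the abundance of a half-tryptic peptide such as LSPTR and raise that of the corresponding fully tryptic peptide LLSPTR.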

      To avoid cluttering the figure, we have not added amino acid positions to the PfA-M1 sequence in Fig. 5, but have instead provided amino acid positions for all peptides in Supplementary File 3. We have also changed the colour of the ligand in Fig. 5C to blue and increased the transparency of the binding and centre of mass neighbourhoods.

      Question 6. Gametocyte assays. Line 824 states that several compounds were used as positive controls for anti-gametocyte activity (chloroquine, artesunate, pyronaridine, pyrimethamine, dihydroartemisinin, and methylene blue) and line 821 states that the biological effects are measured against puromycin. This is not very clear to me, could the authors comment on this?

      This wording has been clarified in the methods to reflect that 5 µM puromycin was used as the positive control to calculate percent viability, whereas the other antimalarials were run in parallel as reference compounds with known anti-gametocyte activity (line 862).
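      One common way to express percent viability relative to a positive control is sketched below. It is an assumption, not stated in the response, that vehicle-treated wells define 100% viability and the 5 µM puromycin wells define 0%; the function name and signal values are hypothetical. Reference compounds run in parallel would be scaled in the same way.

```python
from statistics import mean


def percent_viability(signal, vehicle_signals, puromycin_signals):
    """Scale a raw assay signal so that vehicle = 100% and puromycin (kill control) = 0%."""
    top = mean(vehicle_signals)       # untreated (vehicle) wells, assumed 100% viable
    bottom = mean(puromycin_signals)  # 5 uM puromycin wells, assumed 0% viable
    return 100.0 * (signal - bottom) / (top - bottom)


# Hypothetical raw signals (arbitrary units).
print(percent_viability(550.0, [1000.0, 1000.0], [100.0, 100.0]))  # 50.0
```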

      Question 7. Metabolomics. Metabolomic assays were done on parasites at 28 h p.i., incubated for 1 h with 3x EC50 of MIPS2673. You mention applying the drug to 2 x 10^8 infected red blood cells (line 838), but you do not explain how you isolated these infected red blood cells from non-infected red blood cells. Could you please specify this?

      Metabolomics studies were performed such that cultures at 2% haematocrit and 6% trophozoite-stage parasitaemia (representing 2 x 10^8 cells in total, rather than 2 x 10^8 infected cells) were treated with compound or vehicle and, after 1 h, metabolites were extracted. This methodological detail has been clarified in the methods (line 875).
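      The distinction between total and infected cells can be made explicit with the figures given above, assuming the 6% parasitaemia applies uniformly to the 2 x 10^8 cells in each sample:

```latex
\begin{aligned}
N_{\text{infected}}   &= 0.06 \times \left(2 \times 10^{8}\right) = 1.2 \times 10^{7} \\
N_{\text{uninfected}} &= 2 \times 10^{8} - 1.2 \times 10^{7} = 1.88 \times 10^{8}
\end{aligned}
```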

      Question 8. Figure 3B. Does this diagram come from the experimental 3D structure created by the authors (8SLO) or from molecular modeling? Please specify in the legend (line 1305).

      The diagram showing the binding mode of MIPS2673 bound to PfA-M1 comes from the experimentally determined 3D structure (PDB ID: 8SLO). This has now been stated in the figure legend. Note that the structural diagram refers to Fig. 1B (not Fig. 3B as indicated by the reviewer). The experimentally determined PfA-M1 structure with MIPS2673 bound (PDB ID: 8SLO) was also used to map LiP peptides and estimate the MIPS2673 binding site in Fig. 5, which is also now reflected in the appropriate section of the text (line 308) and Fig. 5 legend.

      Question 9. Line 745. Why is the µM concentration not indicated for the H-Leu-NHMec substrate, while it is indicated for the other substrates mentioned in the rest of the paragraph (H-Ala-NHMec, 20 μM, etc.)? Also, the pH at which the various enzymatic assays were done is missing from this section (Enzyme assays).

      All enzyme assays were performed at pH 8.0. The concentration of H-Leu-NHMec varied depending on the enzyme assayed, as follows: 20 µM for PfA-M1, 40 µM for PvA-M1 and 100 µM for ERAP1 and ERAP2. This information is now clearly stated in the methods section (lines 782 and 787) and as a footnote for Supplemental Table S1.

      Question 10. Line 830, please define FBS.

      Fetal bovine serum (FBS) has been added where appropriate (line 867).

      Question 11. The authors mention in the title the targeting of several plasmodium species, but the only experimental study on the Plasmodium vivax species concerns the use of the recombinant enzyme Pv-M1. Authors also mention "multi-stage targets", but ultimately only look at erythrocyte stages and three different gametocyte stages.

      We have now removed the words “cross-species” and “multi-stage” from the manuscript title and abstract so as not to overstate these findings. We have also added the word “potential” in the manuscript text to clarify that selective M1 inhibition could offer a potential multistage and cross species strategy for malaria.

      Question 12. Supplemental Table S1. I would suggest replacing "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidases compared to selected human M1 homologues" with "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidase activities compared to selected human M1 homologues".

      Done.

      Question 13. Supplemental Table S3. Here you indicate IC50 while in text and Figure 1 you quote EC50. Why this difference?

      This has now been changed to EC50 in Supplemental Table S3.

      Reviewer #2 (Recommendations For The Authors):

      Amendments that I would recommend in order to improve the presentation include all four parts of the study:

      (1) In vitro antiparasitic activity of MIPS2673.

      The authors showed that MIPS2673 inhibits parasite growth with an IC50 of 324 nM, measured by a standard drug sensitivity assay (Fig 1C). This is all well and good, but it would be helpful to include at least one, if not more, other compounds, such as antimalarial drugs and/or their earlier inhibitors (e.g. inhibitor 1), for comparison. This is typically done to show that the assay in this manuscript is fully compatible with previous studies. It will also give a better view of how the selective inhibition of PfM1 kills the parasite.

      Alongside MIPS2673, we also analysed the potency of the known antimalarial artesunate, which was found to have an EC50 of 4 nM. This value agrees with the expected potency of artesunate and indicates our MIPS2673 value of 324 nM is indeed compatible with previous studies. We have now reported the artesunate EC50 value for reference (lines 197-198 and Fig. S1).
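      For readers less familiar with EC50 values, the sketch below uses a four-parameter logistic (Hill) model, a common choice for dose-response fitting; the response does not state which model was fitted. The Hill slope of 1 and the 0-100% plateaus are assumptions, so the predicted viabilities at 5x and 10x EC50 are illustrative only. The last line shows the 81-fold potency difference between the reported EC50 values of MIPS2673 (324 nM) and artesunate (4 nM).

```python
def viability(conc_nm, ec50_nm, hill=1.0, bottom=0.0, top=100.0):
    """Four-parameter logistic model: percent viability at a given concentration (nM)."""
    return bottom + (top - bottom) / (1.0 + (conc_nm / ec50_nm) ** hill)


MIPS2673_EC50_NM = 324.0   # reported in the response
ARTESUNATE_EC50_NM = 4.0   # reported in the response

# Hill slope of 1 is an assumption; real fitted slopes will differ.
for multiple in (1, 5, 10):
    conc = multiple * MIPS2673_EC50_NM
    print(f"{multiple}x EC50 = {conc:.0f} nM -> {viability(conc, MIPS2673_EC50_NM):.1f}% viability")

print(f"Potency ratio: {MIPS2673_EC50_NM / ARTESUNATE_EC50_NM:.0f}-fold")
```

      By definition, the model returns 50% viability at the EC50 regardless of the slope; the slope only changes how steeply viability falls at higher multiples.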

      Next, the authors proceeded to investigate the stage-specific effect of MIPS2673, but this time using a survival assay instead of proper IC50 estimations (Figure 1D). I wonder why. Drug survival assays typically have very limited information content, and measuring proper IC50 values in stage-specific wash-off assays would be much more informative.

      We performed single concentration stage specificity assays to determine the parasite asexual stage at which MIPS2673 is most active. This involved washing off the compound after a 24 h exposure in rings or trophozoites and determining parasite viability in the next asexual lifecycle. While a full dose response curve would allow generation of an EC50 value against the respective parasite stages, this information is unlikely to change the interpretation that MIPS2673 is more active against trophozoites stages than against rings.

      Finally, in Figure 1E, the authors show that MIPS2673 arrests parasite development. This is done by presenting a single (presumably representative) cell per time point, which is in my view highly insufficient. I recommend that this figure be supplemented with parasite stage counts or another more comprehensive data representation. Also, the authors mention that, despite the growth arrest, hemozoin is still being formed. From the cell images, I cannot see anything that supports this statement.

      We thank the reviewer for this constructive comment; they are correct that these are representative parasite images at the respective time points. To address the reviewer's concerns, we have now provided cell counts from each treatment condition (Fig. 1E) at selected time points, which show parasite stalling at the ring-to-trophozoite transition under drug treatment. On reflection, we agree that it is difficult to determine the presence of haemozoin from our images and have removed this statement.

      (2) Protein thermal shift profiling. In the next step, the authors proceed to carry out cellular thermal shift profiling to show that PfM1 indeed interacts with MIPS2673, this time in the context of the total protein lysates from P. falciparum. This section of the study is in my view quite solid and indeed it is nice to see that the inhibitor causes a thermal shift of PfM1 which further supports what was already expected: interaction.

      I have no problem with this study in terms of the technical outcome but I would urge the authors to tone down the interpretation of these results in two ways.

      Four other proteins were found to be shifted by the inhibitor, which also indicates interactions. Calling these simply "off-target" interactions might not represent the truth. The authors should acknowledge, and in some way comment on, the possibility that interactions with these proteins contribute to the MIPS2673 MOA. I do not suggest conducting any more studies, only acknowledging this situation. Identifying more than one target is indeed very common in CETSA studies, and it would be helpful to acknowledge this here as well.

      We agree that identifying binding proteins in addition to the “expected” target is commonplace, and is indeed one of the benefits of this unbiased and proteome-wide approach. In the results and discussion, we have now amended our language to refer to these additional hits as MIPS2673-interacting proteins. In our original manuscript we dedicated a paragraph in the discussion to these additional interacting proteins and the likelihood of them being targets that contribute to antimalarial activity. Of these four additional interacting proteins, only the putative AP2 domain transcription factor (PF3D7_1239200) is predicted to be essential for blood stage growth and is therefore the only one of the four that would likely contribute to antimalarial activity. These points are explicitly stated in the discussion (lines 530-550). The remaining three proteins were two non-essential P. falciparum proteins with unknown functions (PF3D7_1026000 and PF3D7_0604300) that are poorly described in the literature, and a human protein (RAB39A). Notably, all of the other interacting proteins identified in our thermal stability dataset were detected in our LiP-MS experiment but were not identified as interacting proteins by this method. Further analysis of these other thermal stability proteomics hits in our LiP-MS dataset (see responses to Reviewer #3) identified none or only one significant LiP peptide from these proteins across our LiP-MS datasets, indicating they are likely to be false positive hits. Caveats around identifying protein targets by different deconvolution methods are also now addressed (lines 545-550).

      At one point, the authors argue that shifts of only four/five proteins, including PfM1, show that MIPS2673 does not interact with other (off) targets. Here one must be careful not to present the lack of shifts in the CETSA as proof of no interaction. There are many reasons why thermal shifts may not be observed, including the physical properties of the individual proteins, detection limits, etc. Again, I suggest adjusting these statements accordingly.

      We thank the reviewer for raising this important point and have now included additional discussion around this comment (lines 545-550).

      Finally, I am not convinced that Figure 2 presents anything more than the overall experimental scheme, with not much new information. Many such schemes were published previously in the original publication of thermal profiling. I would suggest omitting it from the main text and moving it to the supplementary methods.

      We agree that similar schemes have been published previously, especially for thermal proteome profiling, and acknowledge the reviewer’s suggestion of moving this figure to the supplemental material. However, we have kept Fig. 2 in the main text as this scheme also incorporates a LiP-MS workflow for malaria drug target deconvolution (the first to do so) and also to satisfy the additional details requested for this figure by Reviewer #1 (question 3).

      (3) Identification of MIPS2673 target proteins using LiP-MS. In the next step, the authors carried out limited proteolysis analysis with the rationale that peptides near the inhibitor binding site will exhibit higher resistance to proteolysis. The authors did a very good job of showing this for the PfM1-MIPS2673 interaction. This part is very impressive from a technological perspective, and I congratulate the authors on such an achievement. I imagine these types of studies require very precise optimization and execution.

      Here, however, I struggle with the meaning of this experiment within the overall flow of the manuscript. It seems that the binding pocket of MIPS2673 is already known, since the inhibitor was designed for it. In fact, the authors mention that the crystal structure of PfM1 is available. From this perspective, the LiP-MS study represents more of a technical proof of concept for future drug target analysis, but contributes little to the already well-established PfM1-MIPS2673 interaction. Perhaps this could be presented in this way in the text.

      We thank the reviewer for this comment and they are correct that we solved the crystal structure of PfA-M1 bound to MIPS2673. We wish to highlight that the primary reason for performing the LiP-MS study was as an independent and complementary target deconvolution method to narrow down the shortlist of targets identified with thermal stability proteomics, and validate with high confidence that PfA-M1 is indeed the primary target of MIPS2673 in parasites. The use of a complementary approach based on a different biophysical principle (proteolytic susceptibility vs thermal stability) would also allow us to identify MIPS2673 interacting proteins that may not be detectable by thermal stability proteomics, for example targets that do not alter their thermal stability upon ligand binding. The text in the results and discussion has been amended to clarify these points (lines 266-268 and 545-550).

      Furthermore, we agree that correctly predicting the MIPS2673 binding site on PfA-M1 using our LiP-MS peptide data is a technical proof of concept. Indeed, we wished to highlight the potential utility of LiP-MS for identifying both the protein targets of drugs and predicting their binding site, which is not possible with many other target deconvolution approaches. This point has been updated in the text (lines 303-304, 459-461).

      (4) Metabolomic profiling of MIPS2673 inhibition showed a massive accumulation of short peptides, which clearly indicates that this inhibitor blocks the proteolysis of short peptides, presumably products of upstream proteolytic activities. Here the authors argue that, because many of these detected short (di-/tri-) peptides could be mapped onto the hemoglobin protein sequence, this must be their origin. Although this might be the case, the authors cannot exclude the possibility that at least some of these come from other sources (e.g. Plasmodium proteins). It would be quite helpful to comment on such a possibility as well. In particular, it was mentioned that the main subcellular localization of PfM1 is the cytoplasm, while most if not all hemoglobin digestion occurs in the digestive vacuole...?

      Indeed, we agree that PfA-M1 is likely processing both Hb and non-Hb peptides and do not definitively conclude that all dysregulated peptides must be derived from haemoglobin. A subset of dysregulated peptides cannot be mapped to haemoglobin and must have an alternative source, such as other host proteins or turnover of parasite proteins. We have amended the discussion to better reflect these possible alternate peptide sources (lines 480-482). Although the peptides detected in the metabolomics study (2-5 amino acids) are too short to be definitively assigned to any specific parasite or RBC protein, it is important to note that our analysis strongly indicates that the majority, but not all, of the dysregulated peptides are more likely to originate from haemoglobin than from other human or parasite proteins. This is based on sequence mapping, which was aided by acquiring MS/MS data for a subset of dysregulated peptides, from which we derived accurate sequences (as opposed to residue composition inferred from total peptide mass) to more directly link dysregulated peptides to haemoglobin. We further quantified the sequence similarity of dysregulated peptides to all detectable proteins in the P. falciparum-infected erythrocyte proteome (~4700 proteins), showing that these peptides are statistically more similar to haemoglobin than to other host or parasite proteins.
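      A much simplified version of the sequence-mapping step, which checks which proteins in a reference set contain each detected short peptide as an exact substring, is sketched below. The authors' actual similarity statistic is not described here, and the sequences and peptide list are hypothetical toy data, not real haemoglobin or parasite sequences.

```python
def map_peptides(peptides, proteome):
    """Map each short peptide to the sorted names of proteins that contain it exactly."""
    return {
        peptide: sorted(name for name, sequence in proteome.items() if peptide in sequence)
        for peptide in peptides
    }


# Hypothetical toy sequences, for illustration only.
toy_proteome = {
    "HB_TOY": "VLSPADKTNVKAAW",
    "PARASITE_TOY": "MKLLAPADWQ",
}

mapping = map_peptides(["PAD", "AAW", "LLA", "QQQ"], toy_proteome)
for peptide, hits in mapping.items():
    print(peptide, hits or "no exact match")
```

      The tripeptide PAD matches both toy sequences, which illustrates why peptides of 2-5 residues cannot be assigned unambiguously to a single source protein, and why MS/MS-derived sequences and a proteome-wide comparison are needed to argue that most of them derive from haemoglobin.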

      The apparent disconnect between PfA-M1 localisation (cytosol) and the predominant site of haemoglobin digestion (digestive vacuole, DV) is explained by the fact that peptides originating from digestion of haemoglobin in the DV are required to be transported into the cytoplasm for further cleavage by peptidases, including PfA-M1. This point has now been clarified in the discussion (lines 473-474).

      Reviewer #3 (Recommendations For The Authors):

      (1) Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      We thank the reviewer for this comment. The possibility of other targets contributing to MIPS2673 activity was also raised by Reviewer #2 (question 2) and is addressed above. Further to our response to Reviewer #2, we agree that the inability to generate resistant parasites in vitro could indicate that inhibition of multiple essential parasite proteins (including PfA-M1) contributes to MIPS2673 activity, and we do not rule out this possibility. It may also indicate that the target has a very high barrier to resistance and cannot tolerate resistance-causing mutations because they are deleterious to function. Indeed, previous attempts to mutate PfA-M1 (references 12 and 50), and our own attempts to generate MIPS2673-resistant parasites in vitro (unpublished), were unsuccessful. It is important to note that, of the hits reproducibly identified using thermal stability proteomics, only PfA-M1 and a putative AP2 domain transcription factor (PF3D7_1239200) are predicted to be essential for blood stage growth. We have explicitly stated that PF3D7_1239200 could also contribute to activity (lines 533 and 537).

      As we identified multiple hits with thermal stability proteomics we employed the complementary LiP-MS method to further investigate the target landscape of MIPS2673. PfA-M1 was the only protein reproducibly identified as the target through this approach. Importantly, the five proteins identified as hits by thermal stability proteomics were also detected in our LiP-MS datasets, but only PfA-M1 was identified as a target by both target deconvolution methods, strongly indicating it is the primary target of MIPS2673 in parasites. An important caveat is that we profiled the soluble proteome (we did not include detergents necessary for extracting membrane proteins as they may interfere with these stability assays) and other factors (e.g. the biophysical properties of the protein) will impact on whether ligand induced stabilisation events are detected. We have added additional text in the discussion around the above points (lines 545-550).

      While we do not definitively rule out other MIPS2673 interacting proteins existing in parasites (that possibly also contribute to activity), our metabolomics studies indicated no functional impact by MIPS2673 outside of elevated levels of short peptides. This is indicative of aminopeptidase inhibition and the profile of peptide accumulation was distinct from a known PfA-M17 inhibitor, and other antimalarials, further pointing to selective inhibition of the PfA-M1 enzyme by MIPS2673 being responsible for antimalarial activity.

      (2) The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      As PfA-M1 was the only protein reproducibly identified as an interacting protein across both LiP-MS experiments (and by thermal stability proteomics), we focused our analysis on this protein. However, we agree that further analysis of the other putative interacting proteins would be valuable. We have therefore performed additional analysis (see new Figure S4) of the other interacting proteins identified by thermal stability proteomics and of those identified in LiP-MS experiment one; no proteins other than PfA-M1 were identified as hits in the second LiP-MS experiment (lines 314-318, 495-505, 740-762 and Fig. S4). Using the peptides detected in both LiP-MS experiments, we mapped significant LiP peptides onto the structures of the other putative MIPS2673-interacting proteins (where a structure was available and significant LiP peptides were detected) and measured their minimum distance to the expected binding sites. Notably, when we applied the same criteria for a significant LiP peptide as in our PfA-M1 analysis, only one significant LiP peptide was identified among these other putative interacting proteins (YSPSFMSFK from PfADA). We therefore used less stringent criteria to define significant LiP peptides for these proteins (see Methods and Fig. S4 legend) so that peptides could be mapped to structures. This analysis showed that, with the exception of PfA-M17, significant LiP peptides from these other proteins are not significantly closer to binding sites than all other detected peptides, supporting our assertion that these proteins are likely to be false positives or functionally irrelevant MIPS2673-interacting proteins.
      Although significant peptides from PfA-M17 were closer to the binding site, our thermal stability and metabolomics data, combined with our previous work on the PfA-M17 enzyme, argue against it being a functionally relevant target (see lines 362-374 and 486-529 for a more detailed discussion). Another possible explanation for this result is that peptide substrates accumulating because of primary PfA-M1 inhibition interact with PfA-M17, causing structural changes around its active site that are detected by LiP-MS.
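      The distance comparison described above can be illustrated with a minimal Python sketch. It assumes each peptide is represented by a list of atom coordinates (in ångströms) taken from a protein structure, and the binding site by a second list of coordinates; the function names and coordinates are hypothetical, and the statistical test used to judge "significantly closer" is not reproduced here.

```python
import math
import statistics

# Hypothetical representation: a point in 3D space (angstroms).
Coord = tuple[float, float, float]


def min_distance(peptide_atoms: list[Coord], site_atoms: list[Coord]) -> float:
    """Smallest Euclidean distance between any peptide atom and any binding-site atom."""
    return min(math.dist(p, s) for p in peptide_atoms for s in site_atoms)


def compare_to_background(
    significant: list[list[Coord]],
    detected: list[list[Coord]],
    site_atoms: list[Coord],
) -> tuple[float, float]:
    """Median minimum distance of significant LiP peptides vs. all detected peptides."""
    sig = [min_distance(p, site_atoms) for p in significant]
    background = [min_distance(p, site_atoms) for p in detected]
    return statistics.median(sig), statistics.median(background)


# Toy example with made-up coordinates: binding site at the origin.
site = [(0.0, 0.0, 0.0)]
peptide_a = [(3.0, 4.0, 0.0)]   # 5 A from the site
peptide_b = [(10.0, 0.0, 0.0)]  # 10 A from the site
peptide_c = [(0.0, 0.0, 12.0)]  # 12 A from the site
print(compare_to_background([peptide_a], [peptide_a, peptide_b, peptide_c], site))
# -> (5.0, 10.0)
```

      Under this reasoning, a protein whose significant peptides sit no closer to the binding site than the background peptides would be treated as a likely false positive.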

      (3) The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      As indicated, the accumulation of short peptides identified by metabolomics suggests MIPS2673 perturbs aminopeptidase function. Many of these peptides (but not all) likely map to haemoglobin and are more haemoglobin-like than other proteins in the infected red blood cell proteome. An effect on a subset of non-haemoglobin peptides is also apparent and we have added this to our discussion (also refer to our response to question 4 from Reviewer #2). A direct comparison to our previous metabolomics analysis of a specific PfA-M17 inhibitor (MIPS2571, reference 11) revealed MIPS2673 induces a unique metabolomic profile. The extent of peptide accumulation differed and a subset of short basic peptides (containing Lys or Arg) were elevated only by MIPS2673, consistent with the broad substrate preference of PfA-M1. Importantly, the metabolomics profile induced by MIPS2673 is the opposite of many other antimalarials, which cause depletion of haemoglobin peptides. Taken together, the profile of short peptide accumulation induced by MIPS2673 is consistent with specific inhibition of PfA-M1.

      (4) Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      Contrary to this comment, we solved the crystal structure of PfA-M1 bound to MIPS2673, determining its binding mechanism to the enzyme. This was further supported by proteomics-based structural analysis using LiP-MS. Undertaking site-specific mutagenesis would be interesting to further probe the binding dynamics of MIPS2673 to the M1 protein. However, we believe it is beyond the scope of this study and would not change our conclusion that MIPS2673 binds to PfA-M1, which we have shown using multiple unbiased proteomics-based methods, enzyme assays and X-ray crystallography.

      (5) In the thermal stability and LiP-MS analyses, other proteins were consistently identified in addition to PfA-M1, and yet no additional analysis was undertaken to explore these as potential targets.

      As addressed in our previous responses, across independent thermal stability proteomics experiments we consistently identified 5 interacting proteins, including the expected target PfA-M1. In contrast, only PfA-M1 was reproducible across independent LiP-MS experiments. While several plausible putative targets (including aminopeptidases and metalloproteins) were identified in one of our LiP-MS experiments, they appear to be false discoveries and not responsible for the antiparasitic activity of MIPS2673, as peptide-level stabilisation was not consistent across independent LiP-MS experiments, and an interaction is refuted by our thermal stability, metabolomics and recombinant enzyme inhibition data. We have now performed further analysis of these other putative interacting proteins, which also argues against them being likely interacting proteins (see also response to question 2). We have also expanded our existing discussion of possible MIPS2673 targets and the likelihood of these proteins contributing to antimalarial activity (lines 486-550).

      (6) The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Our metabolomics studies were performed using typical experimental conditions for investigating the antimalarial mechanisms of compounds by metabolomics (see references 11, 39, 40 and 55-57). We used a short 1 h incubation at 3x EC50 allowing us to profile the primary parasite pathways affected by MIPS2673 and avoid a nonspecific death phenotype associated with longer incubations. As addressed in our response to Reviewer #1 (question 2) we focused on trophozoite infected red blood cells as this is the stage most susceptible to MIPS2673 and when one presumes the greatest functional impact would be seen. It is possible that an expanded kinetic metabolomics analysis may reveal secondary mechanisms involved in MIPS2673 activity and we have now acknowledged this in the manuscript (lines 515-516). However, even though secondary mechanisms may become apparent at longer incubations it also becomes difficult to uncouple drug specific responses from nonspecific death effects. We believe any additional information provided by an expanded metabolomics analysis is unlikely to outweigh the significant extra financial cost associated with this type of experiment.

      It is correct that many antimalarial compounds appear to disrupt haemoglobin digestion when analysed by metabolomics. However, as indicated in our manuscript (lines 369-373) and previous responses, the profile of elevated haemoglobin peptides induced by MIPS2673 is substantially different from the profile caused by other antimalarials. For example, artemisinins and mefloquine cause haemoglobin peptide depletion (references 55-57), and chloroquine increases the levels of a different subset of non-haemoglobin peptides (see Creek et al. 2016). While there is some overlap with the profile of a selective M17 inhibitor (our previous work, reference 11), the level of enrichment of these peptides is different, and MIPS2673 also induces accumulation of a distinct set of basic peptides consistent with the substrate preference of the PfA-M1 enzyme. As we show that MIPS2673 does not inhibit other parasite aminopeptidases, a likely explanation for the profile overlap is that the build-up of substrates that cannot be processed by PfA-M1 leads to secondary dysregulation of other aminopeptidases. Our analyses (sequence mapping, MS/MS analysis and sequence similarity to all infected red blood cell proteins) strongly indicate that the majority of elevated peptides (but not all) originate from haemoglobin. Combined with our proteomics and recombinant enzyme data indicating direct engagement of PfA-M1, and with previous literature indicating that the enzyme cleaves amino acids from haemoglobin-derived peptides, our data indicate that MIPS2673 likely directly perturbs the haemoglobin digestion pathway through PfA-M1 inhibition.

      (7) Finally, the potency of this compound on parasites grown in vitro is 300 nM - this would need improvements in potency and demonstration of in vivo efficacy in the SCID mouse model to consider this a candidate for a drug.

      We do not propose MIPS2673 as an antimalarial candidate. The experiments presented here were centred on target validation rather than identification of an antimalarial lead, which may be the focus of future studies. To avoid this confusion, we have amended the manuscript title and language throughout to clarify this point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khan et al. investigated the functional redundancy of the non-canonical L-cysteine synthases of M. tuberculosis, CysM and CysK2, focussing on their role in mitigating the effects of host-derived stress. They found that while deletion mutants of the two synthases (Rv∆cysM, Rv∆cysK2) have similar transcriptomes under standard conditions, their transcriptional responses to oxidative stress are distinct. Deleting the synthases also differentially affected the pools of L-cysteine-derived metabolites. They show that the mutants (Rv∆cysM, Rv∆cysK2) have impaired survival in peritoneal macrophages and in a mouse model of infection. Importantly, they show that the survival of the mutants increases when the host is defective in producing reactive oxygen and nitrogen species, linking the phenotype to a defect in combating host-derived stress. Finally, they show that compounds inhibiting L-cysteine synthases reduce the intracellular survival of M. tuberculosis.

      Strengths:

      (1) The distinct transcriptome of the Rv∆cysM and Rv∆cysK2 mutants in the presence of oxidative stress provides solid evidence that these mutants are distinct in their response to oxidative stress, and suggests that they are not functionally redundant.

      (2) The use of macrophages from phox-/- and IFNγ-/- mice and an iNOS inhibitor for the intracellular survival assays provides solid evidence that the survival defect seen for the Rv∆cysM and Rv∆cysK2 mutants is related to their reduced ability to combat host-derived oxidative and nitrosative stress. This is further supported by the infection studies in phox-/- and IFNγ-/- mice.

      Weaknesses:

      (1) There are several previous studies looking at the transcriptional response of M. tuberculosis to host-derived stress; however, the authors do not discuss their initial RNA-seq data in the context of these studies. Furthermore, while several of the sulfur assimilation and L-cysteine biosynthetic pathway genes are upregulated by more than one stress condition, the data do not support the statement that it is the "most commonly upregulated pathway in Mtb exposed to multiple host-like stresses".

      We have made changes to the manuscript in line with the reviewer’s suggestion.

      “Thus, RNA-Seq data suggest that genes involved in the sulfur assimilation and L-cysteine biosynthetic pathways are upregulated during various host-like stresses in Mtb (Figure S2). Given the importance of sulfur metabolism genes for the in vivo survival of Mtb [1, 2], it is not surprising that these genes are dynamically regulated by diverse environmental cues. Microarray studies have shown upregulation of genes encoding the sulfate transporter upon exposure to hydrogen peroxide and nutrient starvation [3-7]. Similarly, ATP sulfurylase and APS kinase are induced during macrophage infection and by nutrient depletion. Induction of these genes, which coordinate the first steps of the sulfur assimilation pathway, indicates a probable increase in the biosynthesis of sulfate-containing metabolites that may be crucial against host-inflicted stresses. Furthermore, genes involved in the synthesis of reduced sulfur moieties (cysH, sirA and cysM) are also induced by hydrogen peroxide and nutrient starvation. Sulfur metabolism has been postulated to be important in the transition to latency, based on the transcriptional upregulation of cysD, cysNC, cysK2 and cysM upon exposure to hypoxia. Multiple transcriptional profiling studies have reported upregulation of the moeZ, mec, cysO and cysM genes when cells were subjected to oxidative and hypoxic stress [1, 6-11], further suggesting an increase in the biosynthesis of reduced metabolites such as cysteine and methionine, and of sulfur-containing cell wall glycolipids, upon exposure to oxidative stress [12].” We have modified the sentence to “significantly upregulated pathway in Mtb exposed to multiple host-like stresses”.

      (2) For the quantification of the metabolites, it isn't clear how the abundance was calculated (e.g., were standards for each metabolite used? How was abundance normalised between samples?), and this information should be included to strengthen the data.

      Thank you for pointing this out. We have extended our description of the metabolomics methods. It now reads: “Due to the tendency of M. tuberculosis to form clumps, which significantly skews any cell number estimation, we normalized samples to protein/peptide concentration using the BCA assay kit (Thermo). Therefore, our LC-MS data are expressed as ion counts/mg protein, or as ratios of that value for the same metabolite. This is a standard way to express ion abundance data, as done previously [13, 14].”
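      As an illustration of this normalisation, a minimal Python sketch follows. It assumes raw ion counts per metabolite and a protein amount (mg) per sample; the function names and numbers are hypothetical. Ratios are only formed for the same metabolite across samples, because different compounds ionise with different efficiencies.

```python
def normalise_to_protein(ion_counts: dict[str, float], protein_mg: float) -> dict[str, float]:
    """Express each metabolite's ion count per mg of protein (e.g. from a BCA assay)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return {metabolite: count / protein_mg for metabolite, count in ion_counts.items()}


def same_metabolite_ratio(
    sample: dict[str, float], reference: dict[str, float], metabolite: str
) -> float:
    """Ratio of normalised abundance for ONE metabolite between two samples.

    Ratios between different metabolites are deliberately not formed, because
    compounds ionise with different efficiencies.
    """
    return sample[metabolite] / reference[metabolite]


# Made-up numbers for illustration only.
mutant = normalise_to_protein({"mycothiol": 3000.0}, protein_mg=1.5)     # 2000 counts/mg
wild_type = normalise_to_protein({"mycothiol": 4000.0}, protein_mg=1.0)  # 4000 counts/mg
print(same_metabolite_ratio(mutant, wild_type, "mycothiol"))  # 0.5
```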

      Furthermore, labelling with L-methionine was performed to determine the rate of synthesis of the L-cysteine-derived metabolites. L-cysteine is produced from L-methionine via the transsulfuration pathway, which is independent of CysM and CysK2. It is therefore difficult to interpret this experiment, as the impact of deleting CysM and CysK2 on the transsulfuration pathway is likely indirect.

      The reviewer may have misunderstood the experiment and the results presented. Labelling was not performed with L-methionine. We used ³⁴S derived from SO₄²⁻ to monitor the reductive assimilation of sulfur and its transit from S²⁻ to L-methionine via cysteine. We specified in the Materials and Methods that we used sodium sulfate-³⁴S (Merck 718882) as our labelled sulfur source. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the Results section (page 9) that we employed Na₂³⁴SO₄ for labelling, so that other readers will not think we are measuring transsulfuration.
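      As a rough illustration of what a ³⁴S labelling readout quantifies, the sketch below computes the fraction of a metabolite pool carrying ³⁴S from isotopologue ion counts. It assumes that each ³⁴S atom shifts the nominal mass by 2 Da (so M+2 corresponds to one ³⁴S and M+4 to two) and omits natural-abundance correction; the function name and numbers are hypothetical and do not describe the actual data processing used in the study.

```python
def labelled_fraction(isotopologues: dict[int, float]) -> float:
    """Fraction of a metabolite pool that carries at least one 34S atom.

    Keys are nominal mass shifts (0 = unlabelled M+0, 2 = one 34S, 4 = two 34S);
    values are ion counts. No natural-abundance correction is applied here.
    """
    total = sum(isotopologues.values())
    if total <= 0:
        raise ValueError("no signal for this metabolite")
    return 1.0 - isotopologues.get(0, 0.0) / total


# Hypothetical methionine isotopologue counts after labelling.
print(labelled_fraction({0: 750.0, 2: 250.0}))  # 0.25
```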

      (3) The ability of L-cysteine to rescue the survival defect of the Rv∆cysM and Rv∆cysK2 mutants in macrophages is interpreted as exogenous L-cysteine being able to compensate for reduced intracellular levels. However, there is no evidence that L-cysteine is being taken up by the mutants and an alternate explanation is that L-cysteine functions as an antioxidant within cells i.e., it reduces intracellular ROS.

      The concentration of L-cysteine used for the peritoneal macrophage survival rescue experiments was titrated so that it conferred no survival advantage on wild-type Rv. Thus, at this concentration, we believe that any contribution of cysteine to reducing intracellular ROS is minor, since there is no significant difference in the survival of the wild-type Rv strain. Had cysteine reduced intracellular ROS, we would expect increased survival of Rv owing to diminished oxidative stress.

      Furthermore, L-cysteine addition also mitigates the CHP-induced survival defect in vitro [15] and nullifies the observed effect of cysteine synthase inhibitors in vitro [16], suggesting that cysteine or cystine can be transported into Mtb. This has also been shown previously for the AosR mutant strain [15] and for CysH [2], and over 70% of exogenously added [35S]cysteine was taken up by a growing culture of Mtb [17].

      The authors sought to investigate the functional redundancy of the non-canonical L-cysteine synthases CysM and CysK2. While their distinct transcriptional responses to oxidative stress suggest distinct physiological roles, the study did not explore these differences and therefore provides only preliminary insight into the underlying reasons for this observation. In the context of drug development, this work suggests that while L-cysteine synthase inhibitors do not have high potency for killing intracellular M. tuberculosis, they have the potential to decrease the pathogen's survival in the presence of host-derived stress.

      Reviewer #2 (Public Review):

      Summary:

      The paper examines the role L-cysteine metabolism plays in the biology of Mycobacterium tuberculosis. The authors have preliminary data showing that Mycobacterium tuberculosis has two unique pathways to synthesize cysteine. The data showing new compounds that act synergistically with INH is very interesting.

      Strengths:

      RNAseq data is interesting and important.

      Weaknesses:

      The paper would be strengthened if the authors were to add further detail to their genetic manipulations.

      The authors provide evidence that they have successfully made a cysK2 mutant by recombineering. This data looks promising, but I do not see evidence for the cysM deletion. It is also important to state what sort of complementation was done (multicopy plasmid, integration proficient vector, or repair of the deletion). Since these mutants are the basis for most of the additional studies, these details are essential. It is important to include complementation in mouse studies as unexpected loss of PDIM could have occurred.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section.  

      Reviewer #3 (Public Review):

      In this work, the authors conduct transcriptional profiling experiments with Mtb under various different stress conditions (oxidative, nitrosative, low pH, starvation, and SDS). The Mtb transcriptional responses to these stress conditions are not particularly new, having been reported extensively in the literature over the past ~20 years in various forms. A common theme from the current work is that L-cysteine synthesis genes are seemingly up-regulated by many stresses. Thus, the authors focused on deleting two of the three L-cysteine synthesis genes (cysM and cysK2) in Mtb to better understand the roles of these genes in Mtb physiology.

      The cysM and cysK2 mutants display fitness defects in various media (Sautons media, starvation, oxidative and nitrosative stress) noted by CFU reductions. Transcriptional profiling studies with the cysM and cysK2 mutants revealed that divergent gene signatures are generated in each of these strains under oxidative stress, suggesting that cysM and cysK2 have non-redundant roles in Mtb's oxidative stress response which likely reflects the different substrates used by these enzymes, CysO-L-cysteine and O-phospho-L-serine, respectively. Note that these studies lack genetic complementation and are thus not rigorously controlled for the engineered deletion mutations.

      The authors quantify the levels of sulfur-containing metabolites (methionine, ergothioneine, mycothiol, mycothionine) produced by the mutants following exposure to oxidative stress. Both the cysM or cysK2 mutants produce more methionine, ergothioneine, and mycothionine relative to WT under oxidative stress. Both mutants produce less mycothiol relative to WT under the same condition. These studies lack genetic complementation and thus, do not rigorously control for the engineered mutations.

      Next, the mutants were evaluated in infection models to reveal fitness defects associated with oxidative and nitrosative stress in the cysM or cysK2 mutants. In LPS/IFNg activated peritoneal macrophages, the cysM or cysK2 mutants display marked fitness defects which can be rescued with exogenous cysteine added to the cell culture media. Peritoneal macrophages lacking the NADPH oxidase (Phox) or IFNg fail to produce fitness phenotypes in the cysM or cysK2 mutants suggesting that oxidative stress is responsible for the phenotypes. Similarly, chemical inhibition of iNOS partly abrogated the fitness defect of the cysM or cysK2 mutants. Similar studies were conducted in mice lacking IFNg and Phox establishing that cysM or cysK2 mutants have fitness defects in vivo that are dependent on oxidative and nitrosative stress.

      Lastly, the authors use small molecule compounds to inhibit cysteine synthases. It is demonstrated that the compounds inhibit Mtb growth in 7H9 ADC media. No evidence is provided to demonstrate that these compounds are specifically inhibiting the cysteine synthases via "on-target inhibition" in whole Mtb cells. Additionally, it is wrongly stated in the discussion that "combinations of L-cys synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial load inside the host". This statement suggests that the INH + cysteine synthase inhibitor combinations reduce Mtb loads within a host in an infection assay. No data are presented to support this statement.

      We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in whole Mtb cells. However, the inhibitors used in this study have been previously profiled in vitro (https://www.sciencedirect.com/science/article/abs/pii/S0960894X17308405?via%3Dihub). We have modified the sentence to “a combination of L-cysteine synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial survival in vitro”.

      References

      (1) Hatzios, S.K. and C.R. Bertozzi, The regulation of sulfur metabolism in Mycobacterium tuberculosis. PLoS Pathog, 2011. 7(7): p. e1002036.

      (2) Senaratne, R.H., et al., 5'-Adenosinephosphosulphate reductase (CysH) protects Mycobacterium tuberculosis against free radicals during chronic infection phase in mice. Mol Microbiol, 2006. 59(6): p. 1744-53.

      (3) Betts, J.C., et al., Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol, 2002. 43(3): p. 717-31.

      (4) Hampshire, T., et al., Stationary phase gene expression of Mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms? Tuberculosis (Edinb), 2004. 84(3-4): p. 228-38.

      (5) Schnappinger, D., et al., Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment. J Exp Med, 2003. 198(5): p. 693-704.

      (6) Voskuil, M.I., et al., The response of Mycobacterium tuberculosis to reactive oxygen and nitrogen species. Front Microbiol, 2011. 2: p. 105.

      (7) Voskuil, M.I., K.C. Visconti, and G.K. Schoolnik, Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb), 2004. 84(3-4): p. 218-27.

      (8) Brunner, K., et al., Profiling of in vitro activities of urea-based inhibitors against cysteine synthases from Mycobacterium tuberculosis. Bioorg Med Chem Lett, 2017. 27(19): p. 4582-4587.

      (9) Manganelli, R., et al., Role of the extracytoplasmic-function sigma factor sigma(H) in Mycobacterium tuberculosis global gene expression. Mol Microbiol, 2002. 45(2): p. 365-74.

      (10) Burns, K.E., et al., Reconstitution of a new cysteine biosynthetic pathway in Mycobacterium tuberculosis. J Am Chem Soc, 2005. 127(33): p. 11602-3.

      (11) Manganelli, R., et al., The Mycobacterium tuberculosis ECF sigma factor sigmaE: role in global gene expression and survival in macrophages. Mol Microbiol, 2001. 41(2): p. 423-37.

      (12) Tyagi, P., et al., Mycobacterium tuberculosis has diminished capacity to counteract redox stress induced by elevated levels of endogenous superoxide. Free Radic Biol Med, 2015. 84: p. 344-354.

      (13) de Carvalho, L.P., et al., Metabolomics of Mycobacterium tuberculosis reveals compartmentalized co-catabolism of carbon substrates. Chem Biol, 2010. 17(10): p. 1122-31.

      (14) Agapova, A., et al., Flexible nitrogen utilisation by the metabolic generalist pathogen Mycobacterium tuberculosis. eLife, 2019. 8.

      (15) Khan, M.Z., et al., Redox homeostasis in Mycobacterium tuberculosis is modulated by a novel actinomycete-specific transcription factor. EMBO J, 2021. 40(14): p. e106111.

      (16) Brunner, K., et al., Inhibitors of the Cysteine Synthase CysM with Antibacterial Potency against Dormant Mycobacterium tuberculosis. J Med Chem, 2016. 59(14): p. 6848-59.

      (17) Wheeler, P.R., et al., Functional demonstration of reverse transsulfuration in the Mycobacterium tuberculosis complex reveals that methionine is the preferred sulfur source for pathogenic Mycobacteria. J Biol Chem, 2005. 280(9): p. 8069-78.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure S1 it would be useful to include the reverse transsulfuration pathway given that it contributes to the L-cysteine pool, and that L-methionine was used for metabolite labelling experiments.

      We agree with the reviewer’s suggestion and have included reverse transsulfuration in Fig. S1. Please note that labelling was not performed with L-methionine. We used ³⁴S derived from SO₄²⁻ to monitor the reductive assimilation of sulfur and its transit from S²⁻ to L-methionine via cysteine. We specified in the Materials and Methods that we used sodium sulfate-³⁴S (Merck 718882) as our labelled sulfur source. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the Results section (page 9) that we employed Na₂³⁴SO₄ for labelling, so that other readers will not think we are measuring transsulfuration.

      Author response image 1.

      (2) In Figure S2 it is unclear why the control is included in this figure given that the stress conditions were compared to the control. What is the control being compared to here?

      The heat maps of the controls have been included to show relative gene expression in each independent replicate. The normalized counts for the differentially expressed genes are plotted. To better understand the RNA-seq results, we plotted the fold change of genes differentially expressed under the different stress conditions (new Figure S3 and Table S2). This allowed us to view the expression profile of genes across all stress conditions simultaneously, regardless of whether they were identified as differentially expressed. The data revealed that specific clusters of genes are up- and downregulated under oxidative, SDS and starvation conditions. In comparison, the differences observed under the pH 5.5 and nitrosative conditions were limited (Figure S3 and Table S2).

      (3) In Figure S3 it would be more informative to show fold-enrichment than gene counts in (b) to (f).

      In our opinion, gene counts are more informative when plotting GO enrichments, as the number of genes in each GO category can vary drastically. The significance values are already calculated from the fold enrichment of a category relative to the background, so the p-adj values plotted on the x-axis can serve as a proxy for fold enrichment. Hence, rather than plotting two related variables, plotting the total number of genes belonging to each category helps the reader understand the “scale” at which a category is affected.
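      To make the relationship between fold enrichment, gene counts and significance concrete, a minimal Python sketch follows. It assumes a one-sided hypergeometric over-representation test, a common choice for GO enrichment; the tool and multiple-testing correction actually used in the study are not stated here, and the category sizes are invented. Two categories with the same fold enrichment can differ greatly in significance depending on how many genes they contain, which is the information a gene-count axis conveys.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): k of n differentially expressed (DE) genes fall in the
    category, which contains K of the N background genes."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric X: probability of drawing at least k
    category genes when n genes are drawn at random from N (K in the category)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Two hypothetical categories with the same 5-fold enrichment but different sizes.
small = (5, 200, 20, 4000)    # 5 DE genes in a 20-gene category
large = (50, 200, 200, 4000)  # 50 DE genes in a 200-gene category
for name, args in (("small", small), ("large", large)):
    print(name, round(fold_enrichment(*args), 2), hypergeom_upper_tail(*args))
```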

      (4) In Figure 1c, standard Sauton's is a defined medium and is not nutrient-limiting; the authors should clarify the composition of the medium that they used here.

      The composition of the Sauton's medium used in the study is 0.5 g/L MgSO4·7H2O, 2 g/L citric acid, 1 g/L L-asparagine, 0.3 g/L KCl·H2O, 0.2% glycerol, 0.64 g/L FeCl3, 100 μM NH4Cl and 0.7 g/L K2HPO4·3H2O. We have modified the sentence in line with the reviewer’s suggestion.

      (5) The authors claim that the distinct transcriptomes for the two mutants indicate that "CysM and CysK2 distinctly modulate 324 and 1104 genes". The effect is likely due to distinct downstream consequences of the deletions, rather than direct regulation by the synthases. This section should be reworded for clarity.

      We have modified the sentence in line with the reviewer’s suggestion.

      (6) In Figure 3 it would be useful to express mycothione levels as a percentage of the total mycothiol pool to give an indication of the extent to which the thiol is being oxidised.

      While we appreciate the reviewer’s suggestion, we cannot calculate ratios of ion counts (IC) for two different compounds, as they ionize differently: 100 ion counts of one compound is NOT equivalent to 100 ion counts of the other.

      (7) Figure 6 is difficult to interpret as the concentrations used in the INH + inhibitor wells are not clear. It would be useful to indicate the concentrations of each compound added next to the wells in the figure.

      We have modified the figure and legends in line with the reviewer’s suggestion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Document the cysM deletion.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section. 

      (2) The oxidative stress CHP is not defined in the figure legend.

      We have modified the legend in line with the reviewer’s suggestion.

      (3) Can we see the structures of the compounds?

      Kindly refer to Fig. 6a for the structures of the compounds.

      (4) Fix the genetics and the paper is very interesting.

      I might be missing something. The authors do provide promising complementation data for several of the stresses. Provide evidence for the cysM deletion and complementation and the data will be very compelling. The focus of the paper is important for our understanding of the biology of Mycobacterium tuberculosis.

      Thank you for appreciating our study. The details of CysM knockout and complementation strain generation have been published previously ([15]; Appendix Figure S4 and Methods). CysK2 mutant and complementation strain details are included in the present manuscript (Figure 1b and Methods).

      Reviewer #3 (Recommendations For The Authors):

      The transcriptional profiling studies do not rigorously control for the engineered mutations using genetic complementation.

      The complementation strains used in all in vitro, ex vivo and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the RNA sequencing experiments because of the large number of samples to be handled and the associated costs.

      Figure 3. These data are not rigorously controlled without genetic complementation, explain why some data in Figure 3 was generated at 24 hr and other data was generated at 48 hr, remove subbars in 3g. Please provide more clarification on Fig 3e-g because the normalization in these panels makes it appear as if there is little- or no-difference in the levels of 34S incorporation into the thiol metabolites.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the Figure 3 experiments because of the large number of samples to be handled and the associated costs.

      The time points in this experiment were chosen based on an initial pilot experiment, which indicated that a longer incubation is required to observe the labelling phenotypes than the pool-size phenotypes. The differences observed are statistically significant.

      Surfactant and SDS stress are used interchangeably in the text, legends, and figures. Please be consistent here.

      We have modified the text in line with the reviewer’s suggestion.

      Consider re-wording the 1st paragraph on page 5 to better clarify how Trp, Lys, and His interact with the host immune cells.

      We have modified the text in line with the reviewer’s suggestion.

      Cite the literature associated with the sulfur import system in Mtb on page 3 in the 2nd paragraph.

      We have modified the text in line with the reviewer’s suggestion.

      The manuscript nicely describes the construction of a cysK2 mutant. It is unclear how the cysM mutant was generated. Please clarify, cite, or add the cysM mutant construction to this manuscript.

      The details of CysM knockout and complementation strain generation have been published previously ([15]; Appendix Figure S4 and Methods). We have included the citation in the Methods section of the current manuscript.

      Provide evidence that the small molecules used in Fig 6 are on target and inhibit the cysteine biosynthetic enzymes in whole bacteria. It is unclear how a MIC can be determined with these compounds in 7H9 ADC when deletion mutants grow just fine in this media. Is this because the compounds inhibit multiple cysteine synthesis enzymes and/or enzymatic targets in other pathways? To me, the data suggests that the compounds are hitting multiple enzymes in whole Mtb cells. Does cysteine supplementation reverse the inhibitory profiles with the compounds in Figure 6?

      As mentioned in the text, all the compounds were ineffective in killing Mtb, likely because L-cysteine synthases are not essential under regular growth conditions. Hence, the MICs of the cysteine synthase inhibitors were very high (C1, 0.6 mg/ml; C2, 0.6 mg/ml; C3, 0.15 mg/ml), as opposed to an MIC of 0.06 μg/ml for the standard drug isoniazid. We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in Mtb cells. The inhibitors used in this study have been profiled previously in vitro [8]. However, we cannot rule out that these compounds might also have some off-target effects.
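
      Because the MICs above are quoted in different units (mg/ml for the inhibitors, μg/ml for isoniazid), converting them to a common unit shows the size of the gap. This is simple arithmetic on the values stated in the reply and adds no new data.

      ```latex
      \begin{align*}
      \mathrm{MIC}_{\mathrm{C1}} = \mathrm{MIC}_{\mathrm{C2}} = 0.6~\mathrm{mg/ml} = 600~\mu\mathrm{g/ml}
        &\;\Rightarrow\; \frac{600}{0.06} = 10\,000\text{-fold higher than isoniazid}\\
      \mathrm{MIC}_{\mathrm{C3}} = 0.15~\mathrm{mg/ml} = 150~\mu\mathrm{g/ml}
        &\;\Rightarrow\; \frac{150}{0.06} = 2\,500\text{-fold higher than isoniazid}
      \end{align*}
      ```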

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but some open questions remain about the interpretation of activity/binding assays and the newly incorporated HDX-MS results. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent binding of substrate and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including both numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript has been improved in a revision by performing additional experiments to help corroborate certain aspects of the study. But these new experiments do not address all of the open questions about the structural basis for mechanism. Additionally, some questions about the strength of biochemical data and fit of binding or kinetic curves to data that were raised by other referees still remain. Some experimental observations are not consistent with the proposed model. For example, why does dATP enhance Gly radical formation when the proposed mechanism of dATP inhibition involves disorder in the Gly radical domain?

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We express our gratitude to the reviewer for dedicating time to review our work and for the overall favorable assessment. We agree that the question of exactly how much more mobile the glycyl radical domain becomes without losing the glycyl radical entirely remains unresolved, but we also think that our work sets a solid basis for future experiments by us and others.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination, complemented by hydrogen-deuterium exchange (HDX). One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering (or increased protein dynamics) of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Å) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR, in a class III system, through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      The addition of hydrogen-deuterium exchange mass spectrometry (HDX-MS) complements the results originating from the cryo-EM data. Most notable is the observation of enhanced exchange (albeit quite subtle) in the GRD in the presence of dATP, which matches the loss of structural information in this region in the cryo-EM data. The most pronounced and compelling HDX results are seen in the form of dATP-induced protection of peptides immediately adjacent to the β-hairpin at the s-site, where dATP is expected to bind based on cryo-EM. It is clear that the presence of dATP increases the rigidity of this region.

      We are happy that both reviewers find the HDX-MS experiments to be a valuable addition to the existing data.

      Weaknesses:

      The discussion of the change in peptide mobility in the N-terminal region is complicated by the presence of bimodal mass spectral features, and this may prevent detailed interpretation of the data, especially for select peptide regions that show opposite trends upon nucleotide association.

      Further, the HDX data in the NxN flap is unchanged upon nucleotide binding (ATP, dATP, or CTP), despite changes observed in the cryo-EM data.

      We are grateful to the reviewer for the comprehensive feedback on the HDX-MS part and for identifying areas for improvement. The HDX analysis was of course undertaken with the intention of identifying differences in disorder of the NxN flap and GRD region. From an HDX perspective, both regions were found to be highly susceptible to HDX regardless of state/ligand, due to surface accessibility and/or very fast dynamics. However, this does not mean that there is no difference in the degree of order of these regions upon ligand addition, simply that, within the limited labelling window of 30-3000 seconds, we could not conclusively demonstrate increased disorder by HDX-MS. We have rephrased the discussion text to reflect this fact.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 5 (and throughout the manuscript) there are some inconsistencies in how dissociation constants for effectors and inhibitors are described - for example, D in KD is sometimes subscripted and sometimes not.

      Thank you for noticing these remaining errors. We hope that we have fixed all of them now.

      Reviewer #3 (Recommendations For The Authors):

      The authors addressed many of the initial concerns raised. The addition of the HDX-MS data in this revision is a welcome contribution to the work and complements the cryo-EM data. In select cases, the data may be over-interpreted. This reviewer suggests that the authors revise the text in this section so that it is more consistent with the presented data.

      Specific points:

      (1) The bimodal mass spectral features in the N-terminal domain complicate the data interpretation. Specifically, for peptides in the 81-99 region, the fast-exchanging feature shows protection in the presence of (d)ATP/CTP, but the opposite trend is observed for the slow-exchanging species. It is therefore advisable not to make absolute statements about the HDX results in this region, as the data are complicated.

      As stated by the reviewer, it is not possible from the presented HDX data to deduce whether this is a result of a 50% loaded dimer or of the oligomerization state of the protein. We have remedied this by removing statements about differences in bimodality between the dATP and ATP states. We have also addressed this in the text by stating that the main reason is most likely the different oligomerization states present in solution. Nevertheless, it is clear from the HDX data that the N-terminal region and the 81-99 region are very interesting, and it was somewhat disappointing that, due to the dynamics of oligomerization, it was not possible to SEC-purify pure dimer or tetramer samples for HDX-MS in order to deconvolute the cause.

      (2) Related to #1, the authors assign the bimodal HDX behavior to EX1 mechanism, but this is not necessarily (and unlikely) true based on the limited time points. The authors also state that it originates from the heterogeneity of the sample: "a mixture of states" which could reflect the mixture of oligomerization states. The authors should be careful assigning EX1 mechanism unless there are compelling results to support it.

      We apologize for the unfortunate phrasing. It was not our intention to imply that the bimodality is due to true EX1 kinetics. See the above answer. The mention of EX1 has been removed from the discussion text.

      (3) The deuterium uptake for peptide 118-126 is very small (~1 Da) compared to the length of the peptide. The change in deuterium uptake (<0.25 Da) from dATP is very small; the authors should proceed with caution when presenting interpretations of such small differences.

      We agree with the reviewer that extra caution should be taken when dealing with such a small difference. However, the 118-126 peptide has been significance-tested in both HDExaminer and Deuteros 2.0, and we also observed this difference in more than one run. The difference in uptake is small but reaches significance at the longer labelling times. The proximity to the NxN flap makes it interesting in the context of an allosteric conformational change; i.e., the dynamics of the NxN flap might be too fast to detect directly, so that we can only see some secondary effects. We would like to keep the data in Figure 10 for reasons of transparency. In essence, this is similar to the bimodality mentioned above: we cannot fully explain the observation but present the data as observed.
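
      To put the magnitudes discussed in point (3) in context, a rough normalization to the theoretical maximum uptake is sketched below. It assumes the common convention that the N-terminal residue and prolines contribute no measurable backbone amide exchange, and it assumes, hypothetically, that the nine-residue peptide 118-126 contains no proline; back-exchange is ignored.

      ```latex
      \begin{align*}
      D_{\max} &= N_{\mathrm{res}} - 1 - N_{\mathrm{Pro}} = 9 - 1 - 0 = 8 \text{ deuterons} \;(\approx 8~\mathrm{Da})\\[4pt]
      \text{uptake} &\approx \frac{1}{8} = 12.5\,\%\ \text{of } D_{\max}, \qquad
      \Delta D < \frac{0.25}{8} \approx 3\,\%\ \text{of } D_{\max}
      \end{align*}
      ```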

      (4) On p. 22, the authors should consider revising the following statement: "confirming dATP binding to the s-site." Even though the HDX data are most compelling for the protection of peptides 178-204 and 330-348 that are adjacent to the beta-hairpin at the s-site, these data cannot "confirm" a binding site for a small molecule, such as dATP.

      We appreciate that the reviewer has pointed out that the statement can be misleading, and we agree that the binding site of small molecules can’t be confirmed based solely on HDX data. The sentence has been reformulated to clarify that the binding site was confirmed based on the combined evidence of the HDX data and the previously presented biochemical and structural data on the s-site.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript reports important in vitro biochemical and in planta experiments to study the receptor activation mechanism of plant membrane receptor kinase complexes with non-catalytic intracellular kinase domains. Several lines of evidence convincingly show that one such putative pseudokinase, the immune receptor EFR achieves an active conformation following phosphorylation by a co-receptor kinase, and then in turn activates the co-receptor kinase allosterically to enable it to phosphorylate down-stream signaling components. This manuscript will be of interest to scientists focusing on cell signalling and allosteric regulation.

      We wish to clarify that EFR itself is not a pseudokinase. We showed in previous work (Bender et al., 2021; https://doi.org/10.1073/pnas.2108242118) that EFR has catalytic activity in vitro. This catalytic activity is, however, not required for elf18-induced immune signaling in planta.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors use an elegant but somewhat artificial heterodimerisation approach to activate the isolated cytoplasmic domains of different receptor kinases (RKs) including the receptor kinase BRI1 and EFR. The developmental RK BRI1 is known to be activated by the co-receptor BAK1. Active BRI1 is then able to phosphorylate downstream substrates. The immune receptor EFR is also an active protein kinase also activated by the co-receptor BAK1. EFR however appears to have little or no kinase activity but seems to use an allosteric mechanism to in turn enable BAK1 to phosphorylate the substrate kinase BIK1. EFR tyrosine phosphorylation by BAK1 appears to trigger a conformational change in EFR, activating the receptor. Likewise, kinase activating mutations can cause similar conformational transitions in EFR and also in BAK1 in vitro and in planta.

      We wish to clarify that we make no strong link between tyrosine phosphorylation and the conformational change leading to activation of the complex. Rather, the HDX-MS data demonstrate the structural importance of Tyr836 for the activation mechanism. At present, we do not know how phosphorylation of the residue would affect the activation process.

      Strengths:

      I particularly liked the HDX experiments coupled with mutational analysis (Fig. 2) and the design and testing of the kinase-activating mutations (Fig. 3), as they provide novel mechanistic insights into the activation mechanisms of EFR and of BAK1. These findings are nicely extended by the large-scale identification of EFR-related RKs from different species with potentially similar activation mechanisms (Fig. 5).

      Weaknesses:

      In my opinion, there are currently two major issues with the present manuscript. (1) The authors have previously reported that the EFR kinase activity is dispensable for immune signaling (https://pubmed.ncbi.nlm.nih.gov/34531323/), but the wild-type EFR receptor still leads to a much better phosphorylation of the BIK1 substrate when compared to the kinase-inactive D849N mutant protein (Fig. 1). (2) How the active-like conformation of EFR in turn activates BAK1 is poorly characterized, but this appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question, but these HDX studies were not carried out due to technical limitations.

      Overall this is an interesting study that aims to advance our understanding of the activation mechanisms of different plant receptor kinases with important functions in plant immunity.

      Reviewer #2 (Public Review):

      Summary:

      Transmembrane signaling in plants is crucial for homeostasis. In this study, the authors set out to understand to what extent catalytic activity in the EFR tyrosine kinase is required in order to transmit a signal. This work was driven by mounting data that suggest many eukaryotic kinases do not rely on catalysis for signal transduction, relying instead on conformational switching to relay information. The crucial findings reported here involve the realisation that a kinase-inactive EFR can still activate its partner protein BAK1 (i.e., lead to downstream phosphorylation). Using a convincing set of biochemical, mass spectrometric (HD-exchange) and in vivo assays, the team suggest a model in which EFR is likely phosphorylated in the canonical activation segment (where two Ser residues are present), which is sufficient to generate a conformation that can activate BAK1 through dimerisation. A model is put forward involving C-helix positioning in BAK1, and the model is extended to other 'non-RD' kinases in Arabidopsis that likely do not require kinase activity for signaling.

      We prefer not to describe EFR as a tyrosine kinase. It may be the case that EFR can function under certain conditions as a dual-specificity protein kinase, but this has never been demonstrated experimentally. We therefore describe EFR as a Ser/Thr protein kinase, since it is known that the isolated cytoplasmic domain can phosphorylate on Ser and Thr residues (Wang et al., 2014; https://doi.org/10.1016/j.jprot.2014.06.009).

      Strengths:

      The work uses logical and well-controlled approaches throughout, and is clear and convincing in most areas, linking data from IPs, kinase assays (including clear 32P-based biochemistry), HDX-MS data (from non-phosphorylated EFR), structural biology, oxidative burst data and infectivity assays. Repetitions and statistical analysis all appear appropriate.

      Overall, the work builds a convincing story and the discussion does a clear job of explaining the potential impact of these findings (and perhaps an explanation of why so many Arabidopsis kinases are 'pseudokinases', including XPS1 and XIIa6, where this is shown explicitly).

      Weaknesses:

      No major weaknesses are noted from reviewing the data and the paper follows a logical course built on solid foundations; the use of Tables to explain various experimental data pertinent to the reported studies is appreciated.

      (1) The use of a, b, c, d in Figures 2C and 3C etc. is confusing to this referee, and is now addressed in the latest version.

      (2) The debate about kinase v pseudokinases is well over a decade old. For non-experts, the kinase alignments/issues raised are in PMID: 23863165 and might prove useful if cited.

      We have cited the suggested reference in the second paragraph of the discussion.

      (3) Early on in the paper, the concept of kinases and pseudokinases related to R-spine (and extended R-spine) stability and regulation really needs to be more adequately introduced to explain what comes next; e.g. some of the key work in this area for RAF and Tyr kinases where mutual F-helix Phe amino acid changes are evaluated (conceptually similar to this study of the E-helix Tyr to Phe changes in EFR) should be cited (PMID: 17095602, 24567368 and 26925779).

      As an alternative, we have amended the text in several places to focus on conformational toggling between active/inactive states rather than R-spine stability. We think that this keeps the message of our manuscript focused. We hope that the reviewer finds this acceptable.

      (4) In my version, some of the experimental text is also currently in the wrong order (and no page numbers, so hard for me to state exactly where in the manuscript); However, I am certain that Figure 2C is mentioned in the text when the data are actually shown in Figure 3C for the EFR-SSAA protein.

      Indeed, some references to Figure 2 in the text were incorrect. We have corrected these. References in the text to Figure 3 and the data reported therein are correct.

      (5) Tyr 156 in PKA is not shown in Supplement 1, 2A as suggested in the text; for readers, it will be important to show the alignment of the Tyr residue in other kinases; this has been updated in the second version. Although it is clearly challenging to generate phosphorylated EFR (seemingly through codon expansion here?), it appears unlikely that a phosphorylated EFR protein, even semi-pure, couldn't have been assayed to test the idea that the phosphorylation drives/supports downstream signaling. What about a DD or EE mutation, as commonly used (perhaps over-used) in MEK-type studies?

      Our aim with codon expansion was to generate recombinant protein carrying high-stoichiometry phosphorylation at sites which we have previously documented to be required for downstream signaling (Macho et al., 2014; Bender et al., 2021). We additionally demonstrated previously that a DD mutant of the activation loop sites in EFR does not fully complement the efr-1 mutant (Bender et al., 2021), suggesting that the Asp mutations are not good phospho-mimics in this context. We therefore did not generate DD or EE mutations for in vitro studies.

      Impact:

      The work is an important new step in the huge amount of follow-up work needed to examine how kinases and pseudokinases 'talk' to each other in (especially) the plant kingdom, where significant genetic expansions have occurred. The broader impact is that we might understand better how to manipulate signaling for the benefit of plants and mankind; as the authors suggest, their study is a natural progression both of their own work, and the kingdom-wide study of the Kannan group.

      Reviewer #3 (Public Review):

      The study presents strong evidence for allosteric activation of plant receptor kinases, which enhances our understanding of the non-catalytic mechanisms employed by this large family of receptors.

      Plant receptor kinases (RKs) play a critical role in transducing extracellular signals. The activation of RKs involves homo- or heterodimerization of the RKs, and it is believed that mutual phosphorylation of their intracellular kinase domains initiates downstream signaling. However, this model faces a challenge in cases where the kinase domain exhibits pseudokinase characteristics. In their recent study, Mühlenbeck et al. reveal the non-catalytic activation mechanisms of the EFR-BAK1 complex in plant receptor kinase signaling. Specifically, they aimed to show that the EFR kinase domain activates BAK1 not through its kinase activity, but rather by utilizing a "conformational toggle" mechanism to enter an active-like state, enabling allosteric trans-activation of BAK1. The study sought to elucidate the structural elements and mutations of EFR that affect this conformational switch, as well as explore the implications for immune signaling in plants. To investigate the activation mechanisms of the EFR-BAK1 complex, the research team employed a combination of mutational analysis, structural studies, and hydrogen-deuterium exchange mass spectrometry (HDX-MS) analysis. For instance, through HDX-MS analysis, Mühlenbeck et al. discovered that the EFR (Y836F) mutation impairs the accessibility of the active-like conformation. On the other hand, they identified the EFR (F761H) mutation as a potent intragenic suppressor capable of stabilizing the active-like conformation, highlighting the pivotal role of allosteric regulation in BAK1 kinase activation. The data obtained from this methodology strengthen their major conclusion. Moreover, the researchers propose that the allosteric activation mechanism may extend beyond the EFR-BAK1 complex, as it may also be partially conserved in the Arabidopsis LRR-RK XIIa kinases. This suggests a broader role for non-catalytic mechanisms in plant RK signaling.

      The allosteric activation mechanism was demonstrated for receptor tyrosine kinases (RTKs) many years ago. A similar mechanism has been suggested for the activation of plant RKs, but experimental evidence for this conclusion is lacking. Data in this study represent a significant advancement in our understanding of non-catalytic mechanisms in plant RK signaling. By shedding light on the allosteric regulation of BAK1, the study provides a new paradigm for future research in this area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have considered points 1-5 raised in my initial review and the revised manuscript contains a more balanced discussion and limitation section. No additional experiments have been performed to substantiate the envisioned allosteric activation mechanism of the co-receptor kinase BAK1 by the receptor EFR. I rewrote the public statement accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Thanks for responding to my comments.

      Reviewer #3 (Recommendations For The Authors):

      The revised manuscript has fully addressed my previous concerns and is now suitable for publication in eLife.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      Using concurrent in vivo whole-cell patch clamp and dendritic calcium imaging, the authors characterized how functional synaptic inputs across dendritic arborizations of mouse primary visual cortex layer 2/3 neurons emerge during the second postnatal week. They were able to identify spatially and functionally separated domains of clustered synapses in these neurons even before eye-opening and characterize how the clustering changes from P8 to P13. 

      Strengths: 

      The work is technically challenging and the findings are novel. The results support previous EM and immunostaining studies but provide in vivo evidence on the time course and the trajectory of how functional synaptic input develops. 

      Weaknesses: 

      There are some missing details about how the experiments were performed, and I also have some questions about the analyses. 

      We have now added a more detailed description of the methods and added new supplemental figures and descriptions to clarify our analyses. Please find our responses to the specific points of this reviewer in the section “Recommendations for the authors” below.

      Reviewer #2 (Public Review):

      In this study, Leighton et al performed remarkable experiments by combining in-vivo patch-clamp recording with two-photon dendritic Ca2+ imaging. The voltage-clamp mode is a major improvement over the pioneering versions of this combinatorial experiment that have led to major breakthroughs in the neuroscience field for visualizing and understanding synaptic input activities in single cells in vivo (sharp electrodes: Svoboda et al, Nature 1997, Helmchen et al, Nature Neurosci 1999; whole-cell current-clamp: Jia et al, Nature 2010, Chen et al, Nature 2011. I suggest that these papers should be cited). This is because voltage-clamp mode, although full control of the membrane voltage in vivo is not realistic, is nevertheless most effective in preventing back-propagating action potentials, which would severely confound the measurement of individual synaptically induced Ca2+ influx events. Furthermore, clamping the cell body at a strongly depolarized potential (here the authors used -30 mV) also facilitates the detection of synaptically induced Ca2+ influx. As a result, the authors successfully recorded high-quality Ca2+ imaging data that can be used for precise analysis. To date, even in view of the rapid progress of voltage-sensitive indicators and relevant imaging technologies in recent years, this very old 'art' of combining single-cell electrophysiology and two-photon imaging (ordinary, raster-scanned, video-rate imaging) of Ca2+ signals still enables measurements of the highest precision.

      We thank the reviewer for reminding us of these important previous studies that we cite now in the revised manuscript. 

      On the other hand, the interpretation of data in this study is a bit narrow-minded and lacks a comprehensive picture. Some suggestions to improve the manuscript are as follows: 

      (1) The authors made a segregation of 'spine synapse' and 'shaft synapse' based solely on the two-photon images in-vivo. However, caution should be taken here, because the optical resolution under in vivo imaging conditions like this cannot reliably tell whether a bright spot within or partially overlapping a segment of the dendrite is a spine on top of (or below) it. Therefore, what the authors consider a 'shaft synapse' (by detecting Ca2+ hotspots) has an unknown probability of being just a spine on top of or below the dendrite. If there are other imaging data of higher axial resolution available to validate or calibrate this classification, the authors should carry out further analyses to check the consistency of their data, as they do need such a segregation between spine and shaft synapses to show how they evolve over the brain development stages. 

      We agree with the reviewer that the differentiation between spine and shaft synapses can be difficult for those spines that are located above or below the dendritic shaft in the z-dimension because of the lower resolution of 2-photon microscopy in the z-dimension compared to the image plane. We have now added a new paragraph to the Methods section to describe in more detail how we identify spine and shaft synapses and provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but added a cautionary note to make the reader aware of potential misidentifications.

      (2) The use of terminology 'bursts of spontaneous inputs' for describing voltage-clamp data seems improper. Conventionally, 'burst' refers to suprathreshold spike firing events, but here, the authors use 'burst' to refer to inward synaptic currents collected at the cell body. Not every excitatory synaptic input (or ensemble of inputs) activation will lead to spike firing under naturalistic conditions, therefore, these two concepts are not equivalent. It is recommended to use 'barrage of inputs' instead of 'burst of inputs'. Imagine a full picture of the entire dendritic tree, the fact that the authors could always capture spontaneous Ca2+ events here and there within a few pieces of dendrites within an arbitrary field-of-view suggests that, the whole dendritic tree must have many more such events going on as a barrage while the author's patch electrode picks up the summed current flow from the whole dendritic tree. 

      We agree with the reviewer that “barrage” is a clearer term for multiple synaptic inputs occurring simultaneously and therefore we changed the terminology throughout the manuscript.

      (3) Following the above issue, an analysis of the temporal correlation between synaptic (not segregating 'spine' or 'shaft') Ca2+ events and EPSCs is absent. Again, the authors drew arbitrary time windows to clump the events for statistical analysis. However, the demonstrated example data already shows that the onset times of individual synaptic Ca2+ events do not necessarily align with the beginning of a 'barrage' inward current event. 

      The reviewer writes that “an analysis of the temporal correlation between synaptic calcium events and EPSCs is absent”. We would like to point out that we did determine the percentage of calcium transients that occurred during barrages of synaptic inputs (~60%, page 7). This is important, since the barrages in our patch-clamp recordings most likely reflect spontaneous network events, as described in the developing cortex previously by us and many other labs. The time window we chose was not “arbitrary” as the reviewer suggests, but based on the duration of the barrages of synaptic inputs as defined in the Methods section. 
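      The percentage of calcium transients occurring during barrages can be illustrated with a short sketch. This is not the authors' analysis code: it assumes that transient onset times and barrage windows (start, end) are already available in seconds, and the optional `tolerance` argument, which widens each window to allow for the 1-2 frame onset uncertainty at ~10 Hz, is a hypothetical addition.

```python
def fraction_in_barrages(onset_times, barrages, tolerance=0.0):
    """Fraction of calcium-transient onsets that fall inside any barrage window.

    onset_times: transient onset times in seconds (assumed input format).
    barrages: list of (start, end) tuples in seconds, treated as inclusive.
    tolerance: hypothetical padding (s) added to both ends of every window.
    """
    if len(onset_times) == 0:
        return float("nan")  # undefined when no transients were detected
    inside = sum(
        any(start - tolerance <= t <= end + tolerance for start, end in barrages)
        for t in onset_times
    )
    return inside / len(onset_times)
```

      For example, with onsets at 1.0, 2.5, 5.0, 9.0 and 9.5 s and barrages spanning 0.5-3.0 s and 9.0-10.0 s, four of five onsets fall inside a barrage, giving 0.8.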

      The reason why we did not perform a more in-depth analysis of the temporal relationship between synaptic calcium transients and synaptic input currents is that it is essentially impossible to relate calcium transients at individual synapses to specific synaptic input events. First, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization, as the reviewer notes above. Thus, barrages cannot be broken down into individual synaptic transmission events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients with 100-200 ms precision (1 or 2 frames). However, throughout any 100-200 ms period of recording, several synapses are active across the entire dendritic arborization, such that we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved in the patch recording.

      (4) The authors claim that "these observations indicate that the activity patterns investigated here are not or only slightly affected by low-level anesthesia". It would be nice to show some of the recordings in this work without any anesthesia to support this claim. 

      Indeed, the conclusion that the patterns of activity are only slightly affected by low levels of anesthesia is based on our previous recordings on the network level. Unfortunately, we are still not able to perform calcium imaging with single-synapse resolution in unanesthetized developing mice (and as far as we know, no one else is), because the skull of these young animals is not yet firm. As a consequence, movements cannot be reduced sufficiently for patching and imaging with single-synapse resolution. Our previously published (Siegel et al., 2012) and unpublished work on the cellular level suggests that activity patterns during light anesthesia are very similar to those during sleep in mouse pups at this age.

      Reviewer #3 (Public Review):

      Summary: 

      There is a growing body of literature on the clustering of co-active synapses in adult mice, which has important implications for understanding dendritic integration and sensory processing more broadly. However, it has been unclear when this spatial organization of co-active synapses arises during development. In this manuscript, Leighton et al. investigate the emergence of spatially organized, co-active synapses on pyramidal dendrites in the mouse visual cortex before eye-opening. They find that some dendrite segments contain highly active synapses that are co-active with their neighbors as early as postnatal day (P) 8-10, and that these domains of co-active synapses increase their coverage of the dendritic arbor by P12-13. Interestingly, Leighton et al. demonstrate that synapses co-active with their neighbors are more likely to increase their activity across a single recording session, compared to synapses that are not co-active with their neighbors, suggesting local plasticity driven by coincident activity before eye-opening. 

      The current manuscript includes some replication of earlier results from the same research group (Winnubst et al., 2015), including the presence of clustered, co-active synapses in the visual cortex of mouse pups, and the finding that synapses co-active with their neighbors show an increase in transmission frequency during a recording session. The main novelty in the current study compared to Winnubst et al. (2015) is the inclusion of younger animals (P8-13 in the current study compared to P10-15 in Winnubst et al., 2015). The current manuscript is the first demonstration that active synapses are clustered on specific dendrite segments as early as P8-10 in the mouse visual cortex, and the first to show the progression in active synapse distribution along the dendrite during the 2nd postnatal week. These results from the visual cortex may help inform our understanding of sensory development more broadly. 

      Strengths: 

      The authors ask a novel question about the emergence of synaptic spatial organization, and they use well-chosen techniques that directly address their questions despite the challenging nature of these techniques. To capture both structural and functional information from dendrites simultaneously, the authors performed a whole-cell voltage clamp to record synaptic currents arriving at the soma while imaging calcium influx at individual synaptic sites on dendrites. The simultaneous voltage clamp and calcium imaging allowed the authors to isolate individual synaptic inputs without their occlusion by widespread calcium influx from back-propagating action potentials. Achieving in vivo dendrite imaging in live mice that are as young as P8 is challenging, and the resulting data provides a unique view of synaptic activity along individual dendrites in the visual cortex at an early stage in development that is otherwise difficult to assess. 

      The authors provide convincing evidence that synapses are more likely to be co-active with their neighbors compared to synapses located farther away (Fig. 6F-H), and that synapses co-active with their neighbors increase their transmission frequency during a recording session (Figure 7C). These findings are particularly interesting given that the recordings occur before eye-opening, suggesting a relationship between co-activity and local synaptic plasticity even before the onset of detailed visual input. These results replicate previously published findings from P10-15 pups (Winnubst et al., 2015), increasing confidence in the reproducibility of the data. 
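      The neighbor-versus-distant comparison referred to here can be made concrete with a small sketch. The definitions are hypothetical and chosen only for illustration, not taken from the manuscript: co-activity is the fraction of one synapse's events (frame indices) matched by an event at the other synapse within ±1 frame, synapse positions are 1-D distances along the dendrite in µm, and pairs closer than a 10 µm cutoff count as "neighbors".

```python
import numpy as np


def coactivity(events_a, events_b, window=1):
    """Fraction of events at synapse A with an event at synapse B within +/- window frames."""
    events_a = np.asarray(events_a)
    events_b = np.asarray(events_b)
    if events_a.size == 0 or events_b.size == 0:
        return 0.0
    # pairwise absolute frame differences, shape (len(A), len(B))
    diffs = np.abs(events_a[:, None] - events_b[None, :])
    return float(np.mean(diffs.min(axis=1) <= window))


def coactivity_by_distance(events, positions, cutoff_um=10.0, window=1):
    """Mean co-activity of near (<= cutoff) vs. far synapse pairs.

    The measure is asymmetric, so both orderings of each pair are included.
    """
    near, far = [], []
    n = len(events)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            c = coactivity(events[i], events[j], window)
            d = abs(positions[i] - positions[j])
            (near if d <= cutoff_um else far).append(c)
    near_mean = np.mean(near) if near else np.nan
    far_mean = np.mean(far) if far else np.nan
    return near_mean, far_mean
```

      In a real analysis, the cutoff would more likely be replaced by distance bins or by the domain assignments used in the manuscript, and compared against shuffled event times.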

      The authors also provide novel data documenting for the first time spatially organized, co-active synapses in pups as young as P8. Comparing the younger (P8-10) and older (P12-13) pups, provides insight into how clusters of co-active synapses might emerge during development. 

      Weaknesses: 

      This manuscript provides insufficient detail for assessing the rigor and reproducibility of the methods, particularly for age comparisons. The P8-10 vs P12-13 age comparisons are the primary novel finding in this manuscript, and it is, therefore, critical to avoid systematic age differences in the methods and analysis whenever possible. Specific concerns related to the age comparisons are listed below: 

      (1) Given that the same research group previously published P12-13 data (Winnubst et al., 2015), it is unclear whether both age groups in the current study were imaged/analyzed in parallel by the same researcher(s), or whether previous data was used for the P12-13 group. 

      While the approach in the present study is indeed similar to that of our previous study (Winnubst et al. 2015), the data set presented here is entirely new. The current study was made possible by a new microscope that allows combining resonant scanning with piezo-focusing to image large fractions of the dendritic arborization. In fact, we could now image dendritic segments almost 10 times larger than in our previous study, including branch points. One author contributed to the experiments in both studies. Image analysis of all experiments was performed by the first author of the present study, who was not involved in the Winnubst et al. work.

      (2) The authors mention that they used 2 different microscopes, and used a fairly wide range of imaging frame rates (5-15 Hz). It is unclear from the current manuscript whether the same imaging parameters were used across the two age groups. If data for the two experimental groups was collected separately, perhaps at different times, by a different person, or on a different microscope, there is a concern that some differences between the groups may not necessarily be due to age. 

      The reviewer mentions that the experimental settings are not identical across the experiments of this study. In the original manuscript we erroneously reported in the Methods section that 2 different setups were used for this study; however, all experiments were performed on the same microscope. We have corrected this in the new manuscript. We took timelapse recordings of small stacks of varying depth to cover as many dendrites as possible in each recording, therefore, we needed to adjust the rate of acquired stacks within a certain range as the reviewer points out. The data were acquired by two scientists during an overlapping period. And while the different ages were not recorded in a strictly randomized fashion, they were not acquired in sequence according to ages, but rather involved many attempts on animals of different ages from many different litters. For each litter a small percentage of animals would generate successful recordings, and the ages of these successes were random. Therefore, we believe that neither the collection of data nor the analysis (see point above) affected the differences we describe here for the two age groups.

      (3) It is unclear whether the image analysis was performed blind to age. Blinding to age during analysis is particularly important for this study, in which it was not possible to blind to age during imaging due to visible differences in size and developmental stage between younger and older pups. 

      The analysis was not set up to be performed blind to age. Not only is the age of the animal apparent at the stage (as the reviewer points out), but the number of spines and the activity levels also clearly differ between neurons only a few days apart. However, all age-related findings reported in this study - except the increase in synapse density and activity - became apparent to us only after the full set of synaptic transmission events was determined and the analysis was performed on the entire data set, making it very unlikely that event detection was biased.

      (4) The relatively low N (where N is the number of dendrites or the number of mice) in this study is acceptable due to the challenging nature of the techniques used, but unintentional sampling bias is a concern. For example, if higher-order dendrites from the apical tuft were imaged at P12-13, while more segments of the apical trunk were imaged at P8-10, this could inadvertently create apparent age differences that were in fact due to dendrite location on the arbor or dendrite depth. 

      The reviewer points out that sampling bias with respect to synapse location along dendrites in the dataset could lead to falsely apparent age differences. In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we confirmed that the mean distance of the imaged dendritic stretches from the cell body was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      Additional general methodological concerns, which are not specifically related to the age comparisons, are listed below: 

      (5) The authors assert that clustered, co-active synapses emerge in the visual cortex before eye-opening, which is an important finding in that it suggests this phenomenon is driven by spontaneous activity rather than visual input. However, this finding hinges on the imaged cells being reliably located in the visual cortex, which is difficult to identify with certainty in animals that have not yet opened their eyes and therefore cannot undergo intrinsic signal imaging to demarcate the boundaries of the visual cortex. If the imaged cells were in, for example, the nearby somatosensory cortex, then the observed spatial organization could be due to sensory input rather than spontaneous activity. 

      The reviewer argues that if the neurons included in our analysis were located in non-visual sensory cortex, e.g. the somatosensory cortex, sensory experience might have shaped clustered inputs instead of spontaneous activity. We are, however, certain that the neurons were located inside the primary visual cortex. In previous experiments where we performed the same craniotomies, we mapped spontaneous activity across the sensory areas in the occipital neocortex and we know the exact location of V1 which is already very consistent during the second postnatal week. (See for example Supplemental Figure 4 in Leighton et al., 2021).  

      (6) It is unclear how the authors defined a synaptic transmission event in the GCaMP signal (e.g. whether there was a quantitative deltaF/F threshold). 

      In the revised manuscript, we describe the procedure of identifying synaptic calcium transients in more detail and added a new supplemental figure to clarify this aspect of the analysis. In short, we use an automated detection with a 2x standard deviation threshold and a subsequent manual control and selection step. Please, find all details in the Methods section and Figure S4 of the revised manuscript.
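      The automated threshold step can be illustrated with a minimal sketch. It is not the authors' pipeline: the baseline F0 (taken here as the 20th percentile of the raw trace), the use of the whole-trace mean and SD of ΔF/F, and the helper name `detect_transients` are assumptions, and the subsequent manual control and selection step is not modeled.

```python
import numpy as np


def detect_transients(trace, k=2.0, baseline_percentile=20):
    """Return candidate onset frames where dF/F rises above mean + k*SD.

    Assumptions (not from the manuscript): F0 is a fixed percentile of the raw
    fluorescence trace, and mean/SD are computed over the whole dF/F trace.
    """
    trace = np.asarray(trace, dtype=float)
    f0 = np.percentile(trace, baseline_percentile)
    dff = (trace - f0) / f0
    threshold = dff.mean() + k * dff.std()
    above = dff > threshold
    # an onset is a supra-threshold frame whose preceding frame was sub-threshold
    previous = np.concatenate(([False], above[:-1]))
    onsets = np.flatnonzero(above & ~previous)
    return onsets, dff, threshold
```

      Candidate onsets found this way would still need the manual review described in the Methods before being counted as synaptic transmission events.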

      (7) The authors' division of synapses into spine vs shaft is unconvincing due to the difficulty of identifying Z-projecting spines in images from 2-photon microscopy, where the Z resolution is insufficient to definitively identify Z-projecting spines, and the fact that spines in young animals may be thin and dim. The authors' examples of spine synapses (e.g. in Fig. 2A) are convincing, but some of the putative shaft synapses may in fact be on spines. 

      We agree with the reviewer that the differentiation between spine and shaft synapses can be difficult for those spines that are located above or below the dendritic shaft in the z-dimension because of the lower resolution of 2-photon microscopy in the z-dimension compared to the image plane (see also response to Reviewer 2, point 1). We have now added a new paragraph to the Methods section to describe in more detail how we identify spine and shaft synapses and provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but added a cautionary note to make the reader aware of potential misidentifications.

      Reviewer #1 (Recommendations For The Authors):

      I think the experiments performed were very technically challenging (probably one of the few labs that can do this in the field), and the findings provide in vivo evidence on how structured synaptic inputs are assembled during development that has never been reported. 

      I suggest improving the writing and presentation and really explaining how they conducted the experiments and how they defined shaft synapses. 

      Line 96: 12 dendritic areas from 11 mice at ages between postnatal day 8 to 13. 

      - Do the authors know how many neurons were imaged? It is unclear if the authors patch on all the imaged neurons and only imaged (or analyzed) the dendrites of those patched neurons. If yes, how sparse are the neurons labelled from IUE? From 1B, it looks like there are two cells adjacent to each other. Can the authors really distinguish whether the imaged dendrites are from the patched neuron? 

      The reviewer wonders whether we can tell apart dendrites of patched cells from those of neighboring neurons that were not patched. This is actually very straightforward: the experiment included a depolarization step (see Methods section), which leads to an immediate but temporary increase in fluorescence in all of the patched neuron’s dendrites, but in none of the neighboring dendrites. We have added this information to the Methods section of the new manuscript and now provide an example (Fig S3). Furthermore, as these cells normally fire frequently, it would immediately become clear that an unpatched cell is being imaged if backpropagating action potentials were predominantly observed rather than synaptic signals. The visualization of these synaptic signals is only possible due to the blockade of Na+ channels with QX314 in the intracellular solution (see Methods). 

      - In the methods section, it says 'dendrites were imaged in single plane or small stacks with plane...'. How do the authors do calcium imaging with small stacks of plane using Nikon MP scope? 

      Small stacks were acquired by using the piezo focusing device of our Nikon A1 microscope. Since we combined this fast focusing approach with resonant scanning, we were able to acquire z-stacks of 3-5 frames at a rate of up to 15 Hz (per stack).

      - I also assume this is not chronic imaging, and there are different mice for each postnatal day. If it's true, this is somewhat important for all the correlation analysis as there are only 2 mice for each postnatal day (other than day 12) and day 13 only has 1 animal. 

      Yes, indeed these are not chronic experiments and dendrites imaged on different days are from different neurons and different mice. We agree with the reviewer that if it had been possible to image the same neurons across these developmental stages, we would have detected even clearer correlations. Therefore, we see our results as conservative estimates of the developmental trajectory of the analyzed parameters.

      Line 104 - 109: I don't understand why the authors need to hold at -30mV to facilitate calcium influx through NMDA receptors? I assume this helps them to visualize as many synapses as possible? but wouldn't that also make the 'event frequency' not reflect the true value? 

      Indeed, depolarizing the imaged neurons to -30 mV was necessary to get sufficient calcium influx to map synaptic inputs. We don’t think that this affects the frequency of inputs, because the frequency of synaptic inputs is determined by the presynaptic firing rate and the release probability of the presynaptic terminal, which are not affected by the depolarization of the dendrite.

      Figure 2A - It says in the method section that ROIs are manually selected. However, it's not explained what the criteria are. For spine synapses, it's easy to define but for shaft synapses like in Fig 2B, why are there 2 synapses on the shaft? And in Fig 4a, 5a, Fig S1 P13, some of the dendrites are packed with ROIs. What's the distance between those shaft synapses? Can the imaging resolution really separate them? 

      The reviewer asks for a better description of how we identified individual ROIs and thus synapse locations and whether this is actually feasible. We have now added a more detailed description of how we select synaptic sites based on the occurrence of synaptic calcium transients. In addition, we have added a new supplemental Figure (S4) to give the reader an impression of the image quality and the ability to locate individual synapses reliably. We find that separating shaft synapses was possible for inter-synapse distances of ~4 µm or more. The mean shaft synapse distance in our data set is 21 µm.

      - Similar issue applies to Figure 4A that I'm not sure what's the resolution of each 'hot spot'. They all seem very close together. Maybe additional raw dendrite images with fluorescence changes like 1C or 2A could be helpful (or movies in the supplementary?) 

      As the reviewer suggests, we have now added additional supplemental figures to better illustrate how we identify synaptic transmission events as well as spine and shaft synapses.

      - Also for line 164, it says that 76% of high-activity synapses were located on spines. This could also maybe support that only the spine synapses are real synapses and many shaft synapses are actually not synapses and they were just categorized as shaft synapses from manual ROI? 

      We are actually quite sure that shaft synapses are real synapses based on our analysis, since they show repeated synaptic calcium transients that co-occur with barrages of synaptic inputs as measured by patch-clamp recordings. Indeed, one would expect to see a number of excitatory synapses on dendritic shafts of pyramidal neurons at these ages based on previous EM studies (Miller and Peters, 1981; Wildenberg et al., 2023).

      - While this might not impact the overall novelty of the paper, I would be curious to know if the authors can still observe the same findings if they only analyze spine synapses. 

      We repeated several analyses with a dataset that contained only spine synapses. For most analyses we observed the expected result: the effect sizes were similar compared to the entire data set, but the power was reduced. For example the effect of distance to closest high-activity neighbor and own activity (Fig 5E, F) was similar, but p-values were around 0.1 (Similar results for Figure 7B). In contrast, the co-activity with synapses within a domain was significantly higher than the co-activity with synapses in other domains also for the spine-synapse only data set. 

      Fig 6 - Does the domain co-activity also contribute to the synaptic current recorded (related to Fig 4). 

      Yes, the synaptic activity measured by calcium imaging contributes to the recorded EPSCs. However, the exact relationship between synaptic inputs measured by calcium imaging and those measured by patch-clamping is complicated by 3 factors: first, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization. Thus, barrages cannot be broken down into individual events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients with 100-200 ms precision (1 or 2 frames). However, throughout any 100-200 ms period of recording, several synapses are active across the entire dendritic arborization, such that we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved in the patch recording as EPSCs.

      Reviewer #2 (Recommendations For The Authors):

      (1) I suggest the authors should provide the number of cells and mice recorded in the figure legends. 

      The number of dendrites and mice is the same across all analyses: 12 dendrites from 11 mice for all experiments, 6/6 for P8-10 and 6/5 for P12-13. All dendrites and synapses (and their ages) are shown in the supplemental figures S1 and S2. We now mention the number of imaged dendrites at the beginning of the Results section and when we split ages for the first time.

      (2) Instead of showing only cartoon illustrations of dendrites in Figures 3-6, I suggest showing the two photon images as well together with the cartoon. 

      The 2-photon images of all dendrites of the dataset are available in Figure S1. To allow the reader to compare the cartoon representations in the main figures and the 2-photon images of each neuron, we have now labeled each dendrite in the dataset (D1-D12, see figures S1 and S2). For every figure, where we show example neurons (cartoons or zoom ins) we now provide this identifier.

      Reviewer #3 (Recommendations For The Authors):

      To address the weaknesses outlined above, we recommend that the authors do the following: 

      • To address concerns about the rigor and reproducibility of the methods specifically related to age comparisons, please confirm the following: 

      - Both age groups were run in parallel by the same researcher(s). 

      Experiments were run in partly overlapping periods, and experiments from different age groups were performed in parallel by both researchers.

      - Both age groups were imaged on the same microscope, or animals from each age group were imaged on both microscopes. If it was necessary to use different microscopes for the different age groups for biological or practical reasons, please explain. 

      All experiments were run on the same microscope, a Nikon A1 2-photon microscope. In the original methods description we erroneously mentioned two microscopes (copy and paste error from a previous publication). We corrected that in the revised manuscript.

      - There was no difference in imaging frame rates or other imaging parameters between age groups. If it was necessary to use different parameters for different age groups for biological reasons, please explain. 

      We varied the frame rates somewhat to allow larger z-stacks for some experiments where dendrites traversed different depths; however the mean frame rates were similar between the experiments in P8-10 vs P12-13 dendrites, 8.5 vs 10 Hz, respectively.

      - Images were analyzed blind to age. 

      The analysis was not set up to be performed blind to age. The number of spines and the activity levels clearly differ between neurons only a few days apart. However, all findings reported in this study related to age - except the increase in synapse density and activity - became apparent to us only after the full set of synaptic transmission events was determined and the analysis was performed on the entire data set, making it unlikely that event detection was biased.

      - There was no difference in the location of analyzed dendrites (e.g. depth from the pia, branch order) between age groups. 

      In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we determined the mean distance of the imaged dendritic stretches from the cell body and found that this distance was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      • To address general methodological concerns, please provide additional description of the following points: 

      - Please clarify how the visual cortex was identified in P8-13 pups. If there was ambiguity about identifying the visual cortex in these pups, please discuss the implications of this ambiguity. 

      The reviewer asks how we identified V1 in these experiments. We are indeed certain that the neurons were located inside the primary visual cortex. We have ample experience with mapping V1 in these animals based on patterns of spontaneous activity as well as post-hoc stainings. V1 is quite large already at these ages (> 2 mm long and > 1 mm wide) and its extent very consistent across animals. Thus, we would argue it is actually hard to miss.

      - Please clarify how synaptic transmission events were identified in the GCaMP signal. 

      We have now added a more detailed description of how we identify synaptic calcium transients. In addition, we have added a new supplemental Figure (S3) to give the reader an impression of the image quality and the ability to locate individual synapses reliably. 

      - It is acceptable to use the spine vs shaft analysis despite the inevitable difficulty resolving Z-projecting spines, but this caveat should be mentioned in the discussion of the spine vs shaft results. 

      We added a more detailed description of spine and shaft synapse identification, a new supplemental figure (S5) and we now mention the caveat related to the limited z-resolution of 2-photon microscopy in the revised manuscript.

      • Two additional minor details should be clarified in the text of the manuscript: 

      - Please specify the volume of DNA solution injected into each embryo. 

      The injected volume was 1 µl. We added this information in the Methods section of the revised manuscript.

      - In Fig S1, please specify whether the scale bar applies to all images. 

      The scale bar applies to all images. This information was added to the figure legend.

      References

      Leighton AH, Cheyne JE, Houwen GJ, Maldonado PP, De Winter F, Levelt CN, Lohmann C. 2021. Somatostatin interneurons restrict cell recruitment to retinally driven spontaneous activity in the developing cortex. Cell Rep 36:109316. doi:10.1016/j.celrep.2021.109316

      Miller M, Peters A. 1981. Maturation of rat visual cortex. II. A combined Golgi-electron microscope study of pyramidal neurons. J Comp Neurol 203:555–573.

      Siegel F, Heimel JA, Peters J, Lohmann C. 2012. Peripheral and central inputs shape network dynamics in the developing visual cortex in vivo. Current Biology 22:253–258.

      Wildenberg G, Li H, Sampathkumar V, Sorokina A, Kasthuri N. 2023. Isochronic development of cortical synapses in primates and mice. Nat Commun 14:8018. doi:10.1038/s41467-023-43088-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and well-written paper reporting on a novel approach to studying cerebellar function based on the idea of selective recruitment using fMRI. The study is well-designed and executed. Analyses are sound and results are properly discussed. The paper makes a significant contribution to broadening our understanding of the role of the cerebellum in human behavior.

      We thank the reviewer for the positive assessment of our paper.

      (1) While the authors provide a compelling case for the link between BOLD and the cerebellar cortical input layer, there remains considerable unexplained variance. Perhaps the authors could elaborate a bit more on the assumption that BOLD signals mainly reflect the input side of the cerebellum (see for example King et al., eLife. 2023 Apr 21;12:e81511).

      Our paper is based on the assumption that the cerebellar BOLD signal reflects solely the input to the cerebellum and does not reflect the changes in firing rates of Purkinje cells. This assumption relies on two lines of arguments: Studies that have directly looked at the mechanism of vasodilation in the cerebellum, and studies that try to infer the contributions of different neurophysiological mechanisms to overall cerebellar metabolism (Attwell and Iadecola, 2002).

      Vasodilatory considerations: The mechanisms that cause vasodilation in the cerebellum, and hence BOLD signal increases, have been extensively studied. Electrical stimulation of mossy fibers (Gagliano et al., 2022; Mapelli et al., 2017), as well as of parallel fibers (Akgören et al., 1994; Iadecola et al., 1996; Mathiesen et al., 1998; Yang and Iadecola, 1997), leads to robust increases in cerebellar blood flow. In contrast to the neocortex, the regulation of blood flow in the cerebellum depends almost exclusively on the vasodilator nitric oxide (NO) (Akgören et al., 1994; Yang and Iadecola, 1997), with stellate cells playing a key role in the signaling cascade (Yang et al., 2000).

      Electrical (Mathiesen et al., 2000) and pharmacological (Yang and Iadecola, 1998) stimulation of climbing fibers also leads to robust increases in blood flow. The effects of simultaneous parallel and climbing fiber stimulation appear to combine sub-additively to determine the blood flow changes (K. Caesar et al., 2003).

      Importantly, even dramatic changes in the spiking rate of Purkinje cells do not lead to changes in vasodilation. First, parallel fiber stimulation leads to blood flow increases, even though its net effect on Purkinje cell firing is inhibitory (Mathiesen et al., 1998). More importantly, complete inhibition of Purkinje cells using a GABA agonist does not change baseline cerebellar blood flow (Kirsten Caesar et al., 2003). Conversely, even a 200-300% increase in simple (and complex) spike firing rate through application of a GABA antagonist has no measurable consequences for blood flow, even though it clearly increases the metabolic rate of oxygen consumption in the tissue (Thomsen et al., 2004, 2009).

      In sum, this extensive set of studies clearly argues that the cerebellar blood flow response is mostly dictated by synaptic input, and that the firing rate of Purkinje cells does not influence vasodilation. Because the BOLD signal is caused by a supply of oxygen over and above the level of oxygen consumption, this suggests that increases in Purkinje cell firing would not lead to BOLD increases. What is less clear is the degree to which changes in BOLD signal during normal activity are determined by changes in mossy fiber or climbing fiber input. Disruption of either pathway leads to 60-70% reductions in the evoked blood flow response during whisker stimulation (Yang et al., 2000; Zhang et al., 2003), but it remains unclear to what degree this reflects the distribution of contributions in the healthy animal, as these powerful disruptions may have a number of side effects.

      Metabolic considerations: To estimate the relative contributions of climbing fiber and mossy fiber input to the variations in BOLD signal under natural conditions, it is useful to consider the contributions of different cerebellar processes to the overall metabolism of the cerebellum. Assuming an average firing rate of 40 Hz for mossy fibers, ~3 Hz for granule cells, and 1 Hz for climbing fibers, Howarth et al. (2010, 2012) estimated that the transmission from mossy fibers to granule cells dominates the energy budget, accounting for 53%. The subsequent stage, the transfer of information from granule cells to Purkinje cells, accounts for 32% of energy expenditure. In contrast, integration within Purkinje cells and the spiking (simple and complex) of these cells represent only 15% of the total energy consumption.

      More important for the BOLD signal, however, are the activity-induced variations in metabolic consumption. Purkinje cells fire relatively constantly at a very high frequency (~50 Hz), both during awake periods and during sleep (Shin et al., 2007). When they provide a signal to the neocortex, their firing rate decreases, which actually lowers the metabolic demand. Climbing fibers normally fire at ~0.5 Hz and, even during activity, rarely fire much above 2 Hz (Streng et al., 2017). In contrast, granule cells show low firing rates during rest (typically <1 Hz) but can spike well above 100 Hz during activity. Combined with the sheer number of granule cells, these considerations suggest that the vast majority of the variation in metabolic demand is due to mossy fiber input and granule cell activity.

      Overall, we therefore think it is likely that the main determinant of the cerebellar cortical BOLD signal is mossy fiber input and the transmission of information from mossy fibers to granule cells to Purkinje cells. We admit that the degree to which climbing fiber input contributes to BOLD signal changes is much less clear. We can be quite certain, however, that the firing rate of Purkinje cells does not contribute to the cerebellar BOLD signal, as even dramatic changes in firing rate do not cause any changes in vasodilation. We have clarified our line of reasoning in the paper, and hope this more extensive response will give the reader a better overview of the pertinent literature.

      (2) The current approach does not appear to take the non-linear relationships between BOLD and neural activity into account.

      Thank you for raising this concern. We did not stress this point in the paper, but one big advantage of our selective recruitment approach is that it is, to some degree, robust against non-linearities in the relationship between neural activity and BOLD signal, as long as the shape of the non-linearity is similar in the cerebellum and the neocortex. The results of our motor task (Figure 3) provide a clear example: the BOLD signal in both the neocortex and the cerebellum increases non-linearly as a function of force, such that the BOLD increase from 2.5 N to 6 N (a 3.5 N step) is larger than the increase from 6 N to 10 N (a 4 N step). A similar non-linearity can be observed for tapping speed (6, 10, and 18 taps/s). However, within each condition, the relationship between cortical and cerebellar activity is nearly perfectly linear, reflecting the fact that the shape of the non-linearity is very similar for the cerebellum and the cortex.

      Most importantly, even if the non-linearity differs between the two structures, any non-linear relationship between neural activity and BOLD signal (of vasodilatory nature) should apply similarly to different conditions (here, force and speed increases). Therefore, if two conditions show overlapping activity levels (as observed for force and speed across the medium and high levels, Figure 3), an offset between conditions cannot be caused by a non-linearity in the relationship between cortical and cerebellar activity. Because all conditions are subject to the same non-linearity, all points should lie on a single (likely monotonically increasing) non-linear function. For both the motor and the working memory task, the pattern of results clearly violates this assumption.
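
      The logic of this argument can be illustrated with a toy example. The sketch below is hypothetical: the saturating function `bold`, the neural drive values, the 0.8 scaling, and the 0.4 offset are invented for illustration and are not derived from our data. When every condition passes through the same monotonic non-linearity, the pooled (cortical, cerebellar) pairs remain consistent with a single non-decreasing curve; extra cerebellar drive in one condition produces an ordering violation at overlapping cortical activity levels that no shared non-linearity can explain.

```python
import math

def bold(neural_drive):
    """Hypothetical saturating neural-to-BOLD non-linearity."""
    return 1.0 - math.exp(-neural_drive)

def fits_single_monotonic_curve(points):
    """True if (cortical, cerebellar) pairs can lie on one non-decreasing curve."""
    ordered = sorted(points)  # sort by cortical BOLD
    return all(later[1] >= earlier[1] for earlier, later in zip(ordered, ordered[1:]))

# Hypothetical neural drive for three force and three speed levels
force_drive = [0.5, 1.0, 1.5]
speed_drive = [0.6, 1.1, 1.6]

# Null model: cerebellar drive is a fixed fraction of cortical drive in every
# condition, and both pass through the same BOLD non-linearity
null_force = [(bold(d), bold(0.8 * d)) for d in force_drive]
null_speed = [(bold(d), bold(0.8 * d)) for d in speed_drive]

# Alternative: extra cerebellar drive (task-dependent gating) in the speed condition only
gated_speed = [(bold(d), bold(0.8 * d + 0.4)) for d in speed_drive]

print(fits_single_monotonic_curve(null_force + null_speed))   # True
print(fits_single_monotonic_curve(null_force + gated_speed))  # False
```

      With real, noisy data, this ordering check would have to be replaced by a statistical test; the sketch only conveys the logic.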

      (3) The authors may want to address a bit more the issue of closed loops as well as the underlying neuroanatomy including the deep cerebellar nuclei and pontine nuclei in the context of their current cerebello-cortical correlational approach. But also the contribution of other brain areas such as the basal ganglia and hippocampus. 

      Cortical-cerebellar communication is, of course, bidirectional. As discussed in King et al. (2023), however, we restrict our model to the connections from the neocortex to the cerebellum for the following reasons. First, cerebellar BOLD activity likely reflects mostly neocortical input (see our answer to pt. 1), whereas neocortical activity is determined by a much wider array of projections, including striato-thalamo-cortical and cortico-cortical connections. Second, the output of the cerebellum cannot be predicted from the BOLD signal of the cerebellar cortex, as it is unlikely that the firing rate of Purkinje cells contributes to the cerebellar BOLD signal (see pt. 1). For these reasons, we believe that the relationship between neocortical and cerebellar activity patterns is mostly dictated by the connectivity from cortex to cerebellum, and is therefore best modelled in this direction. This is now discussed more clearly in a new paragraph (lines 318-323) of the revised manuscript.

      We are also ignoring other inputs to the cerebellum, including those from the spinal cord, the basal ganglia (Bhuvanasundaram et al., 2022; Bostan and Strick, 2018), the hippocampus (Froula et al., 2023; Watson et al., 2019), and the amygdala (Farley et al., 2016; Jung et al., 2022; Terburg et al., 2024). In humans, however, the neocortex remains the primary source of input to the pontine nuclei. Consequently, it is the main structure shaping activity within the cerebellar cortex. While it is an interesting question to what degree the consideration of subcortical structures can improve the prediction of cerebellar activity patterns, we believe that considering the neocortex provides a good first approximation.

      Reviewer #1 (Recommendations):

      (4)  A few sentences to clarify the used models as was done in the King et al. (2024) paper may improve readability.

      We have now added the following sentences to the introduction (line 25ff):

      To approach this problem, we have recently developed and tested a range of cortical-cerebellar connectivity models (King et al., 2023), designed to capture fixed, or task-invariant, transmission between neocortex and cerebellum. For each cerebellar voxel, we estimated a regularized multiple regression model to predict its activity level across a range of task conditions (King et al., 2019) from the activity pattern observed in the neocortex for the same conditions. The models were then evaluated in their ability to predict cerebellar activity in novel tasks, again based only on the corresponding neocortical activity pattern. Two key results emerged from this work. First, while rs-FC studies (Buckner et al., 2011; Ji et al., 2019; Marek et al., 2018) have assumed a 1:1 mapping between neocortical and cerebellar networks, models which allowed for convergent input from multiple neocortical regions to a single cerebellar region performed better in predicting cerebellar activity patterns for novel tasks. Second, when given a cortical activation pattern, the best performing model could predict about 50% of the reliable variance in the cerebellar cortex across tasks (King et al., 2023).
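
      For readers less familiar with this class of model, the sketch below illustrates the basic idea on simulated data. It is a hypothetical example, not the code used in King et al. (2023): ridge regression stands in for the regularized multiple regression, and the numbers of cortical parcels, cerebellar voxels and task conditions, the noise level, and the regularization strength `alpha` are all invented. Each cerebellar voxel receives its own weight vector over all cortical parcels, so the model permits convergent input from many cortical regions, and the fitted mapping is evaluated only on held-out conditions that stand in for novel tasks.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Fit a fixed (task-invariant) linear mapping W such that Y ≈ X @ W.

    X: (n_conditions, n_cortical_parcels) neocortical activity per condition
    Y: (n_conditions, n_cerebellar_voxels) cerebellar activity per condition
    alpha: L2 regularization strength (hypothetical value)
    """
    n_parcels = X.shape[1]
    # Closed-form ridge solution, one weight vector per cerebellar voxel:
    # W = (X^T X + alpha * I)^-1 X^T Y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_parcels), X.T @ Y)

def variance_explained(Y_true, Y_pred):
    """Proportion of variance across conditions (pooled over voxels) explained."""
    ss_res = np.sum((Y_true - Y_pred) ** 2)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_parcels, n_voxels = 20, 50
W_true = rng.normal(size=(n_parcels, n_voxels))  # simulated fixed connectivity

# Training conditions: cerebellar activity is a noisy linear function of cortex
X_train = rng.normal(size=(60, n_parcels))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(60, n_voxels))

# Held-out conditions standing in for novel tasks
X_test = rng.normal(size=(15, n_parcels))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(15, n_voxels))

W_hat = fit_ridge(X_train, Y_train, alpha=1.0)
r2 = variance_explained(Y_test, X_test @ W_hat)  # prediction uses cortex only
print(f"Variance explained in held-out conditions: {r2:.2f}")
```

      Because the data are simulated with very little noise, the variance explained here is far higher than anything expected for real data and should not be compared with the empirical estimate reported above.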

      (5) To what extent does this paper demonstrate the limitations of BOLD in neuroscientific research? 

      The primary objective of this study was to shed light on the problems of interpreting BOLD activation within the cerebellum. The problem that the BOLD signal mostly reflects input to a region is not unique to the cerebellum, but also applies (albeit likely to a lesser degree) to other brain structures. However, the solution we propose here critically hinges on three features of the cerebellar circuitry: (a) the mossy fiber input to the cerebellar hemispheres mostly arises from the neocortex, (b) the BOLD signal is likely dominated by this mossy fiber input (see pt. 1), and (c) there is very little excitatory recurrent activity in the cerebellum, so output activity in the cerebellum does not cause direct activity in other parts of the cerebellum.

      These features motivate us to use a directed cortex->cerebellum connectivity model, which does not allow for any direct connectivity within the cerebellum. While the same approach can also be applied to other brain structures, it is less clear that it would yield valid results there. For example, due to the local excitatory recurrent connectivity within neocortical columns, activity there will also reflect local processing.

      (6) What if the authors reversed their line of reasoning as in that cerebellum activity is matched to map changes in cerebral cortical activity? Perhaps this could provide further evidence for the assumed directional specificity of the task-dependent gating of neocortical inputs. 

      Given (a) that the cerebellar BOLD signal tells us very little about cerebellar output signals, (b) that there are many other input signals to the neocortex that are more powerful than cerebellar inputs, and (c) that there are strong cortico-cortical connections, we believe that this model would be hard to interpret (see also our answer to pt. 3).

      Therefore, while the inversion of the linear task-invariant mapping between cortical and cerebellar activity is a potentially interesting exercise, it is unclear to us at this point what strong predictions we would be able to test with this approach.

      (7) The statement that cerebellar fMRI activity may simply reflect the transmission of neocortical activity through fixed connections can be better explained. Also in the context of using the epiphenomenon (on page 11) in the paper. To what extent is the issue of epiphenomenon not a general problem of fMRI research?

      We have rephrased the introduction of this idea (line 17):

      This means that increases in the cerebellar BOLD signal could simply reflect the automatic transmission of neocortical activity through fixed anatomical connections. As such, whenever a task activates a neocortical region, the corresponding cerebellar region would also be activated, regardless of whether the cerebellum is directly involved in the task or not.

      Epiphenomenal activity: This is indeed a general problem in fMRI research (and, more broadly, in research that uses neurophysiological recordings rather than manipulations of activity). We have discussed similar issues in the context of motor activity in ipsilateral motor cortex (Diedrichsen et al., 2009). However, given that we only offer a possible approach to address this issue for the cerebellum (see pt. 5), we thought it best to keep the scope of the discussion focused on this structure.

      Reviewer #2 (Public Review):

      Summary:

      Shahshahani and colleagues used a combination of statistical modelling and whole-brain fMRI data in an attempt to separate the contributions of cortical and cerebellar regions in different cognitive contexts.

      Strengths:

      The manuscript uses a sophisticated integration of statistical methods, cognitive neuroscience, and systems neurobiology.

      The authors use multiple statistical approaches to ensure robustness in their conclusions.

      The consideration of the cerebellum as not a purely 'motor' structure is excellent and important.

      We thank the reviewer for their positive evaluation.

      Weaknesses:

      (1) Two of the foundation assumptions of the model - that cerebellar BOLD signals reflect granule cells > purkinje neurons and that corticocerebellar connections are relatively invariant - are still open topics of investigation. It might be helpful for the reader if these ideas could be presented in a more nuanced light.

      Please see our response to comment 1 of Reviewer 1 for a more extensive and detailed justification of this assumption. We have now also clarified our rationale for this assumption in the paper (lines 10-14). Finally, we now also explicitly raise the possibility that some of the violations of the task-invariant model could be caused by a selective increase in climbing fiber activity in some tasks (line 340).

      (2) The assumption that cortical BOLD responses in cognitive tasks should be matched irrespective of cerebellar involvement does not cohere with the idea of 'forcing functions' introduced by Houk and Wise. 

      We assume that you are referring to the idea that cerebellar output is an important determinant of the dynamics (and likely also of the magnitude) of neocortical activity. We certainly agree. However, we also believe that in the context of our paper, it is justified to restrict the model to the connectivity between the neocortex and the cerebellum only (see Reviewer 1, comment 3).

      Furthermore, if increased cerebellar output indeed occurs during the conditions for which we identified unusually high cerebellar activity, it should increase neocortical activity and bring the relationship between cerebellar and cortical activity closer to the predictions of the linear model. Therefore, our identification of functions for which cerebellar regions show selective recruitment is, if anything, conservative.

      Reviewer #2 (Recommendations):

      (3) One of the assumptions stated in the abstract -- that the inputs to the cerebellum may simply be a somewhat passive relay of the outputs of the cerebral cortex -- has been challenged recently by work from Litwin-Kumar (Muscinelli et al., 2023 Nature Neuroscience), which argues for complex computational relationships between cortical pyramidal neurons, pontine nuclei and granule cells, which in turn would have a non-linear impact on the relationship between cortical and cerebellar BOLD. The modelling is based on empirical recordings from Wagner (2019, Cell) which show that the synaptic connections between the cortex and granule cells change as a function of learning, further raising concerns about the assumption that the signals inherent within these two systems should be identical. Whether these micro-scale features are indicative of the macroscopic patterns observed in BOLD is an interesting question for future research, but I worry that the assumption of direct similarity is perhaps not reflective of the current literature. The authors do speak to these cells in their discussion, but I believe that they could also help to refine the authors' hypotheses in the manuscript writ large.

      We absolutely agree with your point. However, we want to make it extremely clear that our hypothesis (that the inputs to the cerebellum are a linear, task-invariant function of the outputs of the cerebral cortex) is the null hypothesis that we are testing in our paper. In fact, our results provide the first empirical evidence that task-dependent gating may indeed occur. In this sense, our paper is consistent with the theoretical suggestion of Muscinelli et al. (2023).

      You may ask whether a linear task-invariant model of cortical-cerebellar connectivity is not a straw man, given that it is most likely incorrect. However, as we stress in the discussion (line 298ff), a good null model is a useful model, even if it is (like all models) ultimately incorrect. Without it, we would not be able to determine which cerebellar activity outstrips the linear prediction. The fact that this null model can by itself predict nearly 50% of the variance in cerebellar activity patterns across tasks at the group level means that it is actually a very powerful model, and hence provides a much more stringent criterion for functional involvement than the mere presence of activity.

      (4) Further to this point, I didn't follow the authors' logic that the majority of the BOLD response in the cerebellum is reflective of granule cells rather than Purkinje cells. I read through each of the papers that were cited in defense of the comment: "The cerebellar BOLD signal is dominated by mossy fiber input with very little contribution from the output of the cerebellar cortex, the activity of Purkinje cells" and found that none of these studies made this same direct conclusion. As such, I suggest that the authors soften this statement, or provide a different set of references that directly confirm this hypothesis. 

      Please see our response to comment 1 of Reviewer 1. We hope this answer provides a more comprehensive overview of the literature, which DOES show that the spiking behavior of Purkinje cells does not influence vasodilation (as opposed to mossy fiber input). We have now clarified our rationale and the exact cited literature on lines 9-14 of the paper.

      (5) Regarding the statement: "As such, whenever a task activates a neocortical region, we might observe activity in the corresponding cerebellar regions regardless of whether the cerebellum is directly involved in the task or not." -- what if this is a feature, rather than a bug? That is, the organisation of the nervous system has been shaped over phylogeny such that every action, via efference copies of motor outputs, is filtered through the complex architecture of the cerebellum in order to provide a feed-forward signal to the thalamus/cortex (and other connected structures). Houk and Wise made compelling arguments in their 1995 Cerebral Cortex paper arguing that these outputs (among other systems) could act as 'forcing functions' on the kinds of dynamics that arise in the cerebral cortex. I am inclined to agree with their hypothesis, where the implication is that there are no tasks that don't (in some way) depend on cerebellar activity, albeit to a lesser or greater extent, depending on the contexts/requirements of the task. I realise that this is a somewhat philosophical point, but I do think it is important to be clear about the assumptions that form the basis of the reasoning in the paper. 

      This is an interesting point. Our way of thinking about cerebellar function does indeed correspond quite well to the idea of forcing functions, that is, the idea that cerebellar output can “steer” cortical dynamics in a particular way. However, based on patient and lesion data, it is also clear that some cortical functions rely much more critically on cerebellar input than others. We hypothesize here that cerebellar activity is higher (relative to neocortical activity) when the functions require cerebellar computation.

      We also agree with the notion that cerebellar contribution is likely not an all-or-none issue, but rather a matter of gradation (line 324ff).

      (6) Regarding the logic of expecting the cortical patterns for speed vs. force to be matched -- surely if the cerebellum was involved more in speed than force production, the feedback from the cerebellum to the cortex (via thalamus) could also contribute to the observed differences? How could the authors control for this possibility? 

      Indeed, our model currently does not attempt to quantify the contributions of cerebellar output to cortical activity. However, given that cerebellar output is not visible in the BOLD signal of the cerebellum (see Reviewer 1, comment 1), we believe that this is a reasonable approach. As argued in our response to your comment 2, increased cerebellar output in the speed compared to the force condition should bring the activity relationship closer to the linear model prediction. The fact that we find increased cerebellar (relative to neocortical) activity in the speed conditions suggests that there is indeed task-dependent gating of cortical projections to the cerebellum.

      Akgören N, Fabricius M, Lauritzen M. 1994. Importance of nitric oxide for local increases of blood flow in rat cerebellar cortex during electrical stimulation. Proc Natl Acad Sci U S A 91:5903–5907.

      Attwell D, Iadecola C. 2002. The neural basis of functional brain imaging signals. Trends Neurosci 25:621–625.

      Bhuvanasundaram R, Krzyspiak J, Khodakhah K. 2022. Subthalamic Nucleus Modulation of the Pontine Nuclei and Its Targeting of the Cerebellar Cortex. J Neurosci 42:5538–5551.

      Bostan AC, Strick PL. 2018. The basal ganglia and the cerebellum: nodes in an integrated network. Nat Rev Neurosci 19:338–350.

      Buckner RL, Krienen FM, Castellanos A, Diaz JC, Yeo BTT. 2011. The organization of the human cerebellum estimated by intrinsic functional connectivity. J Neurophysiol 106:2322–2345.

      Caesar K., Gold L, Lauritzen M. 2003. Context sensitivity of activity-dependent increases in cerebral blood flow. Proc Natl Acad Sci U S A 100:4239–4244.

      Caesar K., Thomsen K, Lauritzen M. 2003. Dissociation of spikes, synaptic activity, and activity-dependent increments in rat cerebellar blood flow by tonic synaptic inhibition. Proc Natl Acad Sci U S A 100:16000–16005.

      Farley SJ, Radley JJ, Freeman JH. 2016. Amygdala Modulation of Cerebellar Learning. J Neurosci 36:2190–2201.

      Froula JM, Hastings SD, Krook-Magnuson E. 2023. The little brain and the seahorse: Cerebellar-hippocampal interactions. Front Syst Neurosci 17:1158492.

      Gagliano G, Monteverdi A, Casali S, Laforenza U, Gandini Wheeler-Kingshott CAM, D’Angelo E, Mapelli L. 2022. Non-linear frequency dependence of neurovascular coupling in the cerebellar cortex implies vasodilation-vasoconstriction competition. Cells 11:1047.

      Howarth C, Gleeson P, Attwell D. 2012. Updated energy budgets for neural computation in the neocortex and cerebellum. J Cereb Blood Flow Metab 32:1222–1232.

      Howarth C, Peppiatt-Wildman CM, Attwell D. 2010. The energy use associated with neural computation in the cerebellum. J Cereb Blood Flow Metab 30:403–414.

      Iadecola C, Li J, Xu S, Yang G. 1996. Neural mechanisms of blood flow regulation during synaptic activity in cerebellar cortex. J Neurophysiol 75:940–950.

      Ji JL, Spronk M, Kulkarni K, Repovš G, Anticevic A, Cole MW. 2019. Mapping the human brain’s cortical-subcortical functional network organization. Neuroimage 185:35–57.

      Jung SJ, Vlasov K, D’Ambra AF, Parigi A, Baya M, Frez EP, Villalobos J, Fernandez-Frentzel M, Anguiano M, Ideguchi Y, Antzoulatos EG, Fioravante D. 2022. Novel Cerebello-Amygdala Connections Provide Missing Link Between Cerebellum and Limbic System. Front Syst Neurosci 16:879634.

      King M, Hernandez-Castillo CR, Poldrack RA, Ivry RB, Diedrichsen J. 2019. Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nat Neurosci 22:1371–1378.

      King M, Shahshahani L, Ivry RB, Diedrichsen J. 2023. A task-general connectivity model reveals variation in convergence of cortical inputs to functional regions of the cerebellum. Elife 12:e81511.

      Mapelli L, Gagliano G, Soda T, Laforenza U, Moccia F, D’Angelo EU. 2017. Granular layer neurons control cerebellar neurovascular coupling through an NMDA receptor/NO-dependent system. J Neurosci 37:1340–1351.

      Marek S, Siegel JS, Gordon EM, Raut RV, Gratton C, Newbold DJ, Ortega M, Laumann TO, Adeyemo B, Miller DB, Zheng A, Lopez KC, Berg JJ, Coalson RS, Nguyen AL, Dierker D, Van AN, Hoyt CR, McDermott KB, Norris SA, Shimony JS, Snyder AZ, Nelson SM, Barch DM, Schlaggar BL, Raichle ME, Petersen SE, Greene DJ, Dosenbach NUF. 2018. Spatial and Temporal Organization of the Individual Human Cerebellum. Neuron 100:977-993.e7.

      Mathiesen C, Caesar K, Akgören N, Lauritzen M. 1998. Modification of activity-dependent increases of cerebral blood flow by excitatory synaptic activity and spikes in rat cerebellar cortex. J Physiol 512(Pt 2):555–566.

      Mathiesen C, Caesar K, Lauritzen M. 2000. Temporal coupling between neuronal activity and blood flow in rat cerebellar cortex as indicated by field potential analysis. J Physiol 523:235–246.

      Muscinelli SP, Wagner MJ, Litwin-Kumar A. 2023. Optimal routing to cerebellum-like structures. Nat Neurosci 26:1630–1641.

      Shin S-L, Hoebeek FE, Schonewille M, De Zeeuw CI, Aertsen A, De Schutter E. 2007. Regular patterns in cerebellar Purkinje cell simple spike trains. PLoS One 2:e485.

      Streng ML, Popa LS, Ebner TJ. 2017. Climbing Fibers Control Purkinje Cell Representations of Behavior. J Neurosci 37:1997.

      Terburg D, van Honk J, Schutter DJLG. 2024. Doubling down on dual systems: A cerebellum–amygdala route towards action- and outcome-based social and affective behavior. Cortex 173:175–186.

      Thomsen K, Offenhauser N, Lauritzen M. 2004. Principal neuron spiking: neither necessary nor sufficient for cerebral blood flow in rat cerebellum. J Physiol 560:181–189.

      Thomsen K, Piilgaard H, Gjedde A, Bonvento G, Lauritzen M. 2009. Principal cell spiking, postsynaptic excitation, and oxygen consumption in the rat cerebellar cortex. J Neurophysiol 102:1503–1512.

      Watson TC, Obiang P, Torres-Herraez A, Watilliaux A, Coulon P, Rochefort C, Rondi-Reig L. 2019. Anatomical and physiological foundations of cerebello-hippocampal interaction. Elife 8:e41896.

      Yang G, Huard JM, Beitz AJ, Ross ME, Iadecola C. 2000. Stellate neurons mediate functional hyperemia in the cerebellar molecular layer. J Neurosci 20:6968–6973.

      Yang G, Iadecola C. 1998. Activation of cerebellar climbing fibers increases cerebellar blood flow: role of glutamate receptors, nitric oxide, and cGMP. Stroke 29:499–507; discussion 507-8.

      Yang G, Iadecola C. 1997. Obligatory role of NO in glutamate-dependent hyperemia evoked from cerebellar parallel fibers. Am J Physiol 272:R1155-61.

      Zhang Y, Forster C, Milner TA, Iadecola C. 2003. Attenuation of activity-induced increases in cerebellar blood flow by lesion of the inferior olive. Am J Physiol Heart Circ Physiol 285:H1177-82.

    1. Author response:

      eLife assessment

      This valuable study reveals how a rhizobial effector protein cleaves and inhibits a key plant receptor for symbiosis signaling, while the host plant counters by phosphorylating the effector. The molecular evidence for the protein-protein interaction and modification is solid, though biological evidence directly linking effector cleavage to rhizobial infection is incomplete. With additional functional data, this work could have implications for understanding intricate plant-microbe dynamics during mutualistic interactions.

      Thank you for this helpful comment. In the revised manuscript, we will be more cautious about directly linking cleavage of Nod factor receptors by NopT to rhizobial infection.

      We plan to modify the Title, the One-Sentence Summary, Abstract, and Discussion regarding this point.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bacterial effectors that interfere with the inner molecular workings of eukaryotic host cells are of great biological significance across disciplines. On the one hand they help us to understand the molecular strategies that bacteria use to manipulate host cells. On the other hand they can be used as research tools to reveal molecular details of the intricate workings of the host machinery that is relevant for the interaction/defence/symbiosis with bacteria. The authors investigate the function and biological impact of a rhizobial effector that interacts with and modifies, and curiously is modified by, legume receptors essential for symbiosis. The molecular analysis revealed a bacterial effector that cleaves a plant symbiosis signaling receptor to inhibit signaling and the host counterplay by phosphorylation via a receptor kinase. These findings have potential implications beyond bacterial interactions with plants.

      Thank you for highlighting the broad significance of rhizobial effectors in understanding legume-rhizobium interactions. We fully agree with your assessment and will emphasize these points in the revised Introduction and Discussion sections of our manuscript. Specifically, we will expand our Discussion regarding the potential impact of the NopT interaction with symbiotic receptor kinases on plant immune signaling and regarding the general significance of our work.

      Bao and colleagues investigated how rhizobial effector proteins can regulate the legume root nodule symbiosis. A rhizobial effector is described to directly modify symbiosis-related signaling proteins, altering the outcome of the symbiosis. Overall, the paper presents findings that will have a wide appeal beyond its primary field.

      Out of 15 identified effectors from Sinorhizobium fredii, they focus on the effector NopT, which exhibits proteolytic activity and may therefore cleave specific target proteins of the host plant. They focus on two Nod factor receptors of the legume Lotus japonicus, NFR1 and NFR5, both of which were previously found to be essential for the perception of rhizobial Nod factor and the induction of symbiotic responses such as bacterial infection thread formation in root hairs and root nodule development (Madsen et al., 2003, Nature; Tirichine et al., 2003, Nature). The authors present evidence for an interaction of NopT with NFR1 and NFR5. The paper aims to characterize the biochemical and functional consequences of these interactions and the phenotype that arises when the effector is mutated.

      Thank you for your positive feedback on our manuscript. In the revised Introduction and Discussion sections, we plan to better emphasize the interdisciplinary significance of our work. We will show how the knowledge gained from our study can contribute to a better understanding of microbial interactions with eukaryotic hosts in general, which may have a stimulating effect on future research in various research areas such as pathogenesis and immunity.

      To ensure that readers can easily follow the rationale behind our experiments, we will improve the Results section and explain in more detail how NopT was selected from among the 15 effectors examined. Additionally, we will provide more background information on NopT and the roles of NFR1 and NFR5 in symbiotic signaling in the Introduction section. As suggested, we will include the references Madsen et al. (2003) and Tirichine et al. (2003) as well as additional references on rhizobial NopT proteins in our revised manuscript version.

      Evidence is presented that in vitro NopT can cleave NFR5 at its juxtamembrane region. NFR5 also appears to be cleaved in vivo, and NFR1 appears to inhibit the proteolytic activity of NopT by phosphorylating NopT. When NFR5 and NFR1 are ectopically overexpressed in leaves of the non-legume Nicotiana benthamiana, they induce cell death (Madsen et al., 2011, Plant Journal). Bao et al. found that this cell death response is inhibited by the coexpression of nopT. Mutation of nopT alters the outcome of rhizobial infection in L. japonicus. These conclusions are well supported by the data.

      We appreciate that you recognize the value of our data.

      The authors present evidence supporting the interaction of NopT with NFR1 and NFR5. In particular, there is solid support for cleavage of NFR5 by NopT (Figure 3) and the identification of NopT phosphorylation sites that inhibit its proteolytic activity (Figure 4C). Cleavage of NFR5 upon expression in N. benthamiana (Figure 3A) requires appropriate controls (inactive mutant versions), which have been provided, since Agrobacterium, as a close relative of rhizobia, might increase defense-related proteolytic activity in the plant host cells.

      Thank you for recognizing the use of an inactive NopT variant in Figure 3A. In fact, increased activity of plant proteases induced by Agrobacterium is an important point that should not be neglected. We plan to mention this aspect in our revised Discussion.

      In the context of your comments, we are planning to make the following improvements to the manuscript:

      (1) We will add a more detailed description of the experimental conditions under which the cleavage of NFR5 by NopT was observed in vitro and in vivo.

      (2) We plan to provide more comprehensive data on the phosphorylation of NopT by NFR1, including phosphorylation assays and mass spectrometry results. These additional data support the proposed mechanism by which NFR1 inhibits the proteolytic activity of NopT.

      (3) We will expand the Discussion on the cell death response induced by ectopic expression of NFR1 and NFR5 in Nicotiana benthamiana. We will include more details from Madsen et al. (2011) to contextualize our findings with published literature.

      We believe these additions and clarifications will enhance the clarity and impact of our findings.

      Key results from N. benthamiana appear consistent with data from recombinant protein expression in bacteria. For the analysis in the host legume L. japonicus, transgenic hairy roots were included. To demonstrate that the cleavage of NFR5 occurs during the interaction in plant cells, the authors rely largely on Western blots. Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with nopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana in the absence of Nod factor that could trigger NFR1 kinase activity.

      Thank you for your comments regarding the cleavage of NFR5 and its functional implications. In the revised version, we will change our manuscript taking into account the following considerations:

      (1) We acknowledge that the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with NopT. It is worth noting in this context that the proteins were expressed at high levels, which likely do not reflect the natural situation in L. japonicus. The low proportion of cleaved NFR5 in our Western blots with transformed N. benthamiana or L. japonicus cells may thus simply reflect an experimental effect of high NFR5 protein synthesis. We suggest that the presence of high amounts of intact NFR5 does not have a significant functional impact on plant responses (cell death in N. benthamiana, rhizobial infection of L. japonicus), whereas NFR5 cleavage (or the formation of NFR5 cleavage products) may be crucial for the observed phenotypic changes. The fraction of cleaved NFR5, although small, may be sufficient to disrupt crucial signaling pathways. We will address possible differences between experimental and natural protein levels in our revised Discussion.

      (2) In our work, we studied three biochemical aspects of NopT: (i) physical binding of NopT to NFR1 and NFR5, (ii) proteolytic cleavage of NFR5 by NopT, and (iii) phosphorylation of NopT by NFR1. These three biochemical properties appear to influence each other. Phosphorylation of NopT by NFR1 appears to reduce its proteolytic activity, thereby counteracting NFR5 degradation by NopT (NFR5 homeostasis). Moreover, as NopT is a phosphorylation substrate for NFR1, NopT probably interferes with kinase-mediated downstream responses of NFR1. Thus, NFR5 cleavage activity appears to be only one feature of NopT. We plan to mention these considerations in our revised Discussion.

      It is also difficult to evaluate how the ratios of cleaved and full-length protein change when different versions of NopT are present without a quantification of band strengths normalized to loading controls (Figure 3C, 3D, 3F). The same is true for the blots supporting NFR1 phosphorylation of NopT (Figure 4A).

      Thank you for pointing out this aspect. Following your recommendation, we will quantify the band intensities for cleaved and full-length NFR5 in the experiments with different versions of NopT. These values will be normalized to loading controls. Similarly, the Western blots supporting NFR1 phosphorylation of NopT will be quantified. The data for normalized band intensities will be included in the revised figures. The quantifications will provide a clearer understanding of how the ratios of cleaved to full-length proteins change with different NopT variants and will also show to what extent NopT is phosphorylated by NFR1.

      It is clear that mutation of nopT results in a quantitative infection phenotype. Nodule primordia and infection threads are still formed when L. japonicus plants are inoculated with ∆nopT mutant bacteria, but it is not clear if these primordia are infected or develop into fully functional nodules (Figure 5). A quantification of the ratio of infected and non-infected nodules and primordia would reveal whether NopT is only active at the transition from infection focus to thread or perhaps also later in the bacterial infection process of the developing root nodule.

      Thank you for pointing this out. In the revised version of our manuscript, we will provide data showing that there are no obvious differences in nodule formation in plants inoculated with ∆nopT and wild-type NGR234, respectively. However, quantification of infection threads containing our GFP-labeled rhizobia in primordia and nodules would be difficult to perform due to strong autofluorescence signals in these tissues. The main goal of our study was to identify and characterize the interaction between NopT and Nod factor receptors. We therefore believe that an in-depth analysis of the bacterial infection process at later symbiotic stages is out of the scope of the present work.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents data demonstrating NopT's interaction with Nod Factor Receptors NFR1 and NFR5 and its impact on cell death inhibition and rhizobial infection. The identification of a truncated NopT variant in certain Sinorhizobium species adds an interesting dimension to the study. These data try to bridge the gaps between classical Nod-factor-dependent nodulation and T3SS NopT effector-dependent nodulation in legume-rhizobium symbiosis. Overall, the research provides interesting insights into the molecular mechanisms underlying symbiotic interactions between rhizobia and legumes.

      Strengths:

      The manuscript nicely demonstrates NopT's proteolytic cleavage of NFR5, regulated by NFR1 phosphorylation, promoting rhizobial infection in L. japonicus. Intriguingly, the authors also identify a truncated NopT variant in certain Sinorhizobium species, maintaining NFR5 cleavage but lacking NFR1 interaction. These findings bridge the T3SS effector with the classical Nod-factor-dependent nodulation pathway, offering novel insights into symbiotic interactions.

      We appreciate that you recognize the value of our manuscript.

      Weaknesses:

      (1) In the previous study, when NopT alone was transiently expressed in Nicotiana tabacum plants, proteolytically active NopT elicited a rapid hypersensitive reaction. However, this phenotype was not observed when the same NopT was expressed in Nicotiana benthamiana (Figure 1A). Conversely, cell death and a hypersensitive reaction were observed in Figure S8. This raises questions about the suitability of the exogenous expression system for studying NopT proteolysis specificity.

      We appreciate your attention to these plant-specific differences. In view of your comments, we plan to revise the Discussion and explain the different expression systems used for studying NopT effects in planta. Previous studies showed that NopT expressed in tobacco (N. tabacum) or in specific Arabidopsis thaliana ecotypes (with PBS1/RPS5 genes) causes rapid cell death (Dai et al. 2008; Khan et al. 2022). Our data shown in Fig. S8 confirm these findings. As cell death (effector-triggered immunity) is usually associated with induction of protease activities, we considered N. tabacum and A. thaliana plants unsuitable for testing NFR5 cleavage by NopT. In fact, no NopT/NFR5 experiments were performed with these plants in our study. In contrast, the expression of NopT in Nicotiana benthamiana did not lead to cell death in our experiments. Khan et al. 2022 also reported that cell death does not occur in N. benthamiana unless the cells were transformed with PBS1/RPS5 constructs. Thus, N. benthamiana is a suitable expression system to analyze NopT protease activity on co-expressed substrates. In our revision, we aim to better explain the advantages of the N. benthamiana expression system for studying NopT-mediated proteolysis of NFR5.

      (2) NFR5 loss-of-function mutants do not produce nodules in the presence of rhizobia in Lotus roots, and overexpression of NFR1 and NFR5 produces spontaneous nodules. In this regard, if the direct proteolysis target of NopT is NFR5, one could expect that NGR234 infection would not be very successful because of native NopT's specific proteolysis of NFR5 and NFR1. Conversely, in Figure 5, the authors observed different results.

      Our inoculation experiments clearly show that NopT of NGR234 has a negative effect on formation of infection foci (Fig. 5A) and nodule primordia (Fig. 5E). Our biochemical analysis indicates that NopT targets the NFR1/NFR5 complex, which most likely impairs activation of downstream responses such as NIN gene expression. Accordingly, NIN promoter activity was found to be higher in roots inoculated with the ∆nopT mutant than in roots inoculated with the NGR234 wild type (Fig. 5B and 5D). It is therefore plausible that NopT impairs rhizobial infection of L. japonicus by inhibiting NFR1/NFR5 functions. We agree with this Reviewer that it can be expected that “NGR234's infection will not be very successful”. Fig. 5 confirms that the ∆nopT mutant is indeed a better symbiont, and we do not think that we obtained “unexpectedly different results”. In the revised version, we will try to formulate our discussion text better in order to avoid any misunderstandings. Furthermore, we will use “NopT dampens rhizobial infection…” as the figure title instead of “NopT regulates rhizobial infection…”. We are also considering changing the title of our manuscript.

      (3) In Figure 6E, the model illustrates how NopT digests NFR5 to regulate rhizobia infection. However, it raises the question of whether it is reasonable for NGR234 to produce an effector that restricts its own colonization in host plants.

      We acknowledge the potential paradox of NGR234 producing an effector that appears to restrict its own colonization in host plants. In fact, depending on the host plant, most rhizobial effectors are “double-edged swords” that play either a positive or negative role in the symbiosis. In response to your comment, we will discuss the possibility that NopT may confer selective advantages in interactions between NGR234 and host plants where NopT plays a positive symbiotic role (Dai et al. 2008; Kambara et al. 2009). Inhibition of NFR1/NFR5 functions by NopT in these host plants could be a feedback response in cells in which symbiotic signaling has already started. It is tempting to speculate that the interaction between NopT and Nod factor receptors reduces Nod factor perception and downstream signaling to avoid a possible overreaction of symbiotic signaling, which may result in hypernodulation or formation of empty nodules without bacteria. Furthermore, NopT may target not only Nod factor receptors but also other host proteins to promote symbiosis, e.g. by suppressing excessive immune responses triggered by hyperinfection of rhizobia. In our revised manuscript, we will highlight the need for further investigations to elucidate the precise mechanisms underlying the observed infection phenotype and the role of NopT in modulating symbiotic signaling pathways.

      (4) The failure to generate stable transgenic plants expressing NopT in Lotus japonicus is surprising, considering the manuscript's claim that NopT specifically proteolyzes NFR5, a major player in the response to nodule symbiosis, without being essential for plant development.

      Thank you for your comments. The failure to obtain L. japonicus plants constitutively expressing NopT was indeed surprising and suggests that NopT targets not only NFR5 but also other proteins in L. japonicus. The number of NopT substrates in plants could be greater than assumed. For example, we show in our work that NopT can cleave AtLYK5 and LjLYS11. In our manuscript, we do not provide protocols and data on our efforts to construct L. japonicus plants stably expressing NopT. Indeed, it cannot be completely ruled out that the observed failure is due not to NopT expression but rather to other factors that influence the transformation and regeneration of explants into whole plants. Our results should therefore not be over-interpreted. We consider a discussion of our failed transformation experiments to be somewhat preliminary and not central to this manuscript. Therefore, we plan to modify our Discussion and delete the sentence reporting that stable transgenic plants expressing NopT have not been successfully generated.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their overall careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision built on the strengths identified by the reviewers, and addressed all the concerns they have raised. Both reviewers recognize that our revisions have improved the paper.  Since the first submission we have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model, as recommended by Reviewer 1

      • Included 4 new/revised supplementary figures, following the reviewers’ suggestions for additional analysis.

      Below we provide a brief response to the Reviewers’ comments on our manuscript revision.

      Reviewer #1: Public Review:

      Strengths:

      Overall, the work offers a very interesting approach to a topic that is hard to study experimentally; therefore, the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work now provides convincing evidence and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      Weaknesses:

      In its current form, the authors have improved the clarity of the results and the model details, and have provided a new set of simulations to complement and reinforce the original ones (including the development of a new spatial working memory model based on silent working memory principles). I do not see any significant weaknesses at this point.

      We thank the reviewer for these positive comments on our revision and for the suggestion of adding the silent memory model, as we feel this has strengthened our findings.

      Reviewer #2: Public Review:

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed, and propagation failure. Next, the findings are then incorporated in a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the result, and found for instance that the Conduction Velocity drops in demyelination. It gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative and is further weakened by the existence of known other age-related effects in PFC circuits.

      Comments on revised version:

      The paper has improved in the revision, although I still think a reduced model would have been nice.

      As noted above, in addition to our spiking bump attractor model, our revision includes a second network-level model:  an activity-silent working memory model for continuous features.  We found qualitatively similar effects as in our bump attractor network model, showing that our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent).  This new model was described in two new supplementary figures and a new paragraph in the Results section.

      We did not add a reduced model in our revision to this paper, since neither reviewer explicitly recommended that we add one. As we noted in our private response to reviewers that accompanied our revision: we share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision-making fall into this category, e.g. Wimmer et al. 2014, Esnaola-Acebes et al. 2022, Ibañez et al. 2020). We disagree with the reviewer on an important point: we feel that the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data in pyramidal neurons of rhesus monkey dlPFC (Rumbell et al. 2016), the same type of neurons and cortical region analyzed in the Peters et al. work on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to get quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then studied the consequences of de- and re-myelination in a spiking neural network model. Again, we could not use a simpler model (e.g. a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      While we agree that averaging results over different parameter sets provides a limited understanding of the system, we maintain that such analyses provide an important baseline. We acknowledge that results vary across our model cohort; this is why we included the heatmaps of our single cell model perturbation results (Figure 3 and Supplementary Figure 3), and simulated network models representing heterogeneous neuronal axons with healthy and altered myelin sheaths to different degrees, as likely occurs in the aging brain (Figures 7 and 8). The model framework we present here is well-suited for more targeted analyses and better insights, including those which we are pursuing currently.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision builds on the strengths identified by the reviewers, and addresses all the concerns they have raised. We have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model

      • Included 4 new/revised supplementary figures, following the reviewers’ suggestions for additional analysis

      Reviewer #1 (Public Review):

      Summary:

      The authors study the effects of myelin alterations in working memory via the complementary use of two computational approaches: one based on the de- and re-myelination in multicompartmental models of pyramidal neurons, and one based on synaptic changes in a spiking bump attractor model for spatial working memory. The first model provides the most precise angle (biophysically speaking) of the different effects (loss of myelin lamellae or segments, remyelination with thinner and shorter nodes, etc.), while the second model allows one to infer the consequences of myelin alterations in working memory performance, including memory stability, duration, and bump diffusion. The results indicate (i) a slowing down and failure of propagation of spikes with demyelination and partial recovery with remyelination, with detailed predictions on the role of nodes and myelin lamellae, and (ii) a decrease in memory duration and an increase in memory drift as a function of the demyelination, in agreement with multiple experimental studies.

      Strengths:

      Overall, the work offers a very interesting approach to a topic that is hard to study experimentally; therefore, the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work is solid and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      We thank the reviewer for these positive comments on our manuscript.

      Weaknesses:

      In its current form, the study still presents several issues which prevent it from achieving a higher potential impact. These can be summarized in two main items. First, the manuscript is missing some important details about how demyelination and remyelination are incorporated in both models (and what is the connection between both implementations). For example, it is unclear whether an unperturbed axon and a fully remyelinated axon would be mathematically equivalent in the multicompartment model, or how the changes in the number of nodes, myelin lamella, etc, are implemented in the spiking neural network model.

      We thank the reviewer for these suggestions to improve the clarity of our manuscript. A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon: it has shorter and thinner myelinated segments, and additional nodes in between. This is consistent with empirical observations in rhesus monkey dlPFC, as reviewed in Peters et al. (2009): a 90% increase in paranode profiles, and myelin sheaths that were thinner than expected for the size of the enclosed axon. With no empirical observations of fewer numbers of nodes (but rather, the opposite) or bare sections of axon, we assumed that the remyelination process also creates new nodes (which are identical to existing nodes), as also modeled in Scurfield & Latimer (2018). We have added two new sentences to the results to clarify this fact, before presenting the first set of results for the single cell model: (starting at line 137):

      “To simulate demyelination, we removed lamellae from selected myelinated segments; for remyelination we replaced a fraction of myelinated segments by two shorter and thinner segments with a node in between. As such, a ‘fully remyelinated axon’ had all the demyelinated segments subsequently remyelinated, but with fewer lamellae and additional nodes compared to the unperturbed control case, consistent with empirical observations (Peters, 2009).”

      We also state the maximal amount of remyelination more explicitly in the Results, starting on lines 164-165: "We next examined the extent to which remyelination with shorter and thinner segments, occurring after demyelination, restored axonal AP propagation (Figure 4).”

      Also on line 192-193: “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%).”

      Finally, in Methods we also clarified the structure of the added node (starting at line 634): “Remyelination was performed by replacing an affected (previously demyelinated) segment with two shorter segments, each including paranodes, juxtaparanodes, and an internode, and a new node between them that was identical to existing nodes.”
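      To make the segment bookkeeping described above concrete, the following is a minimal Python sketch under simplifying assumptions. It is not the authors' model code: the Segment class, the functions demyelinate and remyelinate, and all lengths and lamella counts are hypothetical, and the paranodes and juxtaparanodes are folded into each myelinated segment. It illustrates the point made above: demyelination only removes lamellae, whereas remyelination replaces an affected segment by two shorter, thinner segments with a new node between them, so a "fully remyelinated" axon has more nodes and fewer lamellae than the control axon even though its total length is unchanged.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """One compartment of a simplified axon (hypothetical representation)."""
    kind: str                   # "node" or "myelinated"
    length_um: float            # segment length in micrometres
    lamellae: int               # number of myelin wraps (0 for nodes)
    demyelinated: bool = False  # marks segments that have lost lamellae


def demyelinate(axon: List[Segment], indices: List[int],
                fraction_removed: float) -> List[Segment]:
    """Remove a fraction of the lamellae from the selected myelinated segments."""
    out = list(axon)
    for i in indices:
        seg = out[i]
        if seg.kind != "myelinated":
            raise ValueError(f"segment {i} is not myelinated")
        remaining = round(seg.lamellae * (1.0 - fraction_removed))
        out[i] = replace(seg, lamellae=remaining, demyelinated=True)
    return out


def remyelinate(axon: List[Segment], control_lamellae: int,
                lamella_fraction: float, node_length_um: float) -> List[Segment]:
    """Replace every demyelinated segment by two shorter, thinner myelinated
    segments with a new node between them."""
    out: List[Segment] = []
    for seg in axon:
        if seg.kind == "myelinated" and seg.demyelinated:
            # Keep total length constant: the new node takes its length out of
            # the original segment, and the remainder is split in half.
            half = (seg.length_um - node_length_um) / 2.0
            thin = round(control_lamellae * lamella_fraction)
            new_seg = Segment("myelinated", half, thin)
            out.extend([new_seg, Segment("node", node_length_um, 0), new_seg])
        else:
            out.append(seg)
    return out


control = [Segment("node", 1.0, 0), Segment("myelinated", 100.0, 20),
           Segment("node", 1.0, 0), Segment("myelinated", 100.0, 20),
           Segment("node", 1.0, 0)]
demy = demyelinate(control, [1], fraction_removed=0.6)
remy = remyelinate(demy, control_lamellae=20, lamella_fraction=0.75,
                   node_length_um=1.0)
```

      In this toy example, remyelinating the one affected segment with 75% of the control lamellae adds one node and leaves the total axon length unchanged, but the result is not identical to the control axon.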

      We have also provided further details describing how myelin dystrophy was simulated in the network model in Results (lines 243 - 249) and in Methods (lines 722 - 747). How myelin alterations have been implemented in the network model is one of the questions of the reviewer (Question 5 in Reviewer #1: Recommendations for the Authors). We have addressed this question by describing in detail how we adjusted CV and AP failure rate to the values produced by the multicompartment neuron model. Please see our answer to Question 5 for the details.

      Second, it is unclear whether some of the conclusions are strong computational predictions or just a consequence of the model chosen. For example, the lack of effect of decreasing the conduction velocity on working memory performance could be due to the choice of considering a certain type of working memory model (continuous attractor), and therefore be absent under other valid assumptions (i.e. a silent working memory model, which has a higher dependence on temporal synaptic dynamics).

      Whether some conclusions are strong predictions or just a consequence of the model chosen is an important concern and indeed a general problem of computational modeling of working memory. For example, Stein et al. (Stein et al. Towards biologically constrained attractor models of schizophrenia, Curr. Opin. Neurobiol. 2021) showed that opposed manipulations of E/I ratio can produce the same behavioral pattern in different alternative, plausible biological network models. As long as we do not fully understand the neural mechanisms underlying working memory, modeling studies of how alterations (e.g. in E/I ratio or in the reliability and timing of axonal transmission, as we did here) affect circuit function need to be interpreted critically and tested against new experimental data.

      One way to strengthen model predictions is by showing that different computational models make similar predictions. To do this, we implemented an activity-silent working memory model for continuous features, as suggested by the reviewer, and we found qualitatively similar effects as in our bump attractor network model. Thus, our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent).

      In the revised manuscript, we have added two new supplementary figures (Supplementary Figures 8 and 9) and a new paragraph in the Results section about activity-silent working memory (starting at line 319):

      “Alternative working memory mechanisms. Working memory in our neural network is maintained in an attractor state with persistent neural activity (Compte et al., 2000; Hansel and Mato, 2013). Other mechanisms have been proposed, including that working memory maintenance may rely on activity-silent memory traces (Mongillo et al., 2008; Stokes, 2015; Barbosa et al., 2020). In activity-silent models, a slowly decaying transient of synaptic efficacy preserves information without the need for persistent ongoing activity. We implemented an activity-silent model, to our knowledge the first one for continuous spatial locations, and tested how working memory performance is affected by AP failures and propagation delays. We found that AP failures corresponding to demyelination caused working memory errors qualitatively similar to those in the delay-active network (Supplementary Figure 8). On the other hand, increasing propagation delays did not lead to additional working memory errors unless we included unrealistically long delays (uniform distribution in the range of 0 to 100 ms; Supplementary Figure 9). These results are qualitatively similar to those of the delay-active network model. Thus, our main findings do not critically depend on the exact working memory mechanism (active vs. activity-silent).”

      Author response image 1.

      Action potential failures impair working memory performance in a network model with activity-silent memory traces. (A) Spiking and synaptic activity in an unperturbed, activity-silent working memory model. Top: Raster plot showing the activity for each excitatory neuron (labeled by its preferred direction) in a single trial with a cue stimulus presented at 180°. We modified our spiking neural network model such that it does not show elevated persistent firing throughout the delay period (see Figure 5B for comparison). In particular, we reduced the external background input to excitatory neurons by 3.61% and we increased the cue stimulus amplitude by 12.5%. Even though spiking activity decays to baseline (close to 0 Hz), a memory trace is imprinted in enhanced synaptic strength due to short-term synaptic facilitation (Mongillo et al., 2008). Selective spiking activity is recovered by a non-selective constant input applied for 300 ms to all excitatory neurons during the two reactivation periods (marked by yellow and green rectangles in the raster plot). The amplitude of the input was 11 mV during the first and 13 mV during the second reactivation period. Reactivation periods are marked by light gray shading in the remaining panels below, and the cue period is indicated by dark gray shading. Firing rates (second row), synaptic facilitation variable u (third row), and synaptic depression variable x (bottom row) for the same trial, averaged over 500 neurons around the neuron with 180° as preferred direction (solid lines) and around the neuron with 0° as preferred direction (dashed lines). Note that reactivation recovers the activity bump (C) but also causes elevated firing and subsequent enhancement of synapses at all positions in the network. (B) Activity in a network with demyelination of 50% of the myelinated segments by removing 60% of the myelin lamellae. AP failures lead to reduced firing rates in the cue and early delay periods and consequently to weaker synaptic enhancement. (C) Average spike counts of the excitatory neurons during the cue period (black lines), and the two reactivation periods indicated in the raster plots in A and B (yellow and green lines). Solid lines correspond to the control network and dashed lines to the perturbed network. (D) Memory strength as a function of time for the control and perturbed networks. (E-F) Trajectories of the bump center (i.e., remembered cue location) read out from the neural activity across the cue and delay periods using a population vector (see Methods). Cue position was 180° in all trials. The perturbed network (F) shows larger working memory errors towards the end of the delay period compared to the control network (E).
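      The activity-silent trace in this legend is carried by the facilitation (u) and depression (x) variables of Mongillo et al. (2008). As an illustration only, and not the simulation code behind the figure, the sketch below applies the standard Tsodyks-Markram update to a single synapse driven by a presynaptic spike train. The parameter values (U = 0.2, tau_f = 1.5 s, tau_d = 0.2 s) and the 20 Hz cue burst are assumptions chosen for clarity.

```python
import math

def simulate_stp(spike_times, t_end, U=0.2, tau_f=1.5, tau_d=0.2):
    """Track facilitation u and depression x of one synapse (times in s).

    Between spikes, u relaxes to U (time constant tau_f) and x recovers
    to 1 (time constant tau_d), using the exact exponential solution.
    At each presynaptic spike, u first jumps by U*(1 - u); the spike then
    uses a fraction u of the available resources x.
    Returns (u, x) at t_end (assumed >= last spike) and the efficacy u*x
    of every spike.
    """
    u, x, t_last = U, 1.0, 0.0
    efficacies = []

    def relax(u, x, dt):
        return (U + (u - U) * math.exp(-dt / tau_f),
                1.0 + (x - 1.0) * math.exp(-dt / tau_d))

    for t in sorted(spike_times):
        u, x = relax(u, x, t - t_last)
        u += U * (1.0 - u)          # facilitation step
        efficacies.append(u * x)    # efficacy of this spike
        x -= u * x                  # depletion of synaptic resources
        t_last = t
    u, x = relax(u, x, t_end - t_last)
    return u, x, efficacies

# Illustrative 20 Hz burst during a 0.5 s cue, then 1 s of silence.
cue = [0.05 * k for k in range(1, 11)]
u_end, x_end, eff = simulate_stp(cue, t_end=1.5)
# x has almost fully recovered (tau_d = 0.2 s), while u is still above U
# (tau_f = 1.5 s): this residual facilitation is the "silent" memory trace.
```

      Because u decays much more slowly than x recovers, a cue-driven burst leaves selectively facilitated synapses that a non-selective reactivation input can later read out; weaker cue-period firing (as with AP failures) leaves a weaker trace.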

      Author response image 2.

      Effect of propagation delays on control and perturbed activity-silent network models. (A) Memory strength during the whole simulation time for the young, control networks relying on activity-silent working memory (Supplementary Figure 8) with zero propagation delays (blue line), and with propagation delays from a uniform distribution with a range between 0 and 40 ms (yellow line) and between 0 and 100 ms (orange line). (B) Memory strength for perturbed networks when demyelinating 25% of the myelinated segments by removing 50% of the myelin lamellae, without delays (red line), and with uniformly distributed delays between 0 and 40 ms (light gray line) and between 0 and 100 ms (black line). The cue period is indicated by dark gray shading and reactivation periods are marked in light gray. Memory strength was calculated by averaging across 280 trials for one network. Shaded areas indicate SEM for each case. For the young, control networks (A), working memory was not affected by including delays of up to 40 ms. Unrealistically long delays of up to 100 ms did cause an impairment (the longest delays, found for the most extreme perturbation condition of demyelinating 75% of the segments by removing 100% of the myelin lamellae, were 49.9 ms on average). When AP failures were also incorporated into the networks (B), we observed a similar trend. For this perturbation condition, delays of up to 40 ms were already much larger than the delays quantified in the single neuron model (for the case of 25% of the segments demyelinated by removing 50% of the myelin lamellae, the average delay in the cohort was 3.75 ms).
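      The network manipulations above combine two single-neuron measurements: a probability that an AP fails and a propagation delay drawn from a uniform range. A minimal sketch of how such spike transmission could be implemented follows. It assumes one fixed delay per connection, that each AP fails independently, and that a failed AP reaches none of the targets (an unbranched axon); the function names and the 35% failure rate are hypothetical and not taken from the authors' simulator.

```python
import random

def make_delays(n_targets, d_max_ms, rng):
    """One fixed propagation delay (ms) per outgoing connection, drawn from U(0, d_max_ms)."""
    return [rng.uniform(0.0, d_max_ms) for _ in range(n_targets)]

def transmit(spike_times_ms, delays_ms, p_fail, rng):
    """Return sorted (arrival_time_ms, target) events for one presynaptic neuron.

    Assumptions: each AP fails independently with probability p_fail, and a
    failed AP reaches none of the targets.
    """
    events = []
    for t in spike_times_ms:
        if rng.random() < p_fail:
            continue                      # AP failed along the axon
        for target, d in enumerate(delays_ms):
            events.append((t + d, target))
    return sorted(events)

rng = random.Random(0)
delays = make_delays(5, 40.0, rng)            # 0-40 ms range, as in panel A
spikes = [10.0 * k for k in range(1000)]
events = transmit(spikes, delays, 0.35, rng)  # hypothetical 35% failure rate
delivered = len(events) / (len(delays) * len(spikes))   # close to 0.65
```

      Drawing delays once per connection mirrors how synaptic delays are usually fixed in spiking network simulators, whereas failures are resampled for every spike.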

      With additional simulations to address these issues, I consider that the present study would become a convincing milestone in the computational modeling of myelin-related models, and an important study in the field of working memory.

      Again, we would like to thank the reviewer for the positive comments. We have addressed all the main issues raised (see below our response to the “recommendations for the authors”).

      Reviewer #2 (Public Review):

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed and propagation failure. The findings are then incorporated into a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the results, and found, for instance, that the conduction velocity drops with demyelination. This gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative, and is further weakened by the existence of known other age-related effects in PFC circuits.

      We thank the reviewer for the critical assessment of our work. We share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision making fall into this category, e.g., Wimmer et al., 2014; Esnaola-Acebes et al., 2022; Ibañez et al., 2020). However, we respectfully disagree with the reviewer on an important point: the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data in pyramidal neurons of rhesus monkey dlPFC (Rumbell et al., 2016), the same type of neurons and cortical region analyzed in the work of Peters et al. on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to get quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then studied the consequences of de- and re-myelination in a spiking neural network model. Again, we could not use a simpler model (e.g., a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      The specific issues about modeling choices and interpretation of the results are discussed below.

      Both for the de/re myelination the spatial patterns are fully random. Why is this justified?

      We agree that myelin dystrophy during aging could be non-random, that is, localized to certain regions of an axon. Our collaborators (Drs Jennifer Luebke, Maya Medalla, and Patrick Hof) are currently addressing this question using 3D electron microscopy and immunohistochemistry on axons of individual neurons and their associated myelin, but results are not available yet. Early on in this study we examined how the location of myelin alterations affected AP propagation. Focusing demyelination on one section of the axon led to more AP slowing and failure than distributing it randomly. Likewise, remyelination of such spatially localized dystrophy led to greater recovery, as there were fewer transitions between long and short internodes (Supplementary Figure 4). Since otherwise the effects in the localized cases were largely similar to those in the spatially random case (see Author response image 3 below), for brevity in this paper we assumed myelin alterations were randomly distributed. Our next paper, which extends this study to collateralized axons and was presented as a poster at the 2023 Society for Neuroscience meeting, will include an examination of localized myelin dystrophy.

      Author response image 3.

      Effect of localized myelin alterations on CV change. Myelin alterations were either focused on the third of myelinated segments closest to the initial segment (‘proximally clustered’), the third of myelinated segments furthest from the initial segment (‘distally clustered’), or distributed according to a uniform distribution as in the current study. For demyelination, all lamellae were removed from 25% of myelinated segments (showing mean +/- SEM of all 50 cohort models, 30 randomized trials each). For remyelination, affected segments were replaced by two shorter segments with 75% of the original lamellae thickness and a node in between.
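      The perturbation protocol in this legend (choose a fraction of internodes uniformly, proximally or distally; strip lamellae; then replace each affected internode by two shorter ones with 75% of the original lamellae and a new node between them) can be sketched on a simple list-of-internodes representation. This is a hypothetical illustration, not the NEURON-based model: splitting into equal halves, ignoring the length of the new node, and rounding lamella counts to whole numbers are assumptions, as are the internode values in the usage lines.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Internode:
    length_um: float
    lamellae: int

def choose_internodes(n, frac, mode, rng):
    """Indices of round(frac * n) internodes to perturb.

    mode: 'uniform' (anywhere along the axon), 'proximal' (third closest to
    the initial segment) or 'distal' (third furthest from it). Clustered
    modes need frac <= 1/3, otherwise rng.sample raises ValueError.
    """
    third = n // 3
    pool = {"uniform": range(n),
            "proximal": range(third),
            "distal": range(n - third, n)}[mode]
    return sorted(rng.sample(pool, round(frac * n)))

def demyelinate(axon, idx, frac_lamellae):
    """Remove a fraction of lamellae from the selected internodes."""
    out = list(axon)
    for i in idx:
        kept = round(axon[i].lamellae * (1 - frac_lamellae))
        out[i] = replace(axon[i], lamellae=kept)
    return out

def remyelinate(axon, idx, original, frac=0.75):
    """Replace each selected internode by two half-length internodes with
    `frac` of the original lamellae, which adds one node between them."""
    chosen = set(idx)
    out = []
    for i, seg in enumerate(axon):
        if i in chosen:
            half = Internode(seg.length_um / 2, round(original[i].lamellae * frac))
            out.extend([half, half])
        else:
            out.append(seg)
    return out

rng = random.Random(1)
axon = [Internode(100.0, 8) for _ in range(30)]      # illustrative values
idx = choose_internodes(len(axon), 0.25, "proximal", rng)
bare = demyelinate(axon, idx, 1.0)                   # remove all lamellae
repaired = remyelinate(bare, idx, axon)              # one extra node per repaired internode
```

      Each remyelinated internode contributes an extra node and an extra short/long transition, which is the structural difference between a remyelinated and an unperturbed axon.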

      We have added two sentences in Methods to justify this assumption more clearly (line 510): “Evidence suggests that aging affects oligodendrocytes in several ways, including the ability for oligodendrocyte precursor cells to mature (Dimovasili et al., 2022). Knowing that individual oligodendrocytes myelinate axons of many different neurons, but without data quantifying how oligodendrocyte dystrophy affects myelination in individual axons, we assumed that myelin alterations were randomly distributed.”

      We have also added a sentence in the Discussion alluding to our upcoming study (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”

      Similarly, the myelin parameters were drawn from uniform distributions (Table 1, I guess). Again, why is this reasonable?

      The reviewer is correct that our initial Latin hypercube sample generated a uniform distribution. However, parameters of the random sample of models selected as biologically feasible were not uniformly distributed. We have added a new figure (Supplementary Figure 1A) to illustrate the parameter distributions, and have added two sentences in Methods (starting on line 596):

      “Of the 1600 simulated models, 138 met these criteria; for the present study, we randomly selected 50 models to comprise the young, control model cohort. Along most dimensions, the chosen cohort was approximately normally distributed (Supplementary Figure 1). The g-ratio (ratio of axon to fiber diameter) among models in the cohort was 0.71 ± 0.02, with total axon lengths of 1.2 ± 0.1 cm.”
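      The cohort construction described here (a Latin hypercube sample of parameter space, a feasibility filter, then a random draw of 50 models) can be sketched in a few lines. The bounds and the acceptance rule below are hypothetical placeholders standing in for Table 1 and the criteria described in Methods; only the sample size (1600) and cohort size (50) come from the text.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Latin hypercube sample: split each parameter range into n_samples
    equal strata, draw one point uniformly inside every stratum, then
    shuffle the strata independently for each parameter."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        col = [lo + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]

def select_cohort(samples, is_feasible, cohort_size, rng):
    """Keep samples that pass the acceptance test, then draw the cohort at random."""
    feasible = [s for s in samples if is_feasible(s)]
    return feasible, rng.sample(feasible, min(cohort_size, len(feasible)))

rng = random.Random(42)
# Hypothetical bounds (axon diameter in um, internode length in um), not Table 1.
samples = latin_hypercube(1600, [(0.3, 1.5), (50.0, 200.0)], rng)
# Placeholder acceptance rule standing in for the criteria described in Methods.
feasible, cohort = select_cohort(samples, lambda s: s[0] > 0.9, 50, rng)
```

      Each marginal of the full sample is close to uniform by construction; the filtered cohort need not be, which is the point made in the response.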

      Author response image 4.

      Distribution of parameters and conduction velocities in the single neuron model cohort. (A) Histograms of axon morphology parameters of models selected for the single neuron cohort. Top: axon diameter; middle: length of unperturbed myelin segments; bottom: total myelin thickness in unperturbed segments, computed as the product of lamella thickness and number of lamellae. (B) Histograms of the CV for the 50 axons of the unperturbed model cohort (top), and representative demyelination and remyelination perturbations: mild demyelination (removing 25% of lamellae from 25% of the myelinated segments, second row); severe demyelination (removing all lamellae from 75% of the myelinated segments, third row); and complete (100%) remyelination (where the demyelinated segments from the third row were remyelinated by two shorter segments with 75% of lamellae). CVs were averaged over 30 trials in each case. (C) Changes in CV (measured in %) in response to demyelination and remyelination versus the magnitude of the current clamp step (+180, +280, or +380 pA). Shown are mean +/- SEM for demyelinating 50% of myelinated segments (removing all lamellae), and subsequent remyelination of those segments by shorter segments with 75% of lamellae.
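      The g-ratio quoted above (axon diameter divided by fiber diameter) follows from the panel A parameters if the fiber diameter is taken as the axon diameter plus the myelin thickness (lamella thickness times number of lamellae) on both sides. A small worked sketch; the numbers are illustrative assumptions, not cohort values.

```python
def g_ratio(axon_diameter_um, lamella_thickness_um, n_lamellae):
    """g = axon diameter / fiber diameter, where the fiber diameter adds the
    myelin sheath (lamella thickness x number of lamellae) on both sides."""
    myelin_um = lamella_thickness_um * n_lamellae
    return axon_diameter_um / (axon_diameter_um + 2.0 * myelin_um)

# Illustrative numbers only: 0.8 um axon, 10 lamellae of 0.016 um each.
g_control = g_ratio(0.8, 0.016, 10)   # 0.8 / 1.12, about 0.714
g_thinner = g_ratio(0.8, 0.016, 5)    # half the lamellae -> larger g
g_bare = g_ratio(0.8, 0.016, 0)       # all lamellae removed -> 1.0
```

      Removing lamellae always pushes g toward 1, so demyelination and thinner remyelinated sheaths both show up as increased g-ratios.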

      The focus of most analysis is on the conduction velocity but in the end, this has no effect on WM, so the discussion of CV remains sterile.

      CV delays likely do affect brain functions that rely on neuronal oscillations and synchrony, as mentioned in the Discussion. As such, we feel that our single neuron model results on CV delays as well as AP failures are valuable for the scientific community. Yet, given the results of our network models here, the reviewer has a valid point. We have clarified in the introduction that AP failures but not CV delays affected the network output (line 115):

      “Higher degrees of demyelination led to slower propagation and eventual failure of APs along the axons of the multicompartment models. In the network models, an increase in AP failure rate resulted in progressive working memory impairment, whereas slower conduction velocities, in the range observed in the multicompartment models, had a negligible effect.”

      We have also revised the single neuron section of the Results throughout, to better highlight the effects of myelin dystrophy on AP failures. Revisions to address this in the demyelination section start on line 148:

      “AP propagation was progressively impaired as demyelination increased (Figure 3): CV became slower, eventually leading to AP failure. Removing 25% of lamellae had a negligible effect on CV, regardless of how many segments were affected. However, when all lamellae were removed, CV slowed drastically – by 38 ± 10% even when just 25% of the segments were demyelinated in this way, and 35 ± 13% of APs failed. When 75% of segments lost all their lamellae, CV slowed by 72 ± 8% and 45 ± 13% of APs failed.”
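      Numbers such as "CV slowed by 38 ± 10%" and "35 ± 13% of APs failed" are per-model percentages summarized across the cohort. The sketch below shows one plausible way to compute them, assuming CV is measured from AP arrival times at two recording sites and failures are APs that never reach the distal site; whether the ± values are SD or SEM is not stated in this passage, so the use of the sample SD here is an assumption.

```python
import statistics

def conduction_velocity(distance_um, t_proximal_ms, t_distal_ms):
    """CV in m/s from AP arrival times at two sites (1 um/ms = 1e-3 m/s)."""
    return distance_um / (t_distal_ms - t_proximal_ms) * 1e-3

def percent_slowing(cv_control, cv_perturbed):
    """Percentage decrease in CV relative to the unperturbed axon."""
    return 100.0 * (cv_control - cv_perturbed) / cv_control

def failure_percent(n_initiated, n_arrived):
    """Percentage of APs initiated near the soma that never reach the distal site."""
    return 100.0 * (n_initiated - n_arrived) / n_initiated

def cohort_summary(values):
    """Mean and sample SD across the model cohort (SD rather than SEM is an assumption)."""
    return statistics.mean(values), statistics.stdev(values)

cv = conduction_velocity(1000.0, 2.0, 3.0)   # 1 mm covered in 1 ms -> 1.0 m/s
slowing = percent_slowing(1.0, 0.62)         # about 38
failed = failure_percent(100, 65)            # 35.0
```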

      Similarly, we have added several sentences about AP failures that remain after remyelination of the single neuron model (starting on line 190):

      “Results for the percentage of AP failures (Figure 4C,F) were consistent with those for CV recovery. Remyelinating all previously demyelinated segments, even adding just 10% of lamellae, brought AP failure rates down to 14.6 ± 5.1%. Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%). Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow). AP failure rates were slightly lower when starting with partial demyelination: 10.6 ± 7.6% of APs failed in the analogous paradigm (Figure 4F, red dashed line and arrow). In short: combinations of demyelinated and remyelinated segments often led to sizable CV delays and AP failures.”

      The more important effect of de/re myelination is on failure. However, the failure is, AFAIK, just characterized by a constant current injection of 380pA. From Fig 2 it seems however that the first spike is particularly susceptible to failure. In other words, it has not been justified that it is fine to use the failure rates from this artificial protocol in the I&F model. I would expect the temporal current trace to affect whether the propagation fails or not.

      In general, we did not find the first spike to be more susceptible to failure than later spikes; the trace in Figure 2 is a representative snapshot intended to illustrate CV slowdown, AP failure, and recovery. Regarding the constant current injection: while the reviewer is correct that neurons do not receive such inputs in vivo, the applied current injections were designed to match in vitro current clamp protocols for these rhesus monkey neurons. While our future studies will include responses to more realistic synaptic inputs, we focused on somatic current injections here. We have added a new panel (C) to Supplementary Figure 1 (see previous response above) showing that the current step magnitude had little effect on the CV change after myelin perturbations; it likewise had little effect on AP failure rates. We now also state this finding more explicitly in Methods (starting on line 561):

      “As done during in vitro electrophysiological experiments (Chang et al., 2005; Ibanez et al., 2020) and past modeling studies (Coskren et al., 2015; Rumbell et al., 2016), we first applied a holding current to stabilize the somatic membrane potential at -70 mV, then injected a current step into the somatic compartment for 2 seconds. …The CV changes in response to myelin alterations were relatively insensitive to variations in the magnitude of suprathreshold somatic current steps (Supplementary Figure 1C), and whether the current was constant or included Gaussian noise. Therefore, here we quantified CV changes and AP failures from responses to constant +380 pA current steps only.”

      I don't know if there are many axon collaterals in the WM circuits and/or distance dependence in the connectivity, but if so, then the current implementation of failure would be questionable.

      We agree that axon collaterals may affect our results; our unpublished morphological analyses of individual neuron axons indicate that there is a high degree of local axon collateralization in Layer 3 pyramidal neurons in LPFC. In this first study from our group on myelin perturbations, we chose to focus here on unbranched axons. There was some distance dependence of AP failure along the length of the axon. For example, in our most extreme demyelination case (75% of segments losing all their lamellae), about 14% of the axons showed more AP failure at their distal ends relative to the middle (mean difference 6.33%). We are examining this distance dependence more broadly in our next study, now cited in the Discussion (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”

      I would also advise against thresholding at 75% failure in Fig 3C. Why don't the authors simply plot the failure rate?

      We thank the reviewer for this suggestion and have made this change: we now show the AP failure rate in Figure 3 and Figure 4. The trends are nearly identical to those from the high-failure trials.

      Regarding the presentation, there are a number of dead-end results that are not used further on. The paper is rather extensive, and it would be clearer if written up in half the space. In addition, much of the information is really supplementary. I have already mentioned the CV issue; the Lasso regression, for instance, also remains unused.

      We understand the reviewer’s perspective, and we do value brevity when possible. During the revision process we examined the paper carefully and made it more concise where feasible. As mentioned above, reporting CV results is important, but in this revision we have placed more emphasis on the results for AP failures. We combined the two Supplementary Figures about remyelination in the single neuron model into one (Supplementary Figure 3). We also moved the Lasso figure and associated methods to the Supplementary Material (Supplementary Figure 2), and have separated the Lasso results for demyelination and remyelination into their respective paragraphs (lines 154-160 and lines 200-204 respectively). While we do not use the Lasso results explicitly later in Results, we cite them in the Discussion when comparing our findings to previous work (starting on line 417):

      “Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019).”
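      For readers unfamiliar with Lasso, the sketch below shows how the L1 penalty singles out the predictors that matter, using plain coordinate descent on standardized predictors. It minimizes (1/2n)·||y - Xb||² + alpha·||b||₁, the same objective as scikit-learn's `Lasso`. The synthetic data and alpha = 0.1 are illustrative assumptions; the predictors, response and penalty actually used in the study are those described in its Methods.

```python
import statistics

def standardize(column):
    """Zero mean, unit population SD (constant columns are not supported)."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mu) / sd for v in column]

def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X_cols, y, alpha, n_iter=200):
    """Coordinate-descent Lasso on standardized predictors.

    X_cols: list of predictor columns (e.g. one per axon parameter).
    Minimizes (1/2n)||y_c - Zb||^2 + alpha*||b||_1, with y_c = y - mean(y).
    """
    n = len(y)
    Z = [standardize(c) for c in X_cols]
    y_mean = statistics.fmean(y)
    r = [v - y_mean for v in y]              # residual for b = 0
    b = [0.0] * len(Z)
    for _ in range(n_iter):
        for j, z in enumerate(Z):
            # (1/n) z.(r + z*b_j); (1/n) z.z == 1 after standardization
            rho = sum(zi * ri for zi, ri in zip(z, r)) / n + b[j]
            new = soft_threshold(rho, alpha)
            if new != b[j]:
                delta = new - b[j]
                r = [ri - delta * zi for ri, zi in zip(r, z)]
                b[j] = new
    return b

# Synthetic example: y depends on x0 only; x1 is an irrelevant predictor.
x0 = [float(i) for i in range(20)]
x1 = [float((7 * i) % 5) for i in range(20)]
y = [3.0 * v for v in x0]
b = lasso_cd([x0, x1], y, alpha=0.1)   # b[1] is driven exactly to 0
```

      Because predictors are standardized, coefficient magnitudes are comparable across parameters with different units, which is what makes the ranking of contributing parameters meaningful.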

      We hope that our revision has struck an appropriate balance between clear and concise writing, and addressing concerns from both reviewers. We greatly value the time you have given to help us to improve our manuscript.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      As I mentioned above, I consider that this study is well designed and it offers very interesting results. I have detailed below some of the issues that should be addressed to improve its potential impact in the field:

      (1) Across the manuscript, it is not entirely clear how the results of the multicompartmental model compare to existing modeling results on demyelination and CV changes (such as in the papers cited by the authors). Is this section confirming previous results with a new (more accurate) computational model, or are there any new insights previously unreported? A new paragraph in the Discussion putting these results in context would be very useful for the reader.

      We thank the reviewer for this suggestion. We have added two new subheadings to organize the Discussion better, and have expanded the single neuron section to three paragraphs. We feel this now clarifies how our model fits in with previous work while stating its novelty more explicitly. Starting on line 391:

      “Myelin changes affect AP propagation in a cohort of model neurons

      The novelty of our neuron model lies in its systematic exploration of a combination of different myelin perturbation types known to occur in myelin dystrophies, across a wide range of biologically feasible models. Our single neuron model assumed that age-related myelin dystrophies (e.g., Figure 1) alter the insulative properties of lamellae analogously to demyelination, and examined interactions between demyelination and remyelination. Past studies of myelin dystrophy examined how either demyelination or remyelination of all segments affected AP propagation for a few representative axon morphologies. For example, Scurfield and Latimer (2018) explored how remyelination affected CV delays, finding that axons with more transitions between long and short myelinated segments had slower CV (Supplementary Figure 4), and they were the first to explore how remyelination interacts with tight junctions. However, their study did not combine remyelination with demyelination or examine AP failures. Other basic findings from our single neuron cohort are consistent with past modeling studies, including that demyelination caused CV slowing and eventual AP failures (Stephanova et al., 2005; Stephanova and Daskalova, 2008; Naud and Longtin, 2019), and, separately, that remyelination with shorter and thinner myelinated segments led to CV slowing (Lasiene et al., 2008; Powers et al., 2012; Scurfield and Latimer, 2018). However, by assuming that some previously demyelinated segments were remyelinated while others were not, we found that models could have much higher AP failure rates than previously reported. Such a scenario, in which individual axons have some segments that are normal, some demyelinated, and some remyelinated, is likely to occur. We also found a few neurons in our cohort showing a CV increase after remyelination, which has not generally been reported before and is likely due to an interplay between ion channels in the new nodes and altered electrotonic lengths in the perturbed myelinated segments (e.g., Waxman, 1978; Naud and Longtin, 2019).

      Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019). Better empirical measurements of these parameters in monkey dlPFC, for example from 3-dimensional electron microscopy studies or single neuron axon studies combined with markers for myelin, would help predict the extent to which myelin dystrophy and remyelination along individual axons with aging affect AP propagation.

      Another important feature of our multicompartment model is that it was constrained by morphologic and physiological data in rhesus monkey dlPFC, an extremely valuable dataset from an animal model with many similarities to humans (Upright and Baxter, 2021; Tarantal et al., 2022). While beyond the scope of the current study, this computational infrastructure, with a detailed axon, initial segment, soma, and apical and basal dendrites, enables simultaneous investigations of signal propagation through the dendritic arbor and axon. Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures. Integrating such results from single neuron models into network models of working memory, as we have done here, is a powerful way to connect empirical data across multiple scales.”

      (2) Although the authors provide a well-designed study for the multi-compartmental model, it would be useful to add more details about how an unperturbed model and a completely remyelinated model differ in practice, perhaps right before the first results on the single cell model are presented. Are the new myelin sheaths covering the same % of axon as in the original case? Are there the same number of nodes? It is hard to distinguish which of these results are due to a compensation by the new myelin sheaths and which ones are just the model coming back to its original (and mathematically equivalent) starting point.

      A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon. Newly remyelinated segments had at most 75% of the original number of myelin wraps, with a new node in between, consistent with empirical observations in rhesus monkey dlPFC. Our manuscript changes in response to this recommendation are described in detail above in our response to the public review of the same reviewer.

      (3) The authors observe a directed component in the bias that is known to be caused by heterogeneities in network connectivity, as stated in the text. It occurs to me that similar effects could also be caused by heterogeneous demyelination in parts of the network. Inducing these biases could be another potential effect of demyelination in practice, and could easily be revealed by the authors' current model (and displayed in a supplementary figure).

      As suggested by the reviewer, we have tested heterogeneous demyelination in parts of the network and the results confirm the reviewer’s intuition. We have included these new results as new Supplementary Figure 7 (see below) and we have added the following sentences in the Legend of Figure 5, line 1265: “When demyelination is restricted to a part of the network, diffusion only increases in the perturbed zone (Supplementary Figure 7).” and in the Discussion (line 457): “In addition to age-related changes in memory duration and precision, our network model predicts an age-related increase in systematic errors (bias) due to an increased drift of the activity bump (Supplementary Figure 11). Moreover, if demyelination is spatially localized in a part of the network, the model predicts a repulsive bias away from the memories encoded in the affected zone (Supplementary Figure 7).”

      Author response image 5.

      Effect of spatially heterogeneous demyelination of the model neurons according to their preferred angle. We also tested working memory performance in the network when demyelination affects only parts of the network. The figure shows the decoded bump center position during the cue and delay periods for the eight possible cue directions when a fraction of neurons was perturbed and the rest of the neurons in the circuit were unaltered (Figure 5B). We perturbed 10% of the neurons around the neuron with preferred direction 90° (left panel), 25% of the neurons around -90° (middle panel), and 50% of the neurons around 180° (right panel). Bump traces for cues that lie inside the perturbed portion of the circuit are shown in blue. Network perturbation in the three cases consisted of demyelinating 25% of the segments along the axons of model neurons by removing 70% of the myelin lamellae. In each case, 280 trials were simulated for one network. These simulations show increased drift and diffusion inside the perturbed zone, consistent with the increased drift and diffusion when perturbing the entire network (Figure 6B and Supplementary Figure 11). In particular, spatially heterogeneous demyelination in our network leads to a bias away from the affected zone and to increased trial-to-trial variability. Note that this is a model prediction; we are not aware of empirical data showing heterogeneous demyelination with aging. Further, note that while our network model has a topological ring structure, neurons in PFC are not anatomically arranged according to their preferred features. Thus, spatially heterogeneous demyelination would likely affect neurons with different feature preferences (i.e., neurons throughout our ring model).
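      Two operations in this legend are easy to state precisely: selecting the arc of the ring that is perturbed (e.g., 10% of neurons centered on 90°) and decoding the bump center with a population vector. A minimal sketch follows, assuming evenly spaced preferred directions, an arc defined by circular distance, and the textbook population-vector readout (angle of the rate-weighted sum of preferred directions); the network size of 2000 and the Gaussian test bump are illustrative assumptions, and the decoder in the authors' Methods may differ in detail.

```python
import math

def preferred_angles(n):
    """Evenly spaced preferred directions in degrees, in [-180, 180)."""
    return [-180.0 + 360.0 * i / n for i in range(n)]

def circ_dist(a, b):
    """Absolute circular distance between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def perturbed_mask(angles, center_deg, fraction):
    """True for neurons inside the arc covering `fraction` of the ring,
    centered on center_deg (e.g. 0.10 around 90 deg)."""
    half_width = 180.0 * fraction
    return [circ_dist(a, center_deg) < half_width for a in angles]

def population_vector(angles, rates):
    """Decode the bump center (degrees) as the angle of the rate-weighted
    vector sum of preferred directions."""
    x = sum(r * math.cos(math.radians(a)) for a, r in zip(angles, rates))
    y = sum(r * math.sin(math.radians(a)) for a, r in zip(angles, rates))
    return math.degrees(math.atan2(y, x))

angles = preferred_angles(2000)                  # hypothetical network size
mask90 = perturbed_mask(angles, 90.0, 0.10)      # about 200 perturbed neurons
bump = [math.exp(-circ_dist(a, 45.0) ** 2 / (2 * 20.0 ** 2)) for a in angles]
decoded = population_vector(angles, bump)        # close to 45 degrees
```

      Using circular distance means an arc centered on 180° correctly wraps around to include neurons near -180°.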

      (4) The bump attractor model of WM relies on a continuous attractor dynamics to encode the information stored in memory --a fixed point dynamics that can only vary via the slow noise-driven drift. This means, as the authors mention, that changes in CV won't affect the performance of WM in their model. This seems to be a limitation of the model, or at least an effect which is highly dependent on the modeler's choice, rather than an accurate prediction. While testing the effects of oscillations (as the authors argue in the Discussion) might be out of the scope of this work, there are other WM models which are more sensitive to temporal differences in activity. The authors should test whether the same (lack of) effects are also found in other WM models. A silent WM model seems to be the ideal candidate for this, as the authors already have the key dynamics of that model incorporated in their computational framework (namely, short-term synaptic facilitation in excitatory synapses).

      We fully agree that considering the effects of demyelination in networks with alternative mechanisms would strengthen our manuscript. As suggested by the reviewer, we have simulated demyelination effects (AP failures and changes in CV) in an activity silent working memory model. The results are described in detail above in our response to the public review of the same reviewer.

      We also would like to mention that we have now also tested larger conduction delays in the bump attractor model, revealing additional working memory errors. This is shown in the revised version of Supplementary Figure 6 (see below). However, those delays are unrealistically large and thus the main effect in both the bump attractor and the activity-silent model is due to AP failures.

      Author response image 6.

      Effect of propagation delays on control and perturbed networks. (A) Memory strength (left panels) and diffusion (right panels) for the young, control networks with zero propagation delays (blue solid line), as in Figure 5, and with propagation delays from a uniform distribution with a range between 0 and 100 ms (yellow dashed line). (B) Memory strength and diffusion for perturbed networks when demyelinating 50% of the segments along the axons of model neurons, by removing 60% of the myelin lamellae without delays (red solid line), and with delays from a uniform distribution with a range between 0 and 40 ms (gray dashed line) and between 0 and 85 ms (black dash-dotted line). The measures of working memory performance were calculated by averaging across 20 networks and 280 trials for each network. Shaded areas indicate SEM for each case. For the young, control networks, there was no difference with and without propagation delays, even though the delays used in the network simulations were much larger than the delays quantified in the single neuron model (the longest delays found for the most extreme perturbation condition, demyelination of 75% of the segments by removing 100% of the myelin lamellae, were 49.9 ms on average; A). Working memory performance was also unaffected in the perturbed network with AP failures for delays ranging between 0 and 40 ms, which are also larger than the ones quantified in the single neuron model (for the case of 50% of the segments demyelinated by removing 60% of the myelin lamellae, the average delay in the cohort was 4.6 ms and the maximum delay was 15.7 ms; B). However, including extremely long delays of up to 85 ms did further impair memory compared to the impairment level introduced by AP failures alone (B).

      (5) Impact of demyelination and remyelination on working memory: Could the authors explain here how these biologically detailed alterations are implemented in the bump attractor model? Is the CV and AP failure rate adjusted to the values produced by the multicompartment neuron model with these myelin alterations?

      Yes, the reviewer is right: the CV and AP failure rates have been adjusted to the values produced by the multicompartment neuron model. To clarify this in the manuscript, we have revised the text as follows:

      Lines 243 - 249 (Results):

      To investigate how myelin alterations affect working memory maintenance, we explored in the network model the same demyelination and remyelination conditions as we did in the single neuron model. Because our network model consists of point neurons (i.e., without detailed axons), we incorporated CV slowing as an effective increase in synaptic transmission delays (see Methods). To simulate AP failures, we adjusted the AP failure rate to the values given by the single neuron model, by creating a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons (see Methods).

      Lines 722 - 747 (Methods):

      Modeling action potential propagation failures in the network. The network model is composed of point neurons without an explicit model of the axon. To effectively model the action potential failures at the distal end of the axons quantified with the single neuron model under the different demyelination and remyelination conditions, the AP failure rate was adjusted to the values produced by the single neuron model. To do this, we perturbed the 10 control networks by designing a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons. From the single neuron model, for each demyelination/remyelination condition, we quantified the probability of AP failure for each of the neurons in the control cohort, as well as the percentage of those neurons that shared the same probabilities of failure, that is, the percentage of neurons that had probability of failure = 0, probability of failure = 1, or any other probability. Then, we computed the probability of transmission, 1 − (probability of failure), and assigned it to the corresponding percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.
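      The probabilistic transmission scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the failure distribution is given as hypothetical (probability of failure, fraction of neurons) pairs, neurons are assigned to groups deterministically by cumulative fraction and then shuffled, and we assume a single Bernoulli draw per presynaptic spike shared by all of that neuron's excitatory and inhibitory targets (per-synapse draws would be an equally plausible alternative). The function names are hypothetical.

```python
import random


def assign_transmission_probs(n_excitatory, failure_distribution, rng):
    """Hypothetical helper. failure_distribution: list of
    (p_failure, fraction_of_neurons) pairs from the single neuron cohort,
    with fractions summing to 1. Returns one p_transmit per neuron."""
    cumulative, total = [], 0.0
    for p_fail, frac in failure_distribution:
        total += frac
        cumulative.append((total, 1.0 - p_fail))  # p_transmit = 1 - p_failure
    probs = []
    for i in range(n_excitatory):
        q = (i + 0.5) / n_excitatory  # neuron's position in the cumulative distribution
        for upper, p_transmit in cumulative:
            if q <= upper:
                probs.append(p_transmit)
                break
        else:  # guard against floating-point shortfall in the last bin
            probs.append(cumulative[-1][1])
    rng.shuffle(probs)  # no spatial structure in which neurons are perturbed
    return probs


def transmit(spiking, p_transmit, rng):
    """Keep each presynaptic spike with its neuron's transmission probability."""
    return [i for i in spiking if rng.random() < p_transmit[i]]
```

      For example, a condition in which half of the cohort never fails, 30% always fail, and 20% fail with probability 0.4 gives, for 10 excitatory neurons, five with transmission probability 1, three with 0, and two with 0.6.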

      Modeling conduction velocity slowing in the network. To explore the effect of CV slowing along the axons of model neurons, we simulated 20 young, control networks and 20 perturbed networks with AP failure rates adjusted for the case of single model neurons with 50% of the segments demyelinated along the axons by removing 60% of the myelin lamellae (we ran 280 trials for each network). Then, we added random delays, uniformly distributed with a minimum value of 0 ms in both cases, a maximum value of 100 ms in the control networks, and maximum values of 40 ms and 85 ms in the perturbed networks, to both the AMPA and NMDA excitatory connections onto both E and I neurons (Supplementary Figure 6). These large values were chosen to illustrate the potential effect of CV slowing in our network, because smaller, more realistic values had no effect.
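      A minimal sketch of how uniformly distributed conduction delays could be attached to excitatory connections and used to schedule spike arrivals is shown below. It assumes one delay per presynaptic-postsynaptic pair, shared by the AMPA and NMDA components of that connection; the function names and the event-queue approach are illustrative assumptions, not the authors' simulator.

```python
import heapq
import random


def sample_delays(n_pre, n_post, max_delay_ms, rng):
    """Hypothetical helper: one delay per excitatory connection,
    drawn uniformly from [0, max_delay_ms]."""
    return [[rng.uniform(0.0, max_delay_ms) for _ in range(n_post)]
            for _ in range(n_pre)]


def schedule_arrivals(spikes, delays):
    """spikes: list of (time_ms, pre_index). Returns (arrival_ms, pre, post)
    events in arrival order; the same delay is assumed for AMPA and NMDA."""
    events = []
    for t, pre in spikes:
        for post, d in enumerate(delays[pre]):
            heapq.heappush(events, (t + d, pre, post))
    return [heapq.heappop(events) for _ in range(len(events))]
```

      Calling sample_delays with a maximum of 0 ms reproduces the zero-delay baseline, while maxima of 40, 85, or 100 ms correspond to the delay ranges tested in Supplementary Figure 6.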

      (6) "We also sought to reveal the effect on working memory performance of more biologically realistic network models with AP transmission probabilities matched to both axons with intact and with altered myelin sheaths, as likely occurs in the aging brain (Figure 1). Thus, we ran network model simulations combining AP failure probabilities corresponding to groups of neurons containing intact axons and axons presenting different degrees of demyelination." I fail to see the difference with respect to the results in previous sections. Is it that now we have subnetworks in which axons are intact and subnetworks with significant AP failures, while before there was no topological separation between both cases? Please clarify.

      In Figures 5 and 6, the AP failure rate of the neural population in the network simulations was matched to the AP failure rate of the cohort of single model neurons for each demyelination/remyelination condition. Since not all model neurons have equal features, a given condition produces different levels of impairment in each neuron. Thus, we quantified the probability of AP failure for each neuron in the control cohort, as well as the percentage of those neurons that shared the same probabilities of failure. Then, we computed the probability of AP transmission for the corresponding percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.

      However, in Figures 7 and 8, we consider additional heterogeneity due to different degrees of demyelination/remyelination in different neurons. Here, excitatory neurons in the network model are not perturbed according to a single demyelination/remyelination condition. Instead, we allowed different percentages of excitatory neurons to have AP failure rates corresponding to different demyelination/remyelination conditions: some were unperturbed, while others had different degrees of demyelination (Figure 7) and different degrees of remyelination (Figure 8). We have modified the text for clarification in several places.

      First, when we describe the impact of demyelination on working memory, we already mention that (line 271): “In each of the 10 networks, we set the AP failure rate of the excitatory neurons according to the distribution of failure probabilities of the neurons in the single neuron cohort for the given demyelination or remyelination condition. Thus, we took into account the heterogeneity of demyelination and remyelination effects from our single neuron cohort (Figure 3A; Supplementary Figure 3). Note that this heterogeneity originates from differences in axon properties, but probabilities of failure for all neurons in the network correspond to the same degree of demyelination (Figure 6). We will also consider networks that contain different combinations of axons with either intact or perturbed myelin (Figure 7 and Figure 8).”

      Second, we have combined the text describing Figures 7 and 8 under a single section title, which reads “Simulated heterogeneous myelin alterations match empirical data” (line 334), and we start this section with (line 337): “Up to this point we have studied network models with AP failure probabilities corresponding to a single degree of myelin alterations (i.e., with all excitatory neurons in the network having AP failure rates matched to those of the single neuron cohort for one particular demyelination or remyelination condition). Next, we sought to reveal the effect on working memory performance of more biologically realistic network models, where excitatory neurons in the networks were perturbed according to a combination of different demyelination or remyelination conditions. That is, we simulated networks with excitatory neurons having AP failure probabilities matched to both neuronal axons with intact and with altered myelin sheaths in different degrees, as likely occurs in the aging brain (Figure 1).”

      (7) "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance." This conclusion is quite interesting, but I lack intuition from the paper as of why it is happening. In fact, the authors say in the Discussion that "complete remyelination of all the previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, recovered working memory function." Would we then see a minimum and then an increase in memory duration in Figure 9B if we extended the X-axis until we hit 100% of new myelin sheaths?

      This is a very important question that we have carefully addressed in the Results and Discussion. We distinguish between two remyelination cases in the models. Complete remyelination: when all (100%) of the previously demyelinated segments have subsequently been remyelinated; and incomplete remyelination: when less than 100% (25%, 50% or 75%) of the demyelinated segments have been remyelinated. Figure 6 (middle and right columns) shows the two cases (black lines for any percentage of lamellae added vs. colored lines): for 100% of the segments remyelinated, the network performance is nearly or completely (when enough lamellae are added) recovered to the young network performance. In fact, with the single neuron model we observe that (lines 192 - 193 in Results): “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%)”. In contrast, incomplete remyelination improves performance relative to demyelination (middle and right columns in Figure 6 vs. left column), but this performance remains worse than that of the young networks. The single neuron model shows that (lines 194 - 197 in Results): “Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow).”

      In Figure 9B (now Figure 8B), we combine intact axons with axons that are only partially remyelinated (i.e., incomplete remyelination). Extending the X-axis in Figure 8B to 100% of new myelin sheaths would not produce a minimum followed by an increase, but a continuous impairment: the more axons we perturb (remyelinate), the greater the impairment compared to the young networks, where all the axons are intact.

      The sentence "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance." now reads (lines 379 - 380 in Results): “Therefore, both demyelination and incomplete remyelination lead to impaired performance in our networks, compared to networks with intact myelin sheaths”. We have also rewritten the corresponding section in the Discussion (lines 486 - 489) as follows: “Therefore, it is reasonable to assume that ineffective remyelination may lead to working memory impairment. In fact, complete remyelination of all previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, led to full recovery of working memory function.”

      (8) [minor] "Our recent network model found that age-related changes in firing rates and synapse numbers in individual neurons can lead to working memory impairment (Ibañez et al., 2020), but did not consider myelin dystrophy." Could you be more precise about which age-related changes were studied in Ibanez et al. 2020? From the paper it seems like it was mostly cellular excitability and synaptic density, so this should be added here for more context.

      To clarify this, we have added the following sentences to the Introduction (line 105):

      “Our recent network model revealed that the empirically observed age-related increase in AP firing rates in prefrontal pyramidal neurons (modeled through an increased slope of the f-I curve) and loss of up to 30% of both excitatory and inhibitory synapses (modeled as a decrease in connectivity strength) can lead to working memory impairment (Ibañez et al., 2020), but this model did not incorporate the known changes to myelin structure that occur during normal aging.”

      (9) [minor] "Recurrent excitatory synapses are facilitating, which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons." It would be great to add a reference here to justify the inclusion of this type of plasticity in the excitatory circuit (for example Wang, Markram et al. Nat Neuro 2006).

      We have added the references suggested by the reviewer and a further one in the Results (line 216):

      “Recurrent excitatory synapses are facilitating, as has been empirically observed in PFC (Hempel et al., 2000; Wang et al., 2006), which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons.”

      References:

      Hempel, C. M., Hartman, K. H., Wang, X. J., Turrigiano, G. G., and Nelson, S. B. (2000). Multiple forms of short-term plasticity at excitatory synapses in rat medial prefrontal cortex. J. Neurophysiol. 83, 3031–3041. doi: 10.1152/jn.2000.83.5.3031

      Wang, Y., Markram, H., Goodman, P. H., Berger, T. K., Ma, J., and Goldman-Rakic, P. S. (2006). Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat. Neurosci. 9, 534–542. doi: 10.1038/nn1670

    1. Reviewer #2 (Public Review):

      Summary:

      The goal of the paper was to trace the transitions hippocampal microglia undergo during aging. scRNA-seq analysis allowed the authors to predict a trajectory and hypothesize about possible molecular checkpoints that set the pace of microglial aging. For example, TGFβ1 was predicted to slow down the microglial aging path and, indeed, loss of TGFβ1 in microglia led to premature microglial aging, which was associated with premature loss of cognitive ability. The authors also used the parabiosis model to show how peripheral, blood-derived signals from the old organism can "push" microglia forward on the aging path.

      Strengths:

      A major strength and uniqueness of this work is the in-depth single-cell dataset, which may be a useful resource for the community, as well as the data showing what happens to young microglia in heterochronic parabiosis setting and upon loss of TGFb in their environment.

      Weaknesses:

      That said, given what we have recently learned about microglia isolation for RNA-seq analysis, there is a danger that some of the observations result not from age but from cell stress during sample preparation (enzymatic digestion for 10 min at 37°C; e.g. PMID: 35260865). Changes in cell state distribution during aging were inferred from scRNA-seq and were not corroborated by any other method, such as imaging of cluster-specific marker expression in microglia at different ages. Such an analysis would confirm the scRNA-seq data and would also show where the subsets are located within the hippocampus, and whether there is any interesting distribution of cell states (e.g., are some present closer to stem cells?). Since TGFb is thought to be crucial to microglia biology, it would be valuable to include more analysis of the mice with microglia-specific Tgfb deletion, e.g., what was the efficiency of recombination in microglia? Did their numbers change after induction of Tgfb deletion in Cx3cr1-creERT2::Tgfb-flox mice?

      Overall:

      In general, I think the authors did a good job following the initial observations and devised clever ways to test the emerging hypotheses. The resulting data are an important addition to what we know about microglial aging and can be fruitfully used by other researchers, e.g. those working on microglia in a disease context.

    1. There's a lot of really great content here. But, for readers like me (the technical/design/engineering/research side of the visualization community), I think the writing isn't landing with quite the impact that it could for a few reasons:

      (1) In my interdisciplinary collaborations, I've noticed a difference in writing styles/norms between the humanities and the design/engineering disciplines. The latter tend to favor a top-down argument structure (e.g., a crisply articulated thesis that is then unpacked via clearly signposted topic sentences). I think that's because readers like me are trying to figure out how to operationalize the things we're reading/learning. So, right from the get-go, we need a clearly articulated conceptual model so that, over the course of the rest of the writing, we can figure out how to integrate it with our existing mental models of practice/research.

      In contrast, this piece takes a very bottom-up approach to the argument. For me, the experience of reading bottom-up writing is of assembling a mental model that feels more like a wobbly house of cards: ad hoc, duct taped together, and needing to constantly swap/rearrange it as more pieces of the conceptual contribution reveal themselves to me.

      As a concrete example, for the first third of this chapter, I wasn't actually sure what I was supposed to be taking away. I almost wondered whether I should suggest titling the chapter "preface" instead of "introduction" because it opens by being focused inwardly (i.e., on the presentation of the homepage) in a way I'm more accustomed to with prefaces than introductions. Although that whole chunk of writing was very pleasant to read (which may also be a function of the fact that I had the pleasure of meeting y'all and learning about how the project came together!), I wasn't entirely sure what this chunk was hoping to do/communicate—or how it was hoping to influence my thinking.

      (2) Related to the first point, while I personally find the exploration of a visualization counterhistory exciting and thought-provoking, I wonder if the writing could motivate the goals of the counterhistory more explicitly? That is, if someone isn't already bought into valuing the history of the field (or doesn't know how a counterhistory may/should affect their current practice today), how might the writing persuade them to care? Or, put another way, how can the writing speak and evangelize to an audience who is open-minded, but not yet "on side"? To me, this feels like a particularly important thing for an introductory chapter to do, and one that seems missing in the current iteration.

      (3) I'm on the fence about how central a role Tufte is given here. I think this depends on the audience you are trying to reach—I'm not sure that many (most?) visualization researchers/designers/practitioners (i.e., visualization "thought leaders") consider Tufte to play as influential a role as this chapter purports him to do. If this was the core audience, then I think the focus on Tufte could be watered down without losing much of the overall framing of "counterhistory"—because I think what this chapter describes is very much the history the field tells itself (relatively independently of Tufte, I think?).

      On the other hand, if the intended audience are the folks one hop removed (i.e., people who produce/consume visualizations in their daily lives/jobs, but aren't necessarily plugged into conversations on the bleeding edge), I think Tufte serves as a useful foil. But, something about his treatment in this chapter feels a little caricatured to me (and I say this as someone relatively ambivalent about his role). I'm not quite able to put my finger on what specifically about the writing left me with that feeling, though.

      (4) Starting with the "Two Stories of Data Visualization" (and particularly the subsequent chapter on "Every Datapoint is a Person"), I wondered whether the target of the book's critique is indeed visualization (i.e., the graphic representation of data) or whether it's more fundamental and broader practices of data (i.e., definition, collection, etc. more similar to the set of issues y'all discussed in Data Feminism). I really enjoyed all of the detail and discussion here—and I was convinced about the role that data played. But, I was perhaps less convinced about visualization's central/facilitating/empowering role in it. It's likely impossible to fully disentangle data from its representation (as the data table examples do a great job) but, if the book wants to maintain visualization as its target, I wonder if the writing could be refined a little to make its focus clearer/crisper?

      (5) I wonder if the writing can be more explicit about its positionality? I think some of the early sections (and occasional passages throughout) set up an incorrect expectation for me of a much broader (i.e., more global) counterhistory. So, I was then surprised that this chapter maintains a relatively fixed focus on Western history. In fact, I might go further to say that the writing seems to be particularly fixated on an American point of view (e.g., I raised an eyebrow at the description of the United States as "the exemplary" colonial state; as a non-American and citizen of a former colonized nation, I would consider the British Empire to be the ultimate colonial power...). I think this focus is fine if the writing is explicit that it is primarily concerned with developing a counterhistory rooted in the West (and, at that, the United States).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies differential Orsay virus infection of C. elegans when animals are fed on different bacteria. The evidence for this is however, incomplete, as experiments to control for feeding rate and bacterial pathogenicity are needed as well as direct quantification of viral load. 

      We appreciate that the editors and reviewers felt that our manuscript addressed an important problem. We appreciate the constructive critiques provided by the reviewers and have worked to address all of the concerns, including a number of additional experiments as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This manuscript explores the importance of food type on virus infection dynamics using a nematode virus as a model system. The authors demonstrate that susceptibility to viral infection can change by several orders of magnitude based on the type of bacterial food that potential hosts consume. They go on to show that, for the bacterial food source that reduces susceptibility, the effect is modulated by quorum sensing molecules that the bacteria produce. 

      Strengths: 

      This manuscript shows convincingly that nematode susceptibility to viral infection changes by several orders of magnitude (i.e. doses must be increased by several orders of magnitude to infect the same fraction of the population) depending on the bacterial food source on which hosts are reared. The authors then focus on the bacteria that reduce host susceptibility to viral infection and demonstrate that certain bacterial quorum-sensing compounds are required to see this effect of reduced susceptibility. Overall, sample sizes are large, methods are generally rigorous, experiments are repeated, and patterns are clear. 

      Weaknesses: 

      Although the molecular correlate of reduced susceptibility is identified (i.e. quorum sensing compounds), the mechanisms underlying this effect are missing. For example, the changes in susceptibility could be due to altered nutrition, host condition, the microbiome, feeding rate, mortality of infected hosts, etc. In addition, the authors focus almost entirely on the reduction in susceptibility, even though I personally find the increased susceptibility generated when reared on Ochrobactrum to be much more exciting. 

      I was a bit surprised that there was no data on basic factors that could have led to reductions in susceptibility. In particular, data on feeding rates and mortality rates seem really important. I would expect that feeding rates are reduced in the presence of Pseudomonas. Reduced feeding rates would translate to lower consumed doses, and so even though the same concentration of virus is on a plate, it doesn't mean that the same quantity of virus is consumed. Likewise, if Pseudomonas is causing mortality of virus-infected hosts, it could give the impression of lower infection rates. Perhaps mortality rates are too small in the experimental setup to explain this pattern, but that isn't clear in the current version of the manuscript. Is mortality greatly impacted by knocking out quorum-sensing genes? Also, the authors explored susceptibility to infection, but completely ignored variation in virus shedding. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure, no differences in feeding rate were observed. After 24 hours, minor differences emerged between O. vermis MYb71 and each Pseudomonas species; however, feeding rate was inversely correlated with susceptibility to Orsay virus, in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest feeding rate.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is dependent upon temperature and as our assays are done at 20°C rather than 25°C this may account for reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions and these two bacteria led to the shortest lifespan compared to the other tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although it does not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.  

      The reviewer is correct to assert that differences in viral shedding could exist. However, our susceptibility assays using exogenous Orsay virus remove this source of variation, and yet we still observe the same trends, such that O. vermis MYb71 promotes infection while P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 attenuate infection. Further, we measured the amount of virus shed into the lawns in the presence of different bacteria and did not observe differences in shed virus that could account for the differences we observe in incidence proportion (Line numbers 241-254, Fig. 3 F). Viral stability could be an issue in both the transmission and susceptibility assays. We therefore tested viral stability in the presence of E. coli, P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 and successfully recovered virus from all lawns, suggesting that virus is not rapidly degraded in the presence of any bacterium (Fig. 3D and 3E). However, we noted that the recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control, suggesting that recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from the three Pseudomonas strains we examined rules out a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.  

      I was also curious why the authors did not further explore the mechanism behind the quorum-sensing effect. I am not sure whether this is feasible, but would it be possible to add spent media to the infection plates, where the spent media was from Pseudomonas that produce the quorum sensing compound but the plates contain OP50, Pseudomonas, or the quorum sensing knockout of Pseudomonas? That would reveal whether it is the compound itself vs. something that the compound does. 

      We observed that quorum sensing mutants suppressed the attenuation of Orsay virus infection, and we agree that this could be a consequence of the compounds themselves or, more likely, of the downstream consequences of quorum signaling. We added culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe any potent effect (Line numbers 311-318, Supplementary Figure 9). This supports the interpretation that the compound itself is not responsible; however, we cannot rule out that the compounds themselves may be responsible if provided at a higher concentration.

      In addition, I was surprised by how much focus there was on the attenuation of infection and how little there was on the enhancement of infection. To me, enhancement seems like the more obvious thing to find a mechanism for: are the bacteria suppressing immunity, preventing entry into gut cells, etc.?

      We are also intrigued by the enhancement of infection by Ochrobactrum spp.; however, we chose to focus on attenuation given the availability of Pseudomonas aeruginosa genetic mutants for study. We have added data (Line numbers 371-402, Figure 7, and Supplemental Figure 12) that inform our current hypothesis regarding Ochrobactrum-mediated enhancement of Orsay virus infection.

      I was a bit concerned about the "arbitrary units", which were used without any effort to normalize them. David Wang and Hongbing Jiang have developed a method based on tissue culture infectious dose 50 (TCID50) that can be used to measure infectious doses in a somewhat repeatable way. Without some type of normalization, it is hard to imagine how this study could be repeated. The 24-hour time period between exposure and glowing suggests very high doses, but it is still unclear precisely how high. Also, it is clear that multiple batches of virus were used in this study, but it is entirely unclear how variable these batches were. 

      We have clarified that we also measured the (TC)ID50 for every batch of virus used, following methods similar to those suggested by the Wang laboratory (Line numbers 107-119 and 499-506). We have added a figure showing the batch-to-batch variability for all virus batches used in this study (Supp. Fig. 2). We have further clarified that the arbitrary units correspond to the actual microliters of viral filtrate used during infection, and we provide detailed methods for our viral batch production to assist with reproducibility (Line numbers 107-119 and 499-506).

      The authors in several places discuss high variability or low variability in incidence as though it is a feature of the virus or a feature of the host. It isn't. For infection data (or any type of binomial data), results are highly variable in the middle (close to 50% infection) and least variable at the ends (close to 0% or 100% infection). This follows from the binomial distribution, and it should not be taken as evidence that the bacteria or the host affect randomness. If you were to conduct dose-response experiments on any of your bacterial food source treatments, you would find that variability is lowest at extremely high and extremely low doses and highest in the middle, at doses where about 50% of hosts are infected.
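      As a worked illustration of this point, assume each of n scored animals on a plate is infected independently with probability p (an idealization that ignores plate-to-plate and batch effects). The sampling variance of the observed infected fraction is then largest at p = 1/2 and shrinks toward zero at either extreme. The plate size n = 50 below is chosen only for illustration and is not taken from the study:

```latex
\begin{aligned}
X &\sim \mathrm{Binomial}(n, p), \qquad \hat{p} = X/n \\
\operatorname{Var}(\hat{p}) &= \frac{p(1-p)}{n}, \qquad
\frac{d}{dp}\bigl[p(1-p)\bigr] = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2} \\
n = 50:\quad \operatorname{SD}(\hat{p}) &= \sqrt{0.25/50} \approx 0.071 \ \text{at } p = 0.5,
\qquad \sqrt{0.0475/50} \approx 0.031 \ \text{at } p = 0.05
\end{aligned}
```

      Under these assumptions, a narrower spread for a treatment that pushes infection toward 0% or 100% is expected from the binomial model alone and does not by itself indicate that a bacterium changes the randomness of infection.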

      Thank you for pointing this out, we have removed all reference to this throughout the manuscript.

      Reviewer #2 (Public Review):

      Summary and Major Findings/Strengths:

      Across diverse hosts, microbiota can influence viral infection and transmission. C. elegans is naturally infected by the Orsay virus, which infects intestinal cells and is transmitted via the fecal-oral route. Previous work has demonstrated that host immune defense pathways, such as antiviral RNAi and the intracellular pathogen response (IPR), can influence host susceptibility to virus infection. However, little is known about how bacteria modulate viral transmission and host susceptibility. 

      In this study, the authors investigate how diverse bacterial species influence Orsay virus transmission and host susceptibility in C. elegans. When C. elegans is grown in the presence of two Ochrobactrum species, the authors find that animals exhibit increased viral transmission, as measured by the increased proportion of newly infected worms (relative to growth on E. coli OP50). The presence of the two Ochrobactrum species also resulted in increased host susceptibility to the virus, which is reflected by the increased fraction of infected animals following exposure to the exogenous Orsay virus. In contrast, the presence of Pseudomonas lurida MYb11, as well as Pseudomonas PA01 or PA14, attenuates viral transmission and host susceptibility relative to E. coli OP50. For growth in the presence of P. aeruginosa PA01 and PA14, the attenuated transmission and susceptibility are suppressed by mutations in regulators of quorum sensing and the gacA two-component system. The authors also identify six virulence genes in P. aeruginosa PA14 that modulate host susceptibility to virus and viral transmission, albeit to a lesser extent. Based on the findings in P. aeruginosa, the authors further demonstrate that deletion of the gacA ortholog in P. lurida results in loss of the attenuation of viral transmission and host susceptibility. 

      Taken together, these findings provide important insights into the species-specific effects that bacteria can have on viral infection in C. elegans. The authors also describe a role for Pseudomonas quorum sensing and virulence genes in influencing viral transmission and host susceptibility. 

      Major weaknesses: 

      The manuscript has several issues that need to be addressed, such as insufficient rigor of the experiments performed and questions about the reproducibility of the data presented in some places. In addition, confounding variables complicate the interpretations that can be made from the authors' findings and weaken some of the conclusions that are stated in the manuscript. 

      (1) The authors sometimes use pals-5p::GFP expression to indicate infection; however, this is not necessarily an accurate measure of the infection rate. Specifically, in Figures 4-6, the authors should include measurements of viral RNA, either by FISH staining or qRT-PCR, to support the claims related to differences in infection rate.

      Following the reviewer's comment, we have corroborated our pals-5p::GFP data using FISH staining (Line numbers 291-292 and 357-359, Figure 4D & 4E, and Figure 6C).

      (2) In several instances, the experimental setup and presentation of data lack sufficient rigor. For example, Fig 1D and Fig 2B only display data from one experimental replicate. The authors should include information from all 3 experimental replicates for more transparency. In Fig 3B, the authors should include a control that demonstrates how RNA1 levels change in the presence of E. coli OP50 for comparison with the results showing replication in the presence of PA14. In order to support the claim that "P. aeruginosa and P. lurida MYb11 do not eliminate Orsay virus infection", the authors should also measure RNA1 fold change in the presence of PA01 and P. lurida in the context of exogenous Orsay virus. Additionally, the authors should standardize the amount of bacteria added to the plate and specify how this was done in the Methods, as differing concentrations of bacteria could be the reason for species-specific effects on infection. 

      All experimental replicates are now included within the supplementary information. 

      We have also measured RNA1 fold change following infection in the presence of P. aeruginosa PA01 and P. lurida MYb11 (Fig 3B and 3C) and found that these bacteria also do not eliminate Orsay virus replication.

      We thank the reviewer for their comment on controlling the amount of bacteria and have clarified our methods section to more clearly explain that we seed our plates with equivalent amounts (based on volume) of overnight bacterial culture before allowing the bacteria to grow on the plates for 48 hours.  

      (3) The authors should be more careful about conclusions that are made from experiments involving PA14, which is a P. aeruginosa strain (isolated from humans), that can rapidly kill C. elegans. To eliminate confounding factors that are introduced by the pathogenicity of PA14, the authors should address how PA14 affects the health of the worms in their assays. For example, the authors should perform bead-feeding assays to demonstrate that feeding rates are unaffected when worms are grown in the presence of PA14. Because Orsay virus infection occurs through feeding, a decrease in C. elegans feeding rates can influence the outcome of viral infection. The authors should also address whether or not the presence of PA14 affects the stability of viral particles because that could be another trivial reason for the attenuation of viral infection that occurs in the presence of PA14. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure, no differences in feeding rate were observed. After 24 hours, minor differences emerged between O. vermis MYb71 and each Pseudomonas species; however, feeding rate was inversely correlated with susceptibility to Orsay virus, in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is temperature-dependent, and because our assays are performed at 20°C rather than 25°C, this may account for the reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions, and these two bacteria led to the shortest lifespans of all the bacteria tested. Interestingly, P. lurida MYb11 was more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although they do not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.

      We tested viral stability in the presence of E. coli OP50 and Pseudomonas spp. and successfully recovered virus from all lawns, suggesting that virus is not rapidly degraded in the presence of P. lurida MYb11, P. aeruginosa PA01, or P. aeruginosa PA14 (Line numbers 241-249, Fig 3D and Fig 3E). However, we noted that recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control, suggesting that recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from each Pseudomonas species precludes a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I really liked this manuscript, I do think there are areas for improvement though. 

      Some smaller things: 

      Line 84: "can be observed spreading from a single animal" -- this isn't really great wording because the virus itself can't be observed (at least not very easily) -- even infection is hard to see. 

      The wording in lines 84-85 has now been adjusted to read “can spread from a single animal”.

      Fig 1C: which groups are statistically significantly different from each other? 

      Statistics have now been added to Figure 1C. 

      Line 154: not necessary to do for this paper, but this sentence made me curious whether the effect would have been seen with mixtures of bacteria (i.e. what if 50% were OP50 and 50% were Pseudomonas?) 

      These data have now been added (Line numbers 372-378, Figure 7A, and Supp. Fig. 12A and 12B).

      Line 262-264: I don't find this interesting at all for the reasons mentioned earlier about binomial data being the most variable in the middle. 

      These lines have been removed.

      Figure 4 B: The labels for the first two tick marks on the x-axis are switched I suspect. Otherwise, the controls did not behave as expected. 

      Figure 4B has been corrected.

      Line 288, 297 and several other places: "Orsay Virus" should be "Orsay virus". 

      We have corrected these instances.

      Supplemental Figure 2: Labels in the figure legend are B and C instead of A and B. 

      These labels have been adjusted for their placement within Figure 6.

      Line 411: I suspect this was supposed to be 13,200 xg rather than 13.2 xg. 

      This error has been corrected.

      Line 416-417: This sentence is very hard to interpret. More details are needed. This is the ID50 in which host strain? Is this averaged over all batches of virus? How variable are the batches? 

      This sentence (line number 114) has been amended to clarify that all ID50 values referred to here were calculated for ZD2611 populations in the presence of E. coli OP50. Further, Supplementary Figure 2 now shows all the ID50 values measured for each batch of virus used in this manuscript resulting in an average ID50 of 3.6.

      Lines 467-469: Why exclude these instead of counting them as zeros in the analysis? How many plates fit this description -- were there lots or only a few over the course of all experiments? 

      We have chosen to exclude these plates because these samples lost spreaders at some point during the assay, potentially skewing the eventual number of new infections counted depending on when the infected spreader animal crawled off the plate. We have detailed the number of plates that fit this description in lines 559-562.

      Line 476: A critical detail that is missing here is what number of worms were counted to score infection. Please say here or in the figure legends. 

      We have added the total number of worms counted and the minimum number counted per plate for each assay in the figure legends.

      Line 546: Why was only a single representative experiment shown? I'm asking for a justification, not necessarily for you to show all the data. 

      We chose to show a single representative experiment for two reasons. First, we noted variability between susceptibility assays even when using the same batch of virus, such that we could not combine experiments into a single plot as we did for transmission assays. Second, while we could normalize to a control within each experiment and expect to see similar relative differences across experiments, we believe this makes the underlying data more difficult to interpret. For example, an increase in the infection rate from 10% to 80% within a population has only a single interpretation, while an 8-fold relative increase in the infection rate could have several underlying meanings (e.g., 80% vs 10%, 64% vs 8%, 24% vs 3%). We have now included all experimental replicates in the supplementary material.

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns: 

      (1) Lines 86-87: "utilized a collection of bacteria isolated from the environment with wild C. elegans". The authors should provide more context on the source of these bacterial strains. 

      More references for the sources of these bacteria have been added to Supplementary Table 2.

      (2) The presentation of data in Fig 1 could be improved. The authors should include the text "pals-5p::GFP" on the images shown in Fig 1B. The red dashed line in Fig. 1D should intersect the dose-response curve at y = 0.5. The column heading for Fig 1E states "ID50 +/- SD (a.u.)", but should read "ID50 ratio" and should not have units. It also might be more intuitive to normalize the ID50 value for O. vermis to E. coli OP50. This way, having an ID50 ratio >1 indicates decreased transmission relative to E. coli, and ID50 ratio <1 indicates increased transmission relative to E. coli. To increase the transparency and rigor of 1E, the authors should plot the ratios from all 3 experimental replicates. The authors should also briefly explain why different viral doses were used in Fig 1D and 1F. 

      The text “pals-5p::GFP” has now been added to Figure 1B and throughout the text. The red dashed line in Figure 1D has been corrected. Figure 1E has been converted from a table into a plot as suggested, and the y-axis label is “ID50 Ratio Compared to E. coli OP50”. The ID50 replicates have been plotted in Supplementary Figure 2. We have clarified that the doses used are the same. Briefly, the technical replicates of individual doses from Figure 1D and Supplementary Figure 3A and 3B were pooled and processed for FISH staining to provide each experimental replicate of Figure 1F.

      (3) Line 110: The claim is that Ochrobactrum and P. lurida MYb11 reduce the variability of infection levels. However, another possibility is that there's simply less dynamic range in the assay because the infection levels have been compressed to 100% and 0% under these conditions. 

      This line has been removed.

      (4) There are discrepancies between what is shown in Fig 2C and what is described in the text. Lines 163-164: "P. aeruginosa PA01 and P. lurida MYb11 attenuated average infection to 33% and 62% of the population respectively". In Fig 2C, the mean for PA01 is ~25% whereas the mean for P. lurida appears to be less than 62%. 

      These values have been corrected.

      (5) Line 196: Provide more context for why rde-1 mutants were tested. This is the first time rde-1 is mentioned in the text (i.e. why show results in rde-1 mutants when the results are in Fig 2). 

      More context has been provided for why rde-1 mutants were tested (Line numbers 228-232). Briefly, the rde-1 mutant has defective antiviral immunity and therefore supports higher viral replication levels than the wild type (Félix et al. 2011). Using it allows us to potentiate our infection assay in Figure 3B and 3C, maximizing our chances of detecting viral replication in the presence of the Pseudomonas species, especially P. aeruginosa PA14, where fewer animals might be expected to become infected based on Figure 2B and Supplementary Figure 5.

      (6) Lines 228-229: "Mutations of any the regulators of the las, rhl, or pqs quorum sensing systems suppressed the attenuation of Orsay virus infection caused by the presence of wild-type P. aeruginosa PA01". Based on this description, PA01 should have a lower fraction of GFP positive relative to the quorum sensing mutants in Fig 4B. It seems that the x-axis labels OP50 and PA01 are swapped. 

      The x-axis labels of Figure 4B have been corrected. 

      (7) To improve clarity, for any figures that have data showing the "fraction of individuals GFP positive", the authors should include "pals-5p::GFP" in the y-axis title and legend. 

      The y-axis labels, legends, and text have been corrected throughout.  

      (8) To improve overall clarity and flow, the order in which the data is presented could be reordered. In particular, Fig. 6 could be better positioned instead of being the last figure, as no further characterization is performed on the mutants, and the findings are not conserved in strains that are more relevant to the C. elegans microbiota, such as P. lurida. The overall story could be strengthened if the authors ended the manuscript with more details related to the mechanism by which regulators of quorum sensing modulate the outcome of viral infection. 

      Figure 5 and Figure 6 have now been swapped.

      (9) Fig 5A: Make arrow sizes consistent across diagrams (i.e. the diagram for gacA deletion). 

      This figure (now Figure 6A) has been adjusted to make arrow sizes consistent across diagrams.  

      (10) Lines 280-282: "These data suggest that gacA has a conserved role across distant Pseudomonas species..." Here, the authors can provide more context on how well-conserved gacA is across Pseudomonas species (i.e. phylogenetic analysis of gacA sequences across different Pseudomonas species/strains). Furthermore, the data in Fig 5 does not provide strong enough support for the conclusion that gacA has a conserved role broadly across Pseudomonas species, as the authors only assess the effects of a gacA deletion in two species, P. aeruginosa and P. lurida. 

      We have adjusted lines 361-362 to “These data suggest that gacA has a conserved role between P. aeruginosa and P. lurida MYb11 in the attenuation of Orsay virus transmission and infection of C. elegans.” to reflect that we only assessed the effects of the gacA deletion in P. aeruginosa and P. lurida MYb11.

      (11) The manuscript can be strengthened by performing additional experiments to elucidate the mechanism by which Pseudomonas modulates viral infection. Does the attenuation of viral transmission and host susceptibility by P. lurida and P. aeruginosa require C. elegans to be in the presence of live bacteria? For example, the authors could measure viral transmission and susceptibility of C. elegans grown on heat-killed Pseudomonas. Additionally, it would be interesting to determine if modulation of viral infection is dependent on a secreted molecule. To assess this, the authors could perform viral infections in the context of Pseudomonas culture supernatant. 

      We added bacterial culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe a strong effect (Line numbers 311-318, Supplementary Figure 9). This supports an interpretation that attenuation is not mediated by a secreted molecule; however, we cannot rule out that attenuation activity would become apparent if supernatant were provided at a higher concentration.

      We encountered substantial challenges in appropriately controlling live vs. heat-killed experiments, particularly given the specifics of our susceptibility experiments. With regard to the underlying question of mechanism, we believe that the genetic mutants (e.g., rhlR/gacA) are equally informative, and that further comparison of how these mutants interact with the C. elegans host relative to wild-type bacteria may provide mechanistic insight.

      (12) The authors should include a discussion on the relative virulence potential of PA01, PA14, and P. lurida and the relationship between bacterial virulence potential and the outcome of viral infection. 

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is temperature-dependent, and because our assays are performed at 20°C rather than 25°C, this may account for the reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions, and these two bacteria led to the shortest lifespans of all the bacteria tested. Interestingly, P. lurida MYb11 was more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although they do not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.

      (13) More information is needed on strains listed in Supplementary Table 2, particularly when there is no reference listed and the strain is "Gift of XXX lab". For example, the Troemel lab previously published about an Ochrobactrum strain in Troemel et al PLOS Biology 2008 PMID: 19071962 - is this the same strain? Please ensure that there is adequate information about each strain with as many published references as possible so that the work can be more easily reproduced. 

      We have added additional information and references to the strain table in Supplementary Table 2. The strain listed as Ochrobactrum sp. has been amended to Ochrobactrum BH3 as it is the strain described in Troemel et al. 2008.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript uses C. elegans as a model to interrogate the effects of autism-associated variants of previously unknown function in the RNA-binding protein RBM-26/RBM27.

      Despite its potential impact, there are several concerns related to the technical rigor and specificity of the observed effects.

      Major concerns:

      1. The effects on PLM are interesting, but why was this neuron selected for study? Was this a lucky guess, or are other axons also affected? It is important to clarify whether the effects of RBM-26 are specific to this neuron or act pleiotropically across many or all neurons. According to CeNGEN, rbm-26 is strongly expressed in the well-characterized neurons ASE, PVD, and HSN. Are there morphological defects in these neurons, or others? As a note, there are also functional assays for these neurons (salt sensing, touch response, and egg laying, respectively).

      We have added new data to the supplemental materials showing that loss of rbm-26 function also causes the beading phenotype in the axons and dendrites of the PVD neuron (Figure S4 and lines 196-199). We have focused on the PLM neuron because our preliminary studies indicated that it had a higher penetrance of axon defects relative to the PVD neuron. Moreover, we observed expression of endogenously tagged RBM-26 in the PLM neuron (Figure 3A-C and lines 210-215).

      Similarly, the choice of the MALSU homolog seemed like a shot in the dark. It is ranked 46th (out of 63 genes) for fold-enrichment following RBM-26 pull-down, and 9th for p-value. Were any of the mRNAs with greater fold-enrichment or smaller p-values examined further? It is important to determine whether many or all of these interacting genes are overexpressed in the absence of RBM-26 and whether they are also required for the phenotypic effects of RBM-26 mutants, or if the MALSU homolog is special.

      We have clarified our reasoning for selecting the MALS-1 ortholog of MALSU1 for further study (see lines 283-284 and Table S2). Among binding partners with human orthologs, MALS-1 was by far the top-ranked candidate. The adjusted p-value for MALS-1 was 0.0008; the next smallest adjusted p-value was roughly 35-fold larger (0.028 for dpy-4). Moreover, the log2 fold enrichment for MALS-1 was 1.98, close to the largest value (2.13, for ACADS). Nonetheless, we agree that some of the other interactors may also be of interest and have therefore included them in Supplemental Table S2. Although these other potential binding partners are outside the scope of this study, we expect that future studies by ourselves or others may focus on their roles.

      In addition to the specificity controls mentioned above, positive and negative controls are needed throughout the results. While each of these may be relatively minor by itself, as a group they raise questions about the technical rigor of the study. Briefly these include: Fig 1C. Missing loading controls and negative control (rbm-26 null allele). Additional exposures should be included to show whether RBM-26(P80L) protein or the lower band for RBM-26(L13V) are present at all, relative to the null allele.

      We have added no-stain loading controls to figure 1C. We have also switched to using ECL detection, which is much more sensitive and reveals faint bands for RBM-26(P80L) and additional faint bands for RBM-26(L13V). In addition, we have included a longer exposure for the blot (Figure S1). We are unable to test the null, as we can only produce a limited number of small maternally rescued progeny, thereby precluding western blot analysis.

      Fig 2. Controls to distinguish overextension of the PLM axon from posterior mispositioning of the ALM cell body are needed. Quantification of PLM axon lengths in microns (or normalized to body size) with standard deviation, not error of proportion, should be shown. Measurement of the "beading phenotype" should be more rigorous; see, for example, the approach in Rawson et al. Curr. Biol. 2014 https://doi.org/10.1016/j.cub.2014.02.025 . The developmental stage examined, and the reason for choosing that stage, should be described for this and all figures.

      We have added new data showing PLM axon length relative to body length for each of the RBM-26 mutants (Figure S2 and lines 183-185). These results indicate that in these mutants the PLM axon has a larger axon-length-to-body-length ratio, suggesting that the PLM/ALM overlap phenotype is a result of PLM axon overextension. For most experiments, we continue to report penetrance, as this has been standard practice in the field and allows for a much larger sample size (see examples listed below). We have also added examples of how the beading phenotype was measured (Figure S3). Moreover, we have now analyzed this phenotype and others at multiple developmental stages (Figures 2D-H and Table S1). In general, we have conducted experiments at the L3 stage because the rbm-26(null) mutants do not survive past this stage. However, for many of our experiments we have also included additional stages. We have added this explanation to the methods section on phenotype analysis and at various locations throughout the text. We have also labeled all graphs to clearly indicate the developmental stages.

      doi:10.1038/s41467-019-12804-3 (article from the laboratory of Brock Grill)

      doi:10.1371/journal.pgen.1002513 (article from the laboratory of Ian Chin-Sang)

      doi:10.1073/pnas.1410263111 (article from the laboratory of Chun-Liang Pan)

      doi:10.1016/j.neuron.2007.07.009 (article from the laboratory of Yishi Jin)

      doi:10.1523/JNEUROSCI.5536-07.2008 (article from the laboratory of William Wadsworth)

      Fig 3. Controls without auxin and with neuronal TIR1 expression alone should be included. Controls demonstrating successful RBM-26 depletion, in larvae as well as in embryos at the time of PLM extension, should be included (weak embryonic depletion might explain why the overextension phenotype is only 14% instead of 40% as in the null). According to CeNGEN, rbm-26 expression in PLM is barely detected, thus depletion with a PLM-specific TIR1 should also be tested. To confirm the authors' identification of the cell marked "N" as the PLM cell body, co-expression of rbm-26 and a PLM-specific marker should be added. Rescue of the rbm-26 mutants with neuronal (and PLM-only) expression should be included to test sufficiency in PLM, and as a further control for potential artifacts of the AID system.

      We have added new data showing that an endogenously tagged RBM-26::Scarlet protein is expressed in the PLM neuron (Figure 3A-C). Moreover, we have added rescue experiments showing that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype and the PLM/ALM overlap phenotype (Figure 3F-G). We have also added controls without auxin (Figure S7) and without the rbm-26::scarlet::aid gene (Figure S8). We have added a new figure showing auxin-mediated depletion of RBM-26::Scarlet::AID in the PLM neuron (Figure S10). We examined auxin-mediated depletion at the L3 stage for consistency with our auxin-mediated phenotypic experiments and with other experiments that included the rbm-26(null) mutants, which do not survive past this stage.

      In general, auxin-mediated knockdown tends to be hypomorphic in neurons. This is likely because the neuronal TIR1 driver is expressed at much lower levels than the other drivers. In addition, the lower penetrance of the PLM/ALM overlap phenotype after auxin-mediated knockdown could reflect the fact that this phenotype resolves by the L4 stage in hypomorphic mutants. For example, in P80L mutants at the L3 stage we see only about 20% penetrance of the PLM/ALM overlap phenotype (relative to about 15% with auxin-mediated knockdown).

      Fig 4. More rigorous quantification of the distribution of mitochondria along the axon should be included, not only total number, and it should be clarified what region of the axon the images are taken from. Including the AID-depletion strain with and without auxin would further add to the sense of rigor. For the mitoTimer experiments, why is RBM-26(L13V) not included and why do wild-type values differ ~5-fold between experiments (despite error bars being almost non-existent)? A more rigorous approach to standardizing imaging conditions may be needed. Positive controls using compounds that affect oxidation should be included. Measurements of individual mitochondria with standard deviations should be shown, rather than aggregate averages with error of proportion.

      We have changed our methodology for measuring mitochondria so that we now report the density of mitochondria in the axon (number per 100 µm; Figure 4E-F). We agree that this method is much better than counting the total number of mitochondria per axon, as it corrects for differences in body length and axon length. We also now include data for the whole axon (Figure 4E), proximal axon (Figure 4G), and distal axon (Figure 4H). These data suggest that the mitochondrial density defects occur in the proximal axon but not in the distal axon. Using the null allele, we have also examined the timing of mitochondrial defects in the axon and report that the defects begin in the L1 stage and continue throughout larval development (Figure 4F). Individual datapoints have been added for all graphs in Figure 4.

      For the mitoTimer experiments (Figure 5), we have added data for L13V and have added the individual datapoints to the graph. In the prior version, the values did not differ 5-fold between experiments at the same stage. Rather, the different graphs were from different stages (as noted in the figure legends and main text), and the L4 stage has much more oxidation than the L2 stage. To clarify this, we have added labels to the graphs to indicate the stage for each experiment. We have also added new data, so that we now show results for the L2, L3, and L4 stages for all three rbm-26 mutants (Figure 5C-E). We did not test the L1 stage because the signal was not sufficient for accurate quantitation.

      Fig 5. Additional positive and negative controls should be added, including additional rbm-26 alleles, the AID-tagged strain with and without auxin, and a rescued mutant.

      The old Figure 5 has become Figure 6 in the new version. We have added the rbm-26(L13V) allele to each experiment (Figure 6B-D). We have also added the loading controls for the western blot, along with quantification for 3 biological replicates of the western blot analysis (Figure 6D). We agree that these additions significantly strengthen the data because they show that two independent alleles of rbm-26 cause a very substantial increase in the expression of mals-1 at both the mRNA and protein levels. We did not do these experiments with the rescuing transgene or with the AID-tagged strain because these experiments are done on whole-worm lysates, whereas the AID-tagged and rescuing transgenes are neuron-specific.

      Fig 6. Controls showing whether the Scarlet-tagged protein is functional are needed, to rule out dominant negative or toxicity-related effects.

      This is Figure 7 in the new version. For this experiment, we are showing that overexpression of MALS-1 does cause defects. The idea is that excessive amounts of MALS-1 cause deleterious effects on the mitochondria. In fact, these defects could be considered dominant negative or toxic. We considered the possibility of crossing the Pmec-7::mals-1::scarlet transgene with rbm-26; mals-1 double mutants. However, this does not seem workable, because the single-copy Pmec-7::mals-1::scarlet transgene produces the phenotypes at penetrances that are similar to what we observe in rbm-26; mals-1 double mutants. We concede that the results of the overexpression experiments in Figure 7 are limited when considered in isolation. However, we think that they are meaningful when considered in combination with the results on the mals-1; rbm-26 double mutants in Figure 8.

      Fig 8. Controls for other mitochondrial components need to be included. It is important to determine if the decrease in ribosomes is specific or reflects a general decrease in mitochondria. If there are fewer mitochondria as suggested in Fig. 4, then of course mitochondrial ribosomal protein levels are also reduced. Additional rbm-26 alleles should be included here as well. Is this effect dependent on the MALSU homolog?

      This is Figure 8D-E in the new version. We have added new data showing that the decrease in MRPL-58 expression that is caused by the rbm-26(P80L) mutation is dependent on MALS-1. We concede that these experiments cannot be used to determine anything about the mitoribosomes per se, but rather serve as an alternative way of testing the effect of rbm-26 on mitochondria. We have revised the text accordingly (lines 355-357). Given these limitations we have elected not to try additional mitochondrial markers and have also not included additional rbm-26 alleles for this experiment.

      Finally the authors should address concerns about image manipulation, which amplify the concerns about technical rigor outlined above. The image in Fig. 2A appears to have a black box placed over the lower-right portion of the field to hide some features. Black boxes also appear to have been placed over the tops of images in Fig. 4B and 4D and at the left of Fig. 6A, 6B, and 6C. While these manipulations probably do not affect the conclusions, they further undermine confidence in data integrity and experimental rigor.

      We have corrected all of these image processing errors. The box in 2A was for the purpose of squaring off a corner that was clipped during image rotation. The boxes in Figures 4 and 6 (of the prior version) were added to give space for labels (without obscuring image features). We have now used alternative methods to accomplish the same goals. For example, in Figures 4-D we have placed the labels outside of the images.

      Minor points.

      1. C. elegans nomenclature conventions should be followed:

      • C. elegans gene names have three or four letters, thus the MALSU homolog cannot be named "malsu-1". Please have new gene names approved by WormBase BEFORE submitting for publication http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi

      We have changed malsu-1 to mals-1. In addition, both mals-1 and mrpl-58 have now been approved by WormBase and will be listed on the website upon its next update.

      • If two sequential CRISPR edits are made on the same gene then they should be listed as a compound allele, such as rbm-26(cue22cue25)

      We have updated our gene names to reflect this convention.

      • Genes on the same chromosome should not be separated with a semicolon, for example rbm-26(cue40) K12H4.2(syb6330)

      We have updated our gene names to reflect this convention.

      Describing the defects as "neurodevelopmental" is misleading in the case of axon beading or degeneration. Similarly, there is no evidence for an "axon targeting" defect as stated in the abstract.

      We have revised the text so that, instead of referring to degeneration phenotypes as neurodevelopmental, we now refer to axon degeneration phenotypes that occur during development. For example, in the abstract we now say, "These observations reveal a mechanism that regulates expression of a mitoribosomal assembly factor to protect against axon degeneration during neurodevelopment."

      Regarding targeting defects, this term was meant to refer to the misplacement of the PLM axon tip (which contains electrical synapses). However, our subsequent analysis has revealed that these defects are transient in P80L and L13V mutants, as they resolve by the L4 stage. The rbm-26 null axon development defects do not resolve, though these mutants die prior to the L4 stage. Given these findings, we have decided not to use the term "targeting defects". Instead, we now refer to this as an axon tiling defect or PLM/ALM overlap phenotype.

      In Fig. 5A, the symbol that appears to correspond to F59C6.15 (lowest p-value) is a different size than the others and is colored as ncRNA, whereas WormBase annotates this gene as snoRNA.

      This error has been corrected.

      In the Introduction, the last sentences of the first two paragraphs should be varied ("However, little is known about the [...] mechanisms that protect [...] during neurodevelopment.")

      This has been done.

      Why is RBM-26 protein running as a doublet at both sizes?

      We have improved our western blotting methodology by using a 12% gel, which gives better resolution. We have also switched from colorimetric detection to ECL detection, which gives greater sensitivity. In our new blots, we identify six different RBM-26 protein bands. We do not know the reason for these bands, but speculate that they result from post-translational processing (148-150).

      When showing the RBM-26 expression pattern (Fig. 3) please include a lower-magnification image of the entire animal.

      This has been done (Figure S6).

      It is confusing to refer to the RNA IP experiments as an "unbiased screen", which in C. elegans typically refers to a genetic screen.

      We now refer to this as a "biochemical screen".

      The relationship between axon overextension, beading, and mitochondrial localization is not clear. What causal connection between these is being proposed? The causal connections between these phenotypes, if any, should be clarified experimentally. For example, if the axon extension defects develop before mitochondrial localization defects, then it is unlikely that mitochondrial defects cause axon overextension.

      We have added new data showing that the reduction in mitochondrial density within the axon begins during the L1 stage and increases throughout larval development (Figure 4F). We have also added additional data showing that the increase in mitochondrial oxidation is weak in the L2 stage and surges in the L3 stage (Figure 5C-E), coincident with the beginning of the axon degeneration phenotypes. We propose (lines 383-391) that a low level of mitochondrial defects is present in L1 larvae, giving rise to the axon tiling defects. In the L3 stage there is a surge in excessive mitochondrial oxidation, giving rise to the axon degeneration phenotypes. We have added a new section to the discussion that addresses the relationship between defects in axon development and axon degeneration (lines 375-405).

      Please explain how to interpret the difference in axon beading in the two deletion alleles of the MALSU homolog (axon beading defects in tm12122 but not in syb6330). Is syb6330 not a null allele? Or are the defects in tm12122 due to other mutations in this strain background?

      One likely reason for this difference is that tm12122 is predicted to cause a partial deletion of the mals-1 coding sequence, whereas syb6330 is a full deletion. Thus, tm12122 could be acting as a dominant negative. In fact, prior work on the MALSU1 ortholog has indicated that this protein is subject to interference by a dominant negative construct (see Rorbach et al, Nucleic Acids Res 2012). Nonetheless, we cannot rule out the possibility of a linked second mutation in tm12122. However, since we have found similar phenotypes and genetic interactions with both alleles, we can conclude that these phenotypes and interactions are due to loss of MALS-1 rather than to a second mutation.

      Are mitochondria reduced in number or mislocalized? If they are reduced in number, is this due to altered balance of fission/fusion?

      We have adjusted our methods for quantifying mitochondria and have also analyzed the proximal vs distal axon (Figure 4). We find that the density of mitochondria is decreased in the proximal axon, but not in the distal axon. We speculate that this might reflect a higher demand on mitochondria in the proximal axon, due to a higher amount of trafficking activity in the proximal axon (lines 255-257). We propose that the loss of RBM-26 causes dysfunction in mitochondria. Since fission and fusion are mechanisms that can help to repair damaged mitochondria, it is likely that they would be involved in the phenotypes that we observe.

      In Fig. 3A-D, please keep the labels in the same position in all panels and do not alter brightness settings between single-color and merged panels.

      These images have been moved to the supplemental data section (Figure S5). We have adjusted the labels as suggested. We have not changed the brightness settings, as they were already the same in all panels. However, the blue signal in the merged panel does obscure some of the red signal, giving an appearance of an alteration in color balance.

      The claim that rbm-26 acts cell-autonomously requires PLM-specific depletion and rescue experiments.

      We have added new data indicating that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype (Figure 3F-G).

      **Referees cross-commenting** I appreciate the use of the consultation session to resolve differences between reviewers, but in this case I fully agree with the content and tone of all the comments from the other reviewer -- I think our remarks are very well aligned!

      Reviewer #1 (Significance (Required)):

      The study engineers autism-associated variants in conserved residues of RBM27 into the C. elegans homolog RBM-26 and identifies neuronal phenotypes potentially relevant to autism and a potential molecular mechanism involving regulation of mitochondrial ribosome assembly.

      The key claims of the study are (1) that autism-associated variants in RBM-26 decrease its protein expression; (2) that impaired RBM-26 function leads to a variety of defects in development and maintenance of a single neuron called PLM, including altered axonal localization of mitochondria; (3) that RBM-26 normally binds the mRNA for the C. elegans homolog of MALSU, a mitochondrial ribosomal assembly factor; (4) that loss of RBM-26 leads to overexpression of the MALSU homolog; and (5) that MALSU is required for some of the deleterious effects on the PLM neuron seen in RBM-26 mutants.

      This study will be of interest to the autism research community because it bolsters the idea that variants in RBM27 are likely to disrupt gene function and to affect neuronal health. It will also be of interest to the broader cell biology community because it suggests an interesting potential nucleus-to-mitochondria signaling mechanism, in which a nuclear RNA-binding protein might regulate assembly of mitochondrial ribosomes.

      My field of expertise is developmental biology in C. elegans.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      In this manuscript, the authors studied an ASD-associated gene, rbm-26 in neuronal morphology using the touch receptor neuron PLM in C. elegans, and found that loss-of-function rbp-27 causes overextension and the formation of bulb-like structures in the axon. Using UV-crosslinking RNA immunoprecipitation and RNA-Seq, they identify malsu-1 as a target of rbm-26. Genetic analyses suggest malsu-1 likely functions downstream of rbm-26 in controlling the PLM morphology.

      Major comments:

      • The authors describe RBM27 is associated with ASD and ID while they only cite SFARI paper that describes a weak association of RBM27 to ASD. The appropriate referenced that show link between RBM27 and ID should be provided.

      The link with ID was an error. We had meant to say "ASD or other neurodevelopmental disorders." This has been corrected.

      • SFARI database only has three (P79L, R190Q, G348D) mutations listed as ASD-associated. Where are other mutations L13V and R455H, particularly L13V that the authors used to generate the C. elegans mutant come from? Are they associated with intellectual disabilities?

      The others came from denovo-db. We have added a reference for this database and have also added the primary source references for each of the five de novo variants (see line 121).

      • The authors should be very careful when describing 'gene X causes Y diseases'. Many (if not all) of the examples described in this manuscript are disease-associated genes without validation to be causal genes.

      We have revised accordingly. For example, on lines 433-435 we now say, "For example, mutations in the EXOSC3, EXOSC8 and EXOSC9 are thought to cause syndromes that include defects in brain development such as hypoplasia of the cerebellum and the corpus callosum." We have decided to use the phrase "thought to cause" because three of the five referenced articles on these genes use titles that indicate causation.

      • The authors refer PLM axon beading and overextension phenotypes to 'axon degeneration and targeting defects'. The authors must provide additional evidence of axon degeneration (see below). Also the term 'targeting defects' is misleading as the authors did not examine if overextension of the PLM axon causes targeting defects. At least they should examine some synaptic markers.

      To provide more evidence of degeneration, we have analyzed several additional phenotypes at multiple developmental stages (Figure 2 and Table S1). Regarding targeting defects, this term was meant to refer to the misplacement of the PLM axon tip (which contains electrical synapses). However, our subsequent analysis has revealed that these defects are transient in P80L and L13V mutants, as they resolve by the L4 stage. The rbm-26 null axon development defects do not resolve, though these mutants die prior to the L4 stage. Given these findings, we have decided not to use the term "targeting defects". Instead, we now refer to this as an axon tiling defect or PLM/ALM overlap phenotype.

      • Neuronal phenotypes (axon overextension and beading) should be examined at different developmental timepoints (larval, young adult, and aged animals) to test if these phenotypes are indeed degenerative instead of developmental defects.

      We have included new data on all of these phenotypes at multiple developmental time points (Figure 2 and Table S1).

      • The authors use the blebbing (beading) phenotype in the axon as the sole evidence of neurodegenerative properties of the PLM neuron. A more thorough analysis of this phenotype as done by others (Pan PNAS 2006) must be provided to support the authors' claim that this phenotype represents neurodegeneration.

      We have included new data on multiple degenerative phenotypes in axons, including blebbing, beading, waviness and breaks (Table S1).

      • The number of beads per axon should be quantified to better represent the severity of rbm-26 mutant. Individual samples should be plotted in the quantification instead of showing the percentage of animals.

      We have added data on the density of beads in rbm-26(null), rbm-26(P80L), and rbm-26(L13V) mutants (Figure S3). For most experiments, we have decided to use penetrance to measure axon degeneration because this is a standard in the field and allows for a larger sample size. For examples, please see:

      https://doi.org/10.1523/JNEUROSCI.1494-11.2012 (Toth et al, 2012)

      https://doi.org/10.1016/j.cub.2014.02.025 (Rawson et al, 2014)

      https://doi.org/10.1073/pnas.1011711108 (Pan et al, 2012)

      https://doi.org/10.7554/eLife.80856 (Czech et al, 2023)

      https://doi.org/10.1016/j.celrep.2016.01.050 (Nichols et al, 2016)

      • Based on the single gel image in Fig. 1C with no loading control, the P80L mutant appears to have no protein expression. How is the P80L viable while the null mutant is lethal? The authors should quantify the protein expression levels from multiple blots with proper loading controls. If P80L mutation is introduced into RBM-26::mScarlet strain can it cause depletion of the signal in vivo?

      We have added new data showing that the RBM-26::Scarlet signal is diminished by the P80L mutation in vivo (Figure 1E-F). We have also added quantification from 3 biological replicate blots (Figure 1D). Finally, we have improved the sensitivity of our blots by using ECL detection and also show various exposures to highlight the fainter bands (Figures 1C and S1). Therefore, we are now able to detect low-level expression of RBM-26(P80L) mutant protein. It is likely that the low level of RBM-26(P80L) and RBM-26(L13V) seen on western blots is sufficient to prevent the lethal phenotype.

      • 'Moreover, loss of either the SPTBN1 or ADD1 genes causes a neurodevelopmental syndrome that includes autism and ADHD' References are missing, and as described above, be extra careful when indicating causality. Very few genes are known to cause ASD and ADHD.

      We have added the citations for this work (line 81). We also note that the titles of both cited articles indicate causation. To be on the safe side, we have revised this line to say, "Moreover, loss of either the SPTBN1 or ADD1 genes are thought to cause a neurodevelopmental syndrome that includes autism and ADHD."

      • Fig. 3E F, the authors should use the strains that express TIR1 specifically in the touch receptor neurons to argue cell autonomous function of RBM-26. Alternatively, the authors may conduct PLM neuron-specific rescue experiments to test the sufficiency.

      We have added new data indicating that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype and the PLM/ALM overlap phenotype (see Figure 3F-G).

      • 'Loss of RBM-26 causes mitochondria dysfunction in axons.' The authors did not examine mitochondria function in axons. They only examined the number of mitochondria, and ROS production in the soma. The authors should provide additional evidence to support the idea that elevated ROS production in the soma is due to mitochondrial dysfunction in axons. Also, the authors should use both P80L and L13V for this experiment, and indicate individual datapoint as dots. Here, they quantified at the L4 stage, which the authors should justify.

      We have added the L13V data to this experiment and now show the individual data points. In addition, we have now conducted this analysis at the L2, L3 and L4 stages (Figure 5C-E). We have also revised the text to indicate that loss of rbm-26 function causes mitochondrial dysfunction in the cell body, which could potentially cause a reduction of mitochondria in the axon (see lines 100-101 and 268-270). We speculate that mitochondria in the axon are also dysfunctional. However, the mitoTimer signal is not bright enough in axons to allow for quantification.

      • Figure 5B and C: the authors should also use L13V to quantify malsu-1 mRNA and protein level, and include quantifications in panel C (from multiple blots).

      This is Figure 6 in the new version. We have added new data for expression of mals-1 mRNA and protein in rbm-26(L13V) mutants (Figure 6B-D). We have also included quantifications from 3 biological replicates (Figure 6D).

      • In the rbm-26 mutant, the number of mitochondria is reduced, while the amount of MALSU-1 protein is increased. If MALSU-1 is specifically localized at mitochondria in wild type, where does the excessive MALSU-1 go in the rbm-26 mutants? Quantification of MALSU-1 signal intensity should be provided.

      Our Pmec-7::mals-1::scarlet transgene uses the tbb-2 3'UTR and causes an overexpression phenotype. To address the question posed by the reviewer, we would need to express MALS-1 at endogenous levels. Given that endogenous levels of MALS-1 are very low, it is unlikely that we would be able to visualize its expression. Nonetheless, as a way to address this question, we have attempted to create a single-copy Pmec-7::mals-1::scarlet transgene that utilizes the mals-1 endogenous 3'UTR. We have tried multiple approaches for generating this construct, but all have failed, likely due to sequence complexities within the mals-1 3'UTR. While we cannot say where the extra MALS-1 protein goes, we think that it is likely overloaded into the remaining mitochondria and could also be present in the cytosol.

      • Figure 7C: malsu-1 knockout mutants exhibit PLM overextension phenotype, which is not consistent with their model. The authors should discuss this in detail.

      We have added a paragraph to the discussion explaining that mitochondrial function could be disrupted by either MALS-1 overexpression or MALS-1 loss of function (lines 471-480).

      • 'To validate these findings, we also repeated these experiments with an independent allele of malsu-1, malsu-1(tm12122) and found similar results (Fig. 7A-C).' The malsu-1(tm12122) exhibits beading phenotype and more severe overextension phenotype which the authors must describe and discuss more carefully.

      One likely reason for this difference is that tm12122 is predicted to cause a partial deletion of the mals-1 coding sequence, whereas syb6330 is a full deletion. Thus, tm12122 could be acting as a dominant negative. In fact, prior work on the MALSU1 ortholog has indicated that this protein is subject to interference by a dominant negative construct (see Rorbach et al, Nucleic Acids Res 2012). Nonetheless, we cannot rule out the possibility of a linked second mutation in tm12122. However, since we have found similar phenotypes and genetic interactions with both alleles (albeit at slightly different penetrances), we can conclude that these phenotypes and interactions are due to loss of MALS-1 rather than to a second mutation. We have added these considerations to the results section (lines 342-244).

      • Figure 8: The authors should include data from L13V, malsu-1 and rbm-26; malsu-1 mutants. Quantification from multiple blots should be provided.

      This is Figure 8D in the new version. We have added the mals-1 single mutants and rbm-26; mals-1 double mutants to this experiment. We have also added quantification from multiple biological replicate blots. As pointed out by the other reviewer, we think that this experiment does not give specific information about mitoribosomes, but is an alternative approach to looking at the reduction in mitochondria. Given this limitation, and considering that we have added L13V data to the mitochondria experiment in Figure 8B, we have elected not to add L13V data to the western blot experiment in Figure 8D.

      Minor comments:

      • 'Consistent with a role for mitochondria in neurodevelopmental disorders, some of these disorders include a neurodegenerative phenotype.' Why is it consistent to have neurodegenerative phenotypes if mitochondria is associated with neurodevelopmental disorders? A better explanation would help.

      We have changed this sentence to, "Some neurodevelopmental syndromes feature neurodegenerative phenotypes that occur during neuronal development."

      • L13V is generally more severe in axon overextension phenotype than P80L while protein level is more abundant. The authors should discuss about this.

      We have added a time course for the PLM/ALM overlap phenotype in these mutants (Figure 2D). These new data show that the PLM/ALM overlap is quite similar overall between the P80L and L13V mutants. Both mutations cause an increase in PLM/ALM overlap in early larval development that is resolved by the L4 stage. The P80L phenotype resolves slightly sooner for unknown reasons. This could reflect differences in expression within the PLM that are not reflected in the whole-worm lysate, or it could be due to a slight difference in genetic background or other stochastic factors. The key point is that these two independent alleles cause a similar phenotype overall, indicating that this phenotype results from loss of RBM-26 function.

      • Fig. 2E, F: 'Beading refers to focal enlargement or bubble-like lesions which were at least twice the diameter of the axon in size.' How are the diameters of axons measured? A more detailed quantification method, and examples of measurement should be provided.

      We have added example measurements to the supplemental section (Figure S3). Additional detail on the measurements is given in the Methods section (lines 517-518).

      • Figure 3: The authors should also include low-magnification images to show where RBM-26 is expressed. The current image does now allow identifying cells. The transgene that labels the nuclei of hypodermis should be indicated in the manuscript. Specifically, the expression of the RBM-26 in the PLM should be shown.

      We have added a low-magnification image (Figure S6) and have also added images of endogenously tagged RBM-26::Scarlet in the PLM (Figure 3A-C). The transgenic label for the hypodermis has been added to the legend of Figure S5.

      • Figure 3: 'Tissue specific degradation of RBM-26::SCARLET::AID was achieved due to cell-type specific TIR-1 driver lines (see methods for details).' This information is not provided in the method section.

      This information has been added to the Methods section ("Auxin protein degradation").

      • Fig. 4 E. Values from individual samples should be indicated as dots. Representative images of P80L and L13V should be included. Conduct quantifications at adult stage as the authors use in other quantifications, or justify use of specific developmental stage (L3) they used.

      Figure 4 has become Figures 4 and 5 in the revised version. We have updated the graphs to include dots for individual data points. We have added quantifications of the mitoTimer experiments for the L2, L3 and L4 stages (Figure 5C-E). We note that our other experiments were done at the L1, L2, L3, L4 and adult stages. The mitoTimer signal is not sufficient for quantification at the L1 stage, and at the adult stage the red signal becomes saturated. We have added representative images for mitoTimer in P80L and L13V mutants (Figure S9).

      • The genes malsu-1 and mrpl-58 are not listed on wormbase. If the authors would like to designate names to these gene, they should clearly indicate that along with the sequence name.

      We have changed malsu-1 to mals-1. In addition, both mals-1 and mrpl-58 have now been approved by WormBase and will be listed on the website upon its next update.

      • The authors found that MRPL-58 amount is reduced in rbm-26 mutants (which require additional verifications). This can be explained by the fact that axonal mitochondria number is reduced in the rbm-26 mutants. How did the authors confirm that the reduction in MRPL-58 level is due to the disruption of mitoribosome assembly?

      This is Figure 8D-E in the new version. We have added new data showing that the decrease in MRPL-58 expression that is caused by the rbm-26(P80L) mutation is dependent on MALS-1. We concede that these experiments cannot be used to determine anything about the mitoribosomes per se, but rather serve as an alternative way of testing the effect of rbm-26 on mitochondria. We have revised the text accordingly (lines 355-357).

      • 'MALSU-1 is a mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module [37-39].' I don't think these are known for C. elegans MALSU-1.

      We have revised this to: "MALS-1 is an ortholog of the MALSU1 mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module."

      • 'Moreover, our results also suggest that disruption of this process can give rise to neurodevelopmental disorders.' I feel this is a quite a bit of stretch.

      This has been replaced with, "Therefore, we speculate that human RBM26/27 could function with the RNA exosome complex to protect against neurodevelopmental defects and axon degeneration in infants." (lines 371-373)

      **Referees cross-commenting** Yes, many of our comments overlap, and I fully agree with all comments from the other reviewer too.

      Reviewer #2 (Significance (Required)):

      I found the manuscript interesting, particularly the use of innovative techniques in identifying the target of RBM-26. The genetic analyses of rbm-26 and malsu-1 generally support the authors' main conclusion that rbm-26 inhibits malsu-1, and will be of potential interest to basic neuroscientists and cell biologists. However, the current manuscript looked premature, which made my reading experience less pleasant. The phenotypic analyses are superficial compared to similar work and are insufficient to support the authors' claim of 'axon degeneration and targeting defects'. A number of issues listed above should be addressed before this manuscript is published. The reviewer's expertise: neurodevelopment in model organisms.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this manuscript, the authors studied the role of an ASD-associated gene, rbm-26, in neuronal morphology using the touch receptor neuron PLM in C. elegans, and found that loss-of-function rbm-26 causes overextension and the formation of bulb-like structures in the axon. Using UV-crosslinking RNA immunoprecipitation and RNA-Seq, they identify malsu-1 as a target of rbm-26. Genetic analyses suggest malsu-1 likely functions downstream of rbm-26 in controlling the PLM morphology.

      Major comments:

      • The authors describe RBM27 as associated with ASD and ID, while they only cite a SFARI paper that describes a weak association of RBM27 with ASD. Appropriate references showing a link between RBM27 and ID should be provided.
      • The SFARI database lists only three mutations (P79L, R190Q, G348D) as ASD-associated. Where do the other mutations, L13V and R455H, come from, particularly L13V, which the authors used to generate the C. elegans mutant? Are they associated with intellectual disabilities?
      • The authors should be very careful when describing 'gene X causes Y diseases'. Many (if not all) of the examples described in this manuscript are disease-associated genes without validation to be causal genes.
      • The authors refer to the PLM axon beading and overextension phenotypes as 'axon degeneration and targeting defects'. The authors must provide additional evidence of axon degeneration (see below). Also, the term 'targeting defects' is misleading, as the authors did not examine whether overextension of the PLM axon causes targeting defects. At a minimum, they should examine some synaptic markers.
      • Neuronal phenotypes (axon overextension and beading) should be examined at different developmental timepoints (larval, young adult, and aged animals) to test if these phenotypes are indeed degenerative instead of developmental defects.
      • The authors use the blebbing (beading) phenotype in the axon as the sole evidence of neurodegenerative properties of the PLM neuron. A more thorough analysis of this phenotype as done by others (Pan PNAS 2006) must be provided to support the authors' claim that this phenotype represents neurodegeneration.
      • The number of beads per axon should be quantified to better represent the severity of rbm-26 mutant. Individual samples should be plotted in the quantification instead of showing the percentage of animals.
      • Based on the single gel image in Fig. 1C with no loading control, the P80L mutant appears to have no protein expression. How is the P80L viable while the null mutant is lethal? The authors should quantify the protein expression levels from multiple blots with proper loading controls. If P80L mutation is introduced into RBM-26::mScarlet strain can it cause depletion of the signal in vivo?
      • 'Moreover, loss of either the SPTBN1 or ADD1 genes causes a neurodevelopmental syndrome that includes autism and ADHD' References are missing, and as described above, be extra careful when indicating causality. Very few genes are known to cause ASD and ADHD.
      • Fig. 3E, F: the authors should use strains that express TIR1 specifically in the touch receptor neurons to argue for a cell-autonomous function of RBM-26. Alternatively, the authors may conduct PLM neuron-specific rescue experiments to test sufficiency.
      • 'Loss of RBM-26 causes mitochondria dysfunction in axons.' The authors did not examine mitochondrial function in axons. They only examined the number of mitochondria, and ROS production in the soma. The authors should provide additional evidence to support the idea that elevated ROS production in the soma is due to mitochondrial dysfunction in axons. Also, the authors should use both P80L and L13V for this experiment, and show individual data points as dots. Here, they quantified at the L4 stage; the authors should justify this choice.
      • Figure 5B and C: the authors should also use L13V to quantify malsu-1 mRNA and protein level, and include quantifications in panel C (from multiple blots).
      • In the rbm-26 mutant, the number of mitochondria is reduced, while the amount of MALSU-1 protein is increased. If MALSU-1 is specifically localized at mitochondria in wild type, where does the excessive MALSU-1 go in the rbm-26 mutants? Quantification of MALSU-1 signal intensity should be provided.
      • Figure 7C: malsu-1 knockout mutants exhibit PLM overextension phenotype, which is not consistent with their model. The authors should discuss this in detail.
      • 'To validate these findings, we also repeated these experiments with an independent allele of malsu-1, malsu-1(tm12122) and found similar results (Fig. 7A-C).' The malsu-1(tm12122) exhibits beading phenotype and more severe overextension phenotype which the authors must describe and discuss more carefully.
      • Figure 8: The authors should include data from L13V, malsu-1 and rbm-26; malsu-1 mutants. Quantification from multiple blots should be provided.

      Minor comments:

      • 'Consistent with a role for mitochondria in neurodevelopmental disorders, some of these disorders include a neurodegenerative phenotype.' Why is it consistent to have neurodegenerative phenotypes if mitochondria are associated with neurodevelopmental disorders? A better explanation would help.
      • L13V generally causes a more severe axon overextension phenotype than P80L, even though its protein level is higher. The authors should discuss this.
      • Fig. 2E, F: 'Beading refers to focal enlargement or bubble-like lesions which were at least twice the diameter of the axon in size.' How are the diameters of axons measured? A more detailed quantification method, and examples of measurement should be provided.
      • Figure 3: The authors should also include low-magnification images to show where RBM-26 is expressed. The current image does not allow cells to be identified. The transgene that labels the nuclei of the hypodermis should be indicated in the manuscript. Specifically, the expression of RBM-26 in the PLM should be shown.
      • Figure 3: 'Tissue specific degradation of RBM-26::SCARLET::AID was achieved due to cell-type specific TIR-1 driver lines (see methods for details).' This information is not provided in the method section.
      • Fig. 4E: Values from individual samples should be indicated as dots. Representative images of P80L and L13V should be included. Quantifications should be conducted at the adult stage, as in the other quantifications, or the use of the specific developmental stage (L3) should be justified.
      • The genes malsu-1 and mrpl-58 are not listed on WormBase. If the authors would like to designate names for these genes, they should clearly indicate them along with the sequence names.
      • The authors found that the amount of MRPL-58 is reduced in rbm-26 mutants (which requires additional verification). This can be explained by the fact that the number of axonal mitochondria is reduced in the rbm-26 mutants. How did the authors confirm that the reduction in MRPL-58 level is due to the disruption of mitoribosome assembly?
      • 'MALSU-1 is a mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module [37-39].' I don't think these are known for C. elegans MALSU-1.
      • 'Moreover, our results also suggest that disruption of this process can give rise to neurodevelopmental disorders.' I feel this is quite a stretch.

      Referees cross-commenting Yes, many of our comments overlap, and I fully agree with all comments from the other reviewer too.

      Significance

      I found the manuscript interesting, particularly the use of innovative techniques in identifying the target of RBM-26. The genetic analyses of rbm-26 and malsu-1 generally support the authors' main conclusion that rbm-26 inhibits malsu-1, and will be of potential interest to basic neuroscientists and cell biologists. However, the current manuscript looked premature, which made my reading experience less pleasant. The phenotypic analyses are superficial compared to similar work and are insufficient to support the authors' claim of 'axon degeneration and targeting defects'. A number of issues listed above should be addressed before this manuscript is published.

      The reviewer's expertise: neurodevelopment in model organisms.

    1. Author response

      The following is the authors’ response to the previous reviews

      eLife assessment 

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plants subjected to water limitation imposed with different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. There was concern, however, that a purely gene expression-based approach may not provide sufficient physiologically relevant information about plant responses to drought, and therefore, despite improvements from a previous version, the new methodology championed by this work remains inadequate.

      Molecular biologists who study drought stress must make choices about which assays to use in their investigation. Serious resources and effort are put into their endeavor, and choice of assay matters. Our manuscript’s goal was largely practical: to guide molecular biologists employing transcriptomics in their choice of drought stress assay, and thus help ensure their work will discover transcriptional signatures of importance, and not those that may be an artifact from lowering water potential using chemical agents on agar plates.  

      We examine how different approaches of reducing water potential impact the Arabidopsis root and shoot transcriptome. Our manuscript shows that each method of reducing water potential has a different effect on Arabidopsis root transcriptome responses. We acknowledge that drought stress induces a complex physiological response, and can vary depending on the method used. However, by comparing across assays, we find instances where a gene is downregulated by low water potential in one assay, and upregulated by low water potential in another assay. We feel it is only natural to question why this could be, and to hypothesize that it may be caused by secondary effects of the way low water potential is imposed. We note that comparative transcriptomics has been a standard approach for decades. We take it as the reviewer’s opinion that it may not be insightful, but this opinion does not factually impact our findings.

      Reviewer #2 (Public Review): 

      This manuscript purports to develop a new system to study low water potential (drought) stress responses in agar plates. They make numerous problematic comparisons among transcriptome datasets, particularly to transcriptome data from a vermiculite drying experiment which they inappropriately present as representing an authentic "drought response" to the exclusion of all other data. For some reason, which the reviewer cannot fully understand, the authors seem intent on asserting the superiority of their experimental system to all others. They do not succeed in this and such an effort is ultimately a disservice to the field of drought research as a whole. 

      While they devote considerable effort to comparing transcriptome data among various experimental systems, the potentially more informative experiment at the end of the manuscript, testing growth responses of a number of Arabidopsis accessions, is only done for their "LW" system. The focus of this manuscript on transcriptome data, to the almost complete exclusion of other types of data, is a symptom of a broader over-emphasis on transcriptomes that unfortunately is quite prevalent in plant science now. It is worth remembering that for protein-coding genes, which constitute the vast majority of genes, transcriptome data is a proxy measurement. The really important thing is protein amount, and even more so protein activity/function, which we know has an imperfect, at best, correlation with transcript level. We measure transcriptomes because we can, not because it is inherently the most informative thing to do. The authors' quixotic quest to see if the transcriptomes of different stress treatments match is of limited value and further diminished by their misleading presentation of one particular transcriptome data set (from their vermiculite drying experiments) as somehow a special data set that everything else must be evaluated against. This study sheds no new light on how to do relevant drought (low water potential) experiments in the lab.

      Although the reviewer acknowledges that the authors have made some effort to respond to previous comments, the fundamental flaws remain and the present version of this study is little improved from the first submission. 

      One challenge faced by the drought community is establishing consensus regarding the definition of drought itself. According to the criteria followed by the reviewer, any method leading to a reduction in water potential qualifies as drought stress. However, the findings presented in this manuscript demonstrate that transcriptional responses in roots vary considerably across five different methods of reducing water potential. This indicates that beyond responding to a change in water potential itself, root transcriptomes will also respond to the specific way low water potential is introduced. We believe this variability is of interest to the drought research community. 

      Of the five methods we explore, we hold the view that the gene expression changes induced by vermiculite drying are the most analogous to the expression signatures Arabidopsis would exhibit in response to low water potential in the natural environment. In contrast, we posit that Arabidopsis grown on agar plates - where the root system is exposed to air and light, and where water potential is lowered using chemical agents - may contain gene expression signatures plant molecular biologists may not find particularly relevant. However, we acknowledge that this is our opinion, and will make this more explicit in our revised text.

      More broadly, we believe that the reviewer’s observation regarding the ‘over-emphasis’ on transcriptomics that is prevalent within the plant science community justifies, rather than diminishes, the work presented here. If transcriptomics is a commonly employed method, then we anticipate that the outcomes of this study will hold value for a broad audience. Such researchers are likely not only using transcriptomics as a proxy measure for protein abundance, as the reviewer suggests, but also because it is one of the more straightforward genomic techniques biologists can use to identify candidate genes that may be chosen for further scrutiny. 

      Reviewer #3 (Public Review): 

      Comments on revised version: 

      Specific previous criticisms that were addressed are: 

      (1) that gene expression changes were only compared between the highest dose of each stress assay. In the revised version, the authors changed their framework and are now using linear modelling to detect genes that display a dose response to each specific treatment. I agree that this might be a more robust approach to selecting genes that are specific to a certain treatment. 

      (2) that concentrations of PEG, mannitol, NaCl, and the "low water" agar which were chosen are not comparable in regards to their specific osmotic component. I appreciate that the authors measured the osmotic potential of each treatment. It revealed that both PEG and NaCl at their highest concentration had a much more negative osmotic potential compared with the other treatments. The authors claim that using ANCOVA they did not detect any significant differences between the treatments (lines 113, 114). I believe that ANCOVA is not the appropriate test in this case. ANCOVA has an assumption of linearity, while the dose response between concentration and osmotic potential is non-linear. This is particularly evident for PEG (Steuter AA. Water potential of aqueous polyethylene glycol. Plant Physiol. 1981 Jan;67(1):64-7. doi: 10.1104/pp.67.1.64.). Since the treatments are not the same at the highest level, I think this could have effects on the validity of comparisons by linear model. One approach could be to remove the treatment level with the highest concentration and compare the results or adjust the treatments to the same osmolarity.

      (3) that only two biological replicates were collected for RNA sequencing which makes it impossible to know how much variance exists between samples. The authors added a third replicate in the revised version for most treatments. However, some treatments still have only two replicates, which cannot be easily seen from the text or the figure. I would prefer that those differences are pointed out. 

      (4) that the original manuscript did not explore what effect the increase of agar and nutrient concentration in the "low water" agar had on water potentials. The authors conducted additional experiments showing that changes in water potential were exclusively caused by changes in the nutrient concentration (Figure 2-figure supplement 5; lines 222-224). However, the increase in agar strength had also some effect on gene expression. While this is not further discussed in the text, I believe this effect of agar on gene expression could be similar to root responses to soil compaction. 

      (5) That the lower volume of media in the "low water" agar could have an effect on plants. The authors compared these effects in Figure 2-figure supplement 7. They claim that "different volumes of LW agar media do not play a significant part in modulating gene expression". While I can see that they detected 313 overlapping DEGs, there were still 146 and 412 non-overlapping DEGs. The heatmap in subpanel E also shows that there were differences in particular in the up-regulated genes. My conclusion would be that the change in volume does play a role and this should be a consideration in the manuscript. 
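      To make the linearity concern in point (2) concrete, the following is a minimal Python sketch, not the authors' or the reviewer's analysis. It fits an ordinary least-squares line to a hypothetical dose-response that curves at high concentration (the values are synthetic and are not measurements from the manuscript) and prints the residuals. A systematic sign pattern in the residuals is the signature of curvature that a linear covariate term, as assumed in ANCOVA, would miss. The helper names `linear_fit` and `residuals` are illustrative only.

```python
# Hypothetical illustration: a straight line fitted to a curved dose-response.
# The "potential" values are synthetic, not data from the manuscript.


def linear_fit(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def residuals(x, y):
    """Observed minus fitted values from the straight-line fit."""
    a, b = linear_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


conc = [0, 1, 2, 3, 4]                 # hypothetical concentration levels
potential = [-(c ** 2) for c in conc]  # hypothetical response, steeper at high dose

# All levels: residuals change sign systematically (curvature the line misses)
print(residuals(conc, potential))            # [-2.0, 1.0, 2.0, 1.0, -2.0]

# Reviewer's first suggestion: drop the highest level and refit
print(residuals(conc[:-1], potential[:-1]))  # [-1.0, 1.0, 1.0, -1.0]
```

      In this toy case, dropping the top level shrinks the residuals but does not remove the curved pattern, which is why the reviewer's alternative of matching treatments on osmolarity may be the more direct remedy.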

      We thank the reviewer for their suggestions. We plan to resubmit the manuscript reflecting the requested changes. Specifically, we will: 

      -       Detail more thoroughly the effects of agar volume on the gene expression changes elicited by LW agar treatment.

      -       Investigate, through comparison with existing literature, whether the tensile stress introduced by hard agar is similar to soil compaction.

      -       Assess more rigorously the suitability of the ANCOVA model for assessing water potential changes of different media types.

    1. Author response:

      Reviewer #1 (Public Review):

      How does the brain respond to the input of different complexity, and does this ability to respond change with age?

      The study by Lalwani et al. tried to address this question by pulling together a number of neuroscientific methodologies (fMRI, MRS, drug challenge, perceptual psychophysics). A major strength of the paper is that it is backed up by robust sample sizes and careful choices in data analysis, translating into a more rigorous understanding of the sensory input as well as the neural metric. The authors apply a novel analysis method, developed in human resting-state MRI data, to task-based data in the visual cortex, specifically investigating the variability of neural responses to stimuli of different levels of visual complexity. A subset of participants took part in a placebo-controlled drug challenge and functional neuroimaging. This experiment showed that increases in GABA have differential effects on participants with different baseline levels of GABA in the visual cortex, possibly modulating the perceptual performance of those with lower baseline GABA. A caveat is that no single cohort took part in all study elements, i.e., visual discrimination with drug challenge and neuroimaging. Hence the causal relationship is limited to the neural variability measure and does not extend to visual performance. Nevertheless, the consistent use of visual stimuli across approaches permits an exceptionally high level of comparability across modalities (the computational, behavioural, and fMRI analyses draw on the same set of images). The conclusions that can be drawn from such a coherent data set are strong.

      The community will benefit from the technical advances, esp. the calculation of BOLD variability, in the study when described appropriately, encouraging further linkage between complementary measures of brain activity, neurochemistry, and signal processing.

      Thank you for your review. We agree that a future study with a single cohort would be an excellent follow-up.

      Reviewer #2 (Public Review):

      Lalwani et al. measured BOLD variability during the viewing of houses and faces in groups of young and old healthy adults and measured ventrovisual cortex GABA+ at rest using MR spectroscopy. The influence of the GABA-A agonist lorazepam on BOLD variability during task performance was also assessed, and baseline GABA+ levels were considered as a mediating variable. The relationship of local GABA to changes in variability in BOLD signal, and how both properties change with age, are important and interesting questions. The authors feature the following results: 1) younger adults exhibit greater task-dependent changes in BOLD variability and higher resting visual cortical GABA+ content than older adults, 2) greater BOLD variability scales with GABA+ levels across the combined age groups, 3) administration of a GABA-A agonist increased condition differences in BOLD variability in individuals with lower baseline GABA+ levels but decreased condition differences in BOLD variability in individuals with higher baseline GABA+ levels, and 4) resting GABA+ levels correlated with a measure of visual sensory ability derived from a set of discrimination tasks that incorporated a variety of stimulus categories.

      Strengths of the study design include the pharmacological manipulation for gauging a possible causal relationship between GABA activity and task-related adjustments in BOLD variability. The consideration of baseline GABA+ levels for interpreting this relationship is particularly valuable. The assessment of feature-richness across multiple visual stimulus categories provided support for the use of a single visual sensory factor score to examine individual differences in behavioral performance relative to age, GABA, and BOLD measurements.

      Weaknesses of the study include the absence of an interpretation of the physiological mechanisms that contribute to variability in BOLD signal, particularly for the chosen contrast that compared viewing houses with viewing faces.

      Whether any of the observed effects can be explained by patterns in mean BOLD signal, independent of variability would be useful to know.

      One of the first pre-processing steps of computing SDBOLD involves subtracting the block-mean from the fMRI signal for each task-condition. Therefore, patterns observed in BOLD signal variability are not driven by mean-BOLD differences. Moreover, as noted above, to further confirm this, we performed an additional mean-BOLD based analysis (see Supplementary Materials, p. 3). Results suggest that ∆⃗ MEANBOLD is actually larger in older adults vs. younger adults (∆⃗ SDBOLD exhibited the opposite pattern), but more importantly ∆⃗ MEANBOLD is not correlated with GABA or with visual performance. This is also consistent with prior research (Garrett et al. 2011, 2013, 2015, 2020) that found MEANBOLD to be relatively insensitive to behavioral performance.

      The positive correlation between resting GABA+ levels and the task-condition effect on BOLD variability reaches significance at the total group level, when the young and old groups are combined, but not separately within each group. This correlation may be explained by age-related differences since younger adults had higher values than older adults for both types of measurements. This is not to suggest that the relationship is not meaningful or interesting, but that it may be conceptualized differently than presented.

      Thank you for this important point. The relationship between GABA and ∆⃗ SDBOLD shown in Figure 3 is also significant within each age-group separately (Line 386-388). The model used both age-group and GABA as predictors of ∆⃗ SDBOLD and found that both had a significant effect, while the Age-group x GABA interaction was not significant. The effect of age on ∆⃗ SDBOLD therefore does not completely explain the observed relationship between GABA and ∆⃗ SDBOLD because this latter effect is significant in both age-groups individually and in the whole sample even when variance explained by age is accounted for. The revision clarifies this important point (Ln 488-492). Thanks for raising it.

      Two separate dosages of lorazepam were used across individuals, but the details of why and how this was done are not provided, and the possible effects of the dose are not considered.

      Good point. We utilized two dosages to maximize our chances of finding a dosage that had a robust effect. The specific dosage was randomly assigned across participants and the dosage did not differ across age-groups or baseline GABA levels. We also controlled for the drug-dosage when examining the role of drug-related shift in ∆⃗ SDBOLD. We have clarified these points in the revision and highlighted the analysis that found no effect of dosage on drug-related shift in ∆⃗ SDBOLD (Line 407-418).

      The observation of greater BOLD variability during the viewing of houses than faces may be specific to these two behavioral conditions, and lingering questions about whether these effects generalize to other types of visual stimuli, or other non-visual behaviors, in old and young adults, limit the generalizability of the immediate findings.

      We agree that examining the factors that influence BOLD variability is an important topic for future research. In particular, although it is increasingly well known that variability modulation itself can occur in a host of different tasks and research contexts across the lifespan (see Garrett et al., 2013; Waschke et al., 2021), to address the question of whether variability modulation occurs directly in response to stimulus complexity in general, it will be important for future work to examine a range of stimulus categories beyond faces and houses. Doing so is indeed an active area of research in Dr. Garrett’s group, where visual stimuli from many different categories are examined (e.g., for a recent approach, see Waschke et al., 2023, bioRxiv). Regardless, only face and house stimuli were available in the current dataset. We therefore exploited the finding that BOLD variability tends to be larger for house stimuli than for face stimuli (in line with the HMAX model output) to demonstrate that the degree to which a given individual modulates BOLD variability in response to stimulus category is related to their age, to GABA levels, and to behavioral performance.

      The observed age-related differences in patterns of BOLD activity and ventrovisual cortex GABA+ levels along with the investigation of GABA-agonist effects in the context of baseline GABA+ levels are particularly valuable to the field, and merit follow-up. Assessing background neurochemical levels is generally important for understanding individualized drug effects. Therefore, the data are particularly useful in the fields of aging, neuroimaging, and vision research.

      Thank you, we agree!

      Reviewer #3 (Public Review):

      The role of neural variability in various cognitive functions is one of the focal contentions in systems and computational neuroscience. In this study, the authors used a large-scale cohort dataset to investigate the relationship between neural variability measured by fMRI and several factors, including stimulus complexity, GABA levels, aging, and visual performance. Such investigations are valuable because neural variability, as an important topic, has so far mostly been studied in animal neurophysiology. There is little evidence in humans. Also, the conclusions are built on a large-scale cohort dataset that includes multimodal data. Such a dataset per se is a big advantage. Pharmacological manipulations and MRS acquisitions are rare in this line of research. Overall, I think this study is well-designed, and the manuscript reads well. I listed my comments below and hope my suggestions can further improve the paper.

      Strength:

      1) The study design is astonishingly rich. The authors used task-based fMRI, MRS, a population contrast (aging vs. control), and psychophysical testing. I appreciate the motivation and efforts for collecting such a rich dataset.

      2) The MRS part is good. I am not an expert in MRS so cannot comment on MRS data acquisition and analyses. But I think linking neural variability to GABA in humans is in general a good idea. There has been long-standing interest in the cause of neural variability, and inhibition of local neural circuits has been hypothesized as one of the key factors.

      3) The pharmacological manipulation is particularly interesting as it provides at least some evidence for a causal effect of GABA on deltaSDBOLD. I think this is quite novel.

      Weakness:

      1) I am concerned about the definition of neural variability. In electrophysiological studies, neural variability can be defined as Poisson-like spike count variability. In the fMRI world, however, there is no consensus on what neural variability is. There are at least three definitions. The first is the variability (e.g., std) of the voxel response time series, as used here and in the resting-state fMRI literature. The second is to regress out the stimulus-evoked activation and calculate the std of the residuals only (i.e., background variability). The third is to calculate the trial-by-trial variability of beta estimates from general linear modeling. The relations between these three types of variability and other factors currently remain unclear. It also remains unclear the links between neuronal variability and voxel variability. I don't think the computational principles discovered in neuronal variability also apply to voxel responses. I hope the authors can acknowledge their differences and discuss their differences.

      These are very important points, thank you for raising them. Although we agree that the majority of the single cell electrophysiology world indeed seems to prefer Poisson-like spiking variability as an easy and tractable estimate, it is certainly not the only variability approach in that field (e.g., entropy; see our most recent work in humans where spiking entropy outperforms simple spike counts to predict memory performance; Waschke et al., 2023, bioRxiv). In LFP, EEG/MEG and fMRI, there is indeed no singular consensus on what variability “is”, and in our opinion, that is a good thing. We have reported at length in past work about entire families of measures of signal variability, from simple variance, to power, to entropy, and beyond (see Table 1 in Waschke et al, 2021, Neuron). In principle, these measures are quite complementary, obviating the need to establish any single-measure consensus per se. Rather than viewing the three measures of neural variability that the reviewer mentioned as competing definitions, we prefer to view them as different sources of variance. For example, from each of the three sources of variance the reviewer suggests, any number of variability measures could be computed.

      The current study focuses on using the standard deviation of concatenated blocked time series separately for face and house viewing conditions (this is the same estimation approach used in our very earliest studies on signal variability; Garrett et al., 2010, JNeurosci). In those early studies, and nearly every one thereafter (see Waschke et al., 2021, Neuron), there is no ostensible link between SDBOLD (as we normally compute it) and average BOLD from either multivariate or GLM models; as such, in past work we have not found any clear difference in SDBOLD results whether or not average “evoked” responses are removed. This is perhaps also why removing ERPs from EEG time series rarely influences estimates of variability in our work (e.g., Kloosterman et al., 2020, eLife).
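      The blocked SD estimate described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name condition_sd, the assumption of equal-length blocks, the per-block mean removal, and the optional step that subtracts the average block time course (a simple stand-in for removing the evoked response) are all assumptions made for the example.

```python
import numpy as np


def condition_sd(ts, onsets, block_len, remove_evoked=False):
    """SD of one condition's blocks after concatenation.

    ts: 1-D voxel time series (one sample per TR).
    onsets: first TR of each block belonging to this condition.
    block_len: number of TRs per block (assumed equal for all blocks).
    remove_evoked: if True, also subtract the average block time course,
        a simple stand-in for regressing out the evoked response.
    """
    ts = np.asarray(ts, dtype=float)
    blocks = np.stack([ts[o:o + block_len] for o in onsets])  # (n_blocks, block_len)
    # Remove each block's own mean so offsets between blocks are not counted as variability.
    blocks = blocks - blocks.mean(axis=1, keepdims=True)
    if remove_evoked:
        # Subtract the across-block mean time course (the "evoked" part).
        blocks = blocks - blocks.mean(axis=0, keepdims=True)
    return blocks.ravel().std(ddof=1)


rng = np.random.default_rng(0)
ts = rng.normal(size=160)            # toy voxel time series, 160 TRs
face_onsets = [0, 40, 80, 120]       # hypothetical block layout, 20 TRs per block
house_onsets = [20, 60, 100, 140]

sd_face = condition_sd(ts, face_onsets, 20)
sd_house = condition_sd(ts, house_onsets, 20)
sd_face_resid = condition_sd(ts, face_onsets, 20, remove_evoked=True)
shift = sd_face - sd_house           # sign convention chosen only for illustration
```

      A between-condition shift is then simply the difference of two such per-condition values; comparing sd_face with sd_face_resid is one way to check, on real data, how much removing the average evoked time course changes the estimate.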

      The third definition the reviewer notes refers to variability of beta estimates over trials. Our most recent work has done exactly this (e.g., Skowron et al., 2023, bioRxiv), calculating the SD even over single time point-wise beta estimates so that we may better control the extraction of time points prior to variability estimation. Although direct comparisons have not yet been published by us, variability over single TR beta estimates and variability over the time series without beta estimation are very highly correlated in our work (in the .80 range; e.g., Kloosterman et al., in prep).

      Re: the reviewer’s point that “It also remains unclear the links between neuronal variability and voxel variability. I don’t think the computational principles discovered in neuronal variability also apply to voxel responses. I hope the authors can acknowledge their differences and discuss their differences.” If we understand correctly, the reviewer may be asking about within-person links between single-cell neuronal variability (to allow Poisson-like spiking variability) and voxel variability in fMRI? To our knowledge, no such study has been conducted to date (such data barely exist). Or perhaps the reviewer is making a more general point regarding the “computational principles” of variability in these different domains? If that is true, then a few points are worth noting. First, there is absolutely no expectation of Poisson distributions in continuous brain imaging-based time series (LFP, E/MEG, fMRI). To our knowledge, such distributions (which have equal means and variances, allowing, e.g., Fano factors to be estimated) are mathematically possible in spiking because of the binary nature of spikes; when mean rates rise, so too do variances, given that activity pushes away from the floor (of no activity). In continuous-time signals, there is no effective “zero”, so a mathematical floor does not exist outright. This is likely why means and variances are not well coupled in continuous-time signals (see Garrett et al., 2013, NBR; Waschke et al., 2021, Neuron); anything can happen. Regardless, convergence is beginning to be revealed between the effects noted from spiking and continuous-time estimates of variability. For example, we show that spiking variability can show a similar, behaviourally relevant coupling to the complexity of visual input (Waschke et al., 2023, bioRxiv) as seen in the current study and in past work (e.g., Garrett et al., 2020, NeuroImage). Whether such convergence reflects common computational principles of variability remains to be seen in future work, despite known associations between single cell recordings and BOLD overall (e.g., Logothetis and colleagues, 2001, 2002, 2004, 2008).
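      The mean-variance contrast drawn above can be illustrated with simulated data only (no recordings are involved). For Poisson spike counts the variance tracks the mean, so the Fano factor stays near 1 at any rate; for a continuous signal, shifting the baseline leaves the variance untouched, so a variance-to-mean ratio would depend on an arbitrary offset. The rates, offsets, and sample sizes below are arbitrary choices for the sketch.

```python
import numpy as np


def fano_factor(counts):
    """Variance-to-mean ratio of spike counts (about 1 for a Poisson process)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def poisson_fano(rate, n_trials=10_000, seed=0):
    """Fano factor of simulated Poisson spike counts with the given mean rate."""
    rng = np.random.default_rng(seed)
    return fano_factor(rng.poisson(rate, size=n_trials))


def gaussian_variance(offset, sd=1.0, n_samples=10_000, seed=0):
    """Variance of a continuous signal whose baseline is shifted by `offset`."""
    rng = np.random.default_rng(seed)
    signal = offset + rng.normal(0.0, sd, size=n_samples)
    return signal.var(ddof=1)


for rate in (2.0, 20.0):
    print(f"Poisson rate {rate:>4}: Fano factor = {poisson_fano(rate):.3f}")
for offset in (0.0, 100.0):
    print(f"Continuous offset {offset:>5}: variance = {gaussian_variance(offset):.3f}")
```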

      Given the intricacies of these arguments, we don’t currently include this discussion in the revised text. However, we would be happy to include aspects of this content in the main paper if the reviewer sees fit.

      2) If I understand it correctly, the positive relationship between stimulus complexity and voxel variability has been found in the authors' previous work. Thus, the claims in the abstract (lines 14-15) and in section 1 of the results are exaggerated. The results simply replicate the findings in the previous work. This should be clearly stated.

      Good point. Since this finding was a replication and an extension, we reported these results mostly in the supplementary materials. The stimulus set used for the current study differs from that of Garrett et al. (2020), and therefore a replication is important. Moreover, we have extended these findings across young and older adults (previous work was based on older adults alone). We have modified the text to clarify which parts of the current study are replications and which are extensions or novel (Lines 14, 345 and 467). Thanks for the suggestion.

      3) It is difficult for me to comprehend the U-shaped account of baseline GABA and shift in deltaSDBOLD. If deltaSDBOLD per se is good, as evidenced by the positive relationship between brainscore and visual sensitivity as shown in Fig. 5b and the discussion in lines 432-440, why should the brain decrease deltaSDBOLD? Or did I miss something? I understand that "average is good, outliers are bad". But a more detailed theory is needed to account for such effects.

      When GABA levels are increased beyond optimal levels, neuronal firing rates are reduced, effectively dampening neural activity and limiting dynamic range; in the present study, this resulted in reduced ∆⃗ SDBOLD. Thus, the observed drug-related decrease in ∆⃗ SDBOLD was most pronounced in participants with already high levels of GABA. We have now added an explanation for the expected inverted-U (Line 523-546). The following figure illustrates this with a hypothetical curve and shows how different parts of Fig 4 might map onto different points on such a curve.

      Author response image 1.

      Line 523-546 – “We found in humans that the drug-related shift in ∆⃗ SDBOLD could be either positive or negative, while being negatively related to baseline GABA. Thus, boosting GABA activity with drug during visual processing in participants with lower baseline GABA levels and low levels of ∆⃗ SDBOLD resulted in an increase in ∆⃗ SDBOLD (i.e., a positive change in ∆⃗ SDBOLD on drug compared to off drug). However, in participants with higher baseline GABA levels and higher ∆⃗ SDBOLD, when GABA was increased presumably beyond optimal levels, participants experienced no change or even a decrease in ∆⃗ SDBOLD on drug. These findings thus provide the first evidence in humans for an inverted-U account of how GABA may link to variability modulation.

      Boosting low GABA levels in older adults helps increase ∆⃗ SDBOLD, but why does increasing GABA levels lead to reduced ∆⃗ SDBOLD in others? One explanation is that higher than optimal levels of inhibition in a neuronal system can lead to dampening of the entire network. The reduced neuronal firing decreases the number of states the network can visit and decreases the dynamic range of the network. Indeed, some anesthetics work by increasing GABA activity (for example, propofol, a general anesthetic, modulates activity at GABAA receptors), and GABA is known for its sedative properties. Previous research showed that propofol leads to a steeper power spectral slope (a measure of the “construction” of signal variance) in monkey ECoG recordings (Gao et al., 2017). Networks function optimally only when dynamics are stabilized by sufficient inhibition. Thus, there is an inverted-U relationship between ∆⃗ SDBOLD and GABA that is similar to that observed with other neurotransmitters.”
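      The inverted-U argument can be made concrete with a toy model. Assume, purely for illustration, that variability modulation is a concave function of GABA with a single interior peak (a quadratic here, with hypothetical names delta_sd and drug_shift and arbitrary units), and that the drug adds the same GABA increment to everyone. Participants below the peak then gain modulation on drug, participants near or above it lose modulation, and the drug-related shift is negatively related to baseline GABA.

```python
import numpy as np


def delta_sd(gaba, g_opt=1.0, peak=1.0):
    """Hypothetical inverted-U: variability modulation peaks at g_opt."""
    return peak - (gaba - g_opt) ** 2


def drug_shift(baseline, boost=0.3, g_opt=1.0):
    """Change in modulation when the drug adds a fixed GABA increment (an assumption)."""
    return delta_sd(baseline + boost, g_opt) - delta_sd(baseline, g_opt)


baseline = np.linspace(0.2, 1.6, 8)   # arbitrary units, low to high baseline GABA
shift = drug_shift(baseline)
r = np.corrcoef(baseline, shift)[0, 1]
print(np.round(shift, 2))
print(f"correlation(baseline, shift) = {r:.2f}")
```

      For a quadratic curve the shift is exactly linear in baseline, shift = -2·boost·(baseline - g_opt) - boost², so it changes sign at baseline = g_opt - boost/2 and the correlation is -1; real data would add noise and need not follow a quadratic.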

      4) Related to the 3rd question, can you show the relationship between the shift of deltaSDBOLD (i.e., the delta of deltaSDBOLD) and visual performance?

      We did not have data on visual performance from the same participants that completed the drug-based part of the study (Subset1 vs 3; see Figure 1); therefore, we unfortunately cannot directly investigate the relationship between the drug-related shift of ∆⃗ SDBOLD and visual performance. We have now highlighted this as a limitation of the current study (Line 589-592), where we state: “One limitation of the current study is that participants who received the drug-manipulation did not complete the visual discrimination task, thus we could not directly assess how the drug-related change in ∆⃗ SDBOLD impacted visual performance.”

      5) Is the dataset openly available? I didn't find the data availability statement.

      An Excel sheet with all the processed data needed to reproduce the figures and results has been included in the source data submitted with the manuscript, together with a data dictionary describing the columns. The raw MRI, MRS and fMRI data used in the current manuscript were collected as part of a larger (MIND) study and will eventually be made publicly available on completion of the study (around 2027). Until then, the raw data can be obtained for research purposes upon reasonable request. Processing code will be made available on GitHub.

    1. Author response:

      Reviewer #1 (Public Review):

      Reviewer #1, comment #1: The study is thorough and systematic, and in comparing three well-separated hypotheses about the mechanism leading from grid cells to hexasymmetry it takes a neutral stand above the fray which is to be particularly appreciated. Further, alternative models are considered for the most important additional factor, the type of trajectory taken by the agent whose neural activity is being recorded. Different sets of values, including both "ideal" and "realistic" ones, are considered for the parameters most relevant to each hypothesis. Each of the three hypotheses is found to be viable under some conditions, and less so in others. Having thus given a fair chance to each hypothesis, nevertheless, the study reaches the clear conclusion that the first one, based on conjunctive grid-by-head-direction cells, is much more plausible overall; the hypothesis based on firing rate adaptation has intermediate but rather weak plausibility; and the one based on clustering of cells with similar spatial phases in practice would not really work. I find this conclusion convincing, and the procedure to reach it, a fair comparison, to be the major strength of the study.

      Response: Thanks for your positive assessment of our manuscript.

      Reviewer #1, comment #2: What I find less convincing is the implicit a priori discarding of a fourth hypothesis, that is, that the hexasymmetry is unrelated to the presence of grid cells. Full disclosure: we have tried unsuccessfully to detect hexasymmetry in the EEG signal from vowel space and did not find any (Kaya, Soltanipour and Treves, 2020), so I may be ranting off my disappointment, here. I feel, however, that this fourth hypothesis should be at least aired, for a number of reasons. One is that a hexasymmetry signal has been reported also from several other cortical areas, beyond entorhinal cortex (Constantinescu et al, 2016); true, also grid cells in rodents have been reported in other cortical areas as well (Long and Zhang, 2021; Long et al, bioRxiv, 2021), but the exact phenomenology remains to be confirmed.

      Response: Thank you for the suggestion to add the hypothesis that the neural hexasymmetry observed in previous fMRI and intracranial EEG studies may be unrelated to grid cells. Following your suggestion, we have now mentioned at the end of the fourth paragraph of the Introduction that “the conjunctive grid by head-direction cell hypothesis does not necessarily depend on an alignment between the preferred head directions and the grid axes”. Furthermore, at the end of section “Potential mechanisms underlying hexadirectional population signals in the entorhinal cortex” (in the Discussion) we write: “However, none of the three hypotheses described here may be true and another mechanism may explain macroscopic grid-like representations. This includes the possibility that neural hexasymmetry is completely unrelated to grid-cell activity, previously summarized as the ‘independence hypothesis’ (Kunz et al., 2019). For example, a population of head-direction cells whose preferred head directions occur at offsets of 60 degrees from each other could result in neural hexasymmetry in the absence of grid cells. The conjunctive grid by head-direction cell hypothesis thus also works without grid cells, which may explain why grid-like representations have been observed (using fMRI) in regions outside the entorhinal cortex, where rodent studies have not yet identified grid cells (Doeller et al., 2010; Constantinescu et al., 2016). In that case, however, another mechanism would be needed that could explain why the preferred head directions of different head-direction cells occur at multiples of 60 degrees. Attractor-network structures may be involved in such a mechanism, but this remains speculative at the current stage.” We now also mention the results from Long and Zhang (second paragraph of the Introduction): “Surprisingly, grid cells have also been observed in the primary somatosensory cortex in foraging rats (Long and Zhang, 2021).”
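      The head-direction-only scenario can be checked with a short sketch. It assumes von Mises tuning curves, uniformly sampled movement directions (so the path adds no six-fold structure), and a plain six-fold Fourier amplitude normalized by the mean rate as a stand-in for hexasymmetry; this is not the H metric of the manuscript, and all names and values are illustrative. No grid cells are simulated: preferred directions placed at multiples of 60 degrees are compared with randomly scattered ones.

```python
import numpy as np


def von_mises(theta, mu, kappa):
    """Von Mises tuning curve scaled to a peak rate of 1 at mu."""
    return np.exp(kappa * (np.cos(theta - mu) - 1.0))


def sixfold_amplitude(preferred, kappa=4.0, n_dirs=360):
    """Six-fold Fourier amplitude of the summed rate, relative to the mean rate.

    Movement directions are sampled uniformly, so the trajectory itself
    contributes no six-fold structure.
    """
    theta = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    rate = sum(von_mises(theta, mu, kappa) for mu in preferred)
    return np.abs(np.mean(rate * np.exp(-6j * theta))) / np.mean(rate)


# Head-direction cells only, no grid cells.
aligned = np.deg2rad(np.repeat(np.arange(0, 360, 60), 10))   # 60 cells on six axes
rng = np.random.default_rng(1)
scattered = rng.uniform(0.0, 2 * np.pi, size=aligned.size)   # 60 cells, random

print(f"aligned:   {sixfold_amplitude(aligned):.4f}")
print(f"scattered: {sixfold_amplitude(scattered):.4f}")
```

      Because the mean rate is the same for any set of preferred directions, the aligned arrangement gives the largest possible six-fold amplitude for a given κ, while random preferred directions largely cancel. The same comparison speaks to per-field head-direction preferences: offsets of 60 degrees add up, random offsets do not.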

      Regarding your EEG study, we have added a reference to it in the manuscript and state that it is an example for a study that did not find evidence for neural hexasymmetry (end of first paragraph of the Discussion): “We note though that some studies did not find evidence for neural hexasymmetry. For example, a surface EEG study with participants “navigating” through an abstract vowel space did not observe hexasymmetry in the EEG signal as a function of the participants’ movement direction through vowel space (Kaya et al., 2020). Another fMRI study did not find evidence for grid-like representations in the ventromedial prefrontal cortex while participants performed value-based decision making (Lee et al., 2021). This raises the question whether the detection of macroscopic grid-like representations is limited to some recording techniques (e.g., fMRI and iEEG but not surface EEG) and to what extent they are present in different tasks.”

      Reviewer #1, comment #3: Second, as the authors note, the conjunctive mechanism is based on the tight coupling of a narrow head direction selectivity to one of the grid axes. They compare "ideal" with "Doeller" parameters, but to me the "Doeller" ones appear rather narrower than commonly observed and, crucially, they are applied to all cells in the simulations, whereas in reality only a proportion of cells in mEC are reported to be grid cells, only a proportion of them to be conjunctive, and only some of these to be narrowly conjunctive. Further, Gerlei et al (2020) find that conjunctive grid cells may have each of their fields modulated by different head directions, a truly surprising phenomenon that, if extensive, seems to me to cast doubts on the relation between mass activity hexasymmetry and single grid cells.

      Response: We have revised the manuscript in several ways to address the different aspects of this comment.

      Firstly, we agree with the reviewer that our “Doeller” parameter for the tuning width is narrower than commonly observed. We have therefore reevaluated the concentration parameter κ_c in the ‘realistic’ case, changing it from 10 rad⁻² (corresponding to a tuning width of 18°) to 4 rad⁻² (corresponding to a tuning width of 29°). We chose this value by referring to Supplementary Figure 3 of Doeller et al. (2010). In their figure, the tuning curves usually cover between one sixth and one third of a circle. Since stronger head-direction tuning contributes the most to the resulting hexasymmetry, we chose a value of κ_c = 4 rad⁻² for the tuning parameter, which corresponds to a tuning width (= half width) of 29° (full width of roughly one sixth of a circle). Regarding the coupling of the preferred head directions to the grid axes, the specific value of the jitter σc = 3 degrees, which quantifies the coupling of the head-direction preference to the grid axes, was extracted from the 95% confidence interval given in the third row of the Table in Supplementary Figure 5b of Doeller et al. (2010). We now better explain the origin of these values in our new Methods section “Parameter estimation” and provide an overview of all parameter values in Table 1.
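      The quoted widths are consistent with reading the tuning width as 1/sqrt(κ_c) radians, i.e. treating κ_c as an inverse variance (which would explain the rad⁻² unit). This conversion is our reading of the numbers rather than a definition stated in the text, and tuning_width_deg is a hypothetical helper.

```python
import math


def tuning_width_deg(kappa):
    """Half width in degrees, assuming width = 1 / sqrt(kappa) radians."""
    return math.degrees(1.0 / math.sqrt(kappa))


for kappa in (10.0, 4.0):
    half = tuning_width_deg(kappa)
    print(f"kappa_c = {kappa:g} rad^-2: half width {half:.1f} deg, full width {2 * half:.0f} deg")
```

      This prints a half width of 18.1 deg (full width 36 deg) for κ_c = 10 and 28.6 deg (full width 57 deg) for κ_c = 4, the latter being roughly one sixth of a circle.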

      Furthermore, in response to your comment, we have revised Figure 2E to show neural hexasymmetries for a larger range of values of the jitter (σc from 0 to 30 degrees), going well beyond the values that Doeller et al. suggested. We have also added a new supplementary figure (Figure 2 – figure supplement 1) where we further extend the range of tuning widths (parameter κ_c) to 60 degrees. This provides the reader with a comprehensive understanding of what parameter values are needed to reach a particular hexasymmetry.

      Regarding your comments on the prevalence of conjunctive grid by head-direction cells, we have revised the manuscript to make it explicit that the actual percentage of conjunctive cells with the necessary properties may be low in the entorhinal cortex (first paragraph of section “A note on our choice of the values of model parameters” of the Discussion): “Empirical studies in rodents found a wide range of tuning widths among grid cells ranging from broad to narrow (Doeller et al., 2010; Sargolini et al., 2006). The percentage of conjunctive cells in the entorhinal cortex with a sufficiently narrow tuning may thus be low. Such distributions (with a proportionally small number of narrowly tuned conjunctive cells) lead to low values in the absolute hexasymmetry. The neural hexasymmetry in this case would be driven by the subset of cells with sufficiently narrow tuning widths. If this causes the neural hexasymmetry to drop below noise levels, the statistical evaluation of this hypothesis would change.” In addition, in Figure 5, we have applied the coupling between preferred head directions and grid axes to only one third of all grid cells (parameter p_c = ⅓ in Table 1), following the values reported by Boccara et al. 2010 and Sargolini et al. 2006. To strengthen the link between Figure 5 and Figure 2, we now state the hexasymmetry when using p_c = ⅓ along with a ‘realistic’ tuning width and jitter for head-direction modulated grid cells in Figure 2H. Additionally, we performed new simulations in which we observed a linear relationship (above the noise floor) between the proportion of conjunctive cells and the hexasymmetry. This should help the reader understand the effect of a reduced percentage of conjunctive cells on the absolute hexasymmetry values. We have added these results as a new supplementary figure (Figure 2 – figure supplement 2).
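      Why the six-fold signal grows linearly with the proportion of conjunctive cells can be seen in a noise-free toy population. The sketch assumes that a fraction p_c of cells has von Mises head-direction tuning centred on the grid axes (multiples of 60 degrees, optional Gaussian jitter) and that all other cells are untuned with a flat rate. It reports the six-fold Fourier amplitude divided by the number of cells; it is not the manuscript's simulation or its H metric, and sixfold_per_cell is a hypothetical name.

```python
import numpy as np


def sixfold_per_cell(p_c, n_cells=600, kappa=4.0, jitter_deg=0.0, n_dirs=360, seed=0):
    """Six-fold amplitude of the population rate, divided by the number of cells.

    A fraction p_c of cells is head-direction tuned (von Mises, concentration
    kappa) with preferred directions on the grid axes (multiples of 60 degrees,
    plus optional Gaussian jitter). The remaining cells are assumed untuned.
    """
    rng = np.random.default_rng(seed)
    theta = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    n_tuned = int(round(p_c * n_cells))
    axes = np.deg2rad(60.0 * (np.arange(n_tuned) % 6))
    preferred = axes + np.deg2rad(jitter_deg) * rng.standard_normal(n_tuned)
    tuned = np.exp(kappa * (np.cos(theta[:, None] - preferred[None, :]) - 1.0)).sum(axis=1)
    rate = tuned + (n_cells - n_tuned)  # untuned cells: flat rate of 1, no six-fold term
    return np.abs(np.mean(rate * np.exp(-6j * theta))) / n_cells


for p_c in (0.0, 1 / 3, 2 / 3, 1.0):
    print(f"p_c = {p_c:.2f}: six-fold amplitude per cell = {sixfold_per_cell(p_c):.5f}")
```

      Every aligned cell contributes the same six-fold phase, so the amplitude scales with the number of tuned cells. Dividing by the number of cells rather than by the mean rate keeps this scaling linear; normalizing by the mean rate would bend it whenever tuned and untuned cells differ in mean rate.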

      Finally, regarding your comment on the findings by Gerlei et al. 2020, we now reference this study in our manuscript and discuss the possible implications (second paragraph of section “A note on our choice of the values of model parameters” of the Discussion): “Additionally, while we assumed that all conjunctive grid cells maintain the same preferred head direction between different firing fields, conjunctive grid cells have also been shown to exhibit different preferred head directions in different firing fields (Gerlei et al., 2020). This could lead to hexadirectional modulation if the different preferred head directions are offset by 60° from each other, but will not give rise to hexadirectional modulation if the preferred head directions are randomly distributed. To the best of our knowledge, the distribution of preferred head directions was not quantified by Gerlei et al. (2020), so this remains an open question.”

      Reviewer #1, comment #4: Finally, a variant of the fourth hypothesis is that the hexasymmetry might be produced by a clustering of head direction preferences across head direction cells similar to that hypothesized in the first hypothesis, but without such cells having to fire in grid patterns. If head direction selectivity is so clustered, who needs the grids? This would explain why hexasymmetry is ubiquitous, and could easily be explored computationally by, in fact, a simplification of the models considered in this study.

      Response: We fully agree with you. We now explain this possibility in the Introduction where we introduce the conjunctive grid by head-direction cell hypothesis (fourth paragraph of the Introduction) and return to it in the Discussion (section “Potential mechanisms underlying hexadirectional population signals in the entorhinal cortex”). There, we now also explain that in such a case another mechanism would be needed to ensure that the preferred head directions of head-direction cells exhibit six-fold rotational symmetry.

      Reviewer #2 (Public Review):

      Reviewer #2, comment #1: Grid cells - originally discovered in single-cell recordings from the rodent entorhinal cortex, and subsequently identified in single-cell recordings from the human brain - are believed to contribute to a range of cognitive functions including spatial navigation, long-term memory function, and inferential reasoning. Following a landmark study by Doeller et al. (Nature, 2010), a plethora of human neuroimaging studies have hypothesised that grid cell population activity might also be reflected in the six-fold (or 'hexadirectional') modulation of the BOLD signal (following the six-fold rotational symmetry exhibited by individual grid cell firing patterns), or in the amplitude of oscillatory activity recorded using MEG or intracranial EEG. The mechanism by which these network-level dynamics might arise from the firing patterns of individual grid cells remains unclear, however.

      In this study, Khalid and colleagues use a combination of computational modelling and mathematical analysis to evaluate three competing hypotheses that describe how the hexadirectional modulation of population firing rates (taken as a simple proxy for the BOLD, MEG, or iEEG signal) might arise from the firing patterns of individual grid cells. They demonstrate that all three mechanisms could account for these network-level dynamics if a specific set of conditions relating to the agent's movement trajectory and the underlying properties of grid cell firing patterns are satisfied.

      The computational modelling and mathematical analyses presented here are rigorous, clearly motivated, and intuitively described. In addition, these results are important both for the interpretation of hexadirectional modulation in existing data sets and for the design of future experiments and analyses that aim to probe grid cell population activity. As such, this study is likely to have a significant impact on the field by providing a firmer theoretical basis for the interpretation of neuroimaging data. To my mind, the only weakness is the relatively limited focus on the known properties of grid cells in rodent entorhinal cortex, and the network level activity that these firing patterns might be expected to produce under each hypothesis. Strengthening the link with existing neurobiology would further enhance the importance of these results for those hoping to assay grid cell firing patterns in recordings of ensemble-level neural activity.

      Response: Thank you very much for reviewing our manuscript and your positive assessment. Following your comments, we have revised the manuscript to more closely link our simulations to known properties of grid cells in the rodent entorhinal cortex.

      Reviewer #3 (Public Review):

      Reviewer #3, comment #1: This is an interesting and carefully carried out theoretical analysis of potential explanations for hexadirectional modulation of neural population activity that has been reported in the human entorhinal cortex and some other cortical regions. The previously reported hexadirectional modulation is of considerable interest as it has been proposed to be a proxy for the activation of grid cell networks. However, the extent to which this proposal is consistent with the known firing properties of grids hasn't received the attention it perhaps deserves. By comparing the predictions of three different models this study imposes constraints on possible mechanisms and generates predictions that can be tested through future experimentation.

      Overall, while the conclusions of the study are convincing, I think the usefulness to the field would be increased if null hypotheses were more carefully considered and if the authors' new metric for hexadirectional modulation (H) could be directly contrasted with previously used metrics. For example, if the effect sizes for hexadirectional modulation in the previous fMRI and EEG data could be more directly compared with those of the models here, then this could help in establishing the extent to which the experimental hexadirectional modulation stands out from path hexasymmetry and how close it comes to the striking modulation observed with the conjunctive models. It could also be helpful to consider scenarios in which hexadirectional modulation is independent of grid firing, for example perhaps with appropriate coordination of head direction cell firing.

      Response: Thanks for reviewing our manuscript and for the overall positive assessment. The new Methods section “Implementation of previously used metrics” starts with the following sentences: “We applied three previously used metrics to our framework: the Generalized Linear Model (GLM) method by Doeller et al. 2010; the GLM method with binning by Kunz et al. 2015; and the circular-linear correlation method by Maidenbaum et al. 2018.” We have created a new supplementary figure (Figure 5 – figure supplement 4) in which we compare the results from these other methods to the results of our new method. Overall, the results are highly similar, indicating that all these methods are equally suited to test for a hexadirectional modulation of neural activity.

      In section “Implementation of previously used metrics” we then explain: “In brief, in the GLM method (e.g. used in Doeller et al., 2010), the hexasymmetry is found in two steps: the orientation of the hexadirectional modulation is first estimated on the first half of the data by using the regressors β1 cos(6θ_t) and β2 sin(6θ_t) on the time-discrete fMRI activity (Equation 9), with θt being the movement direction of the subject in time step t. The amplitude of the signal is then estimated on the second half of the data using the single regressor cos(6(θ_t − φ)), where φ = atan2(β2, β1)/6 is the estimated orientation. The hexasymmetry is then evaluated from this amplitude estimate.

      The GLM method with binning (e.g. used in Kunz et al., 2015) uses the same procedure as the GLM method for estimating the grid orientation in the first half of the data, but the amplitude is estimated differently on the second half by a regressor that has a value of 1 if θt is aligned with a peak of the hexadirectional modulation (alignment is determined from the difference between θt and the estimated orientation, modulo 60°) and a value of -1 if θt is misaligned. The hexasymmetry is then calculated from the amplitude in the same way as in the GLM method.

      The circular-linear correlation method (e.g. used in Maidenbaum et al., 2018) is similar to the GLM method in that it uses the regressors β1 cos(6θ_t) and β2 sin(6θ_t) on the time-discrete mean activity, but instead of using β1 and β2 to estimate the orientation of the hexadirectional modulation, the beta values are directly used to estimate the hexasymmetry from the amplitude of the fitted six-fold sinusoid, sqrt(β1² + β2²).”
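      The three regression-based estimators described in this passage can be sketched on synthetic data. The sketch assumes ordinary least squares with an intercept, cos(6θ)/sin(6θ) regressors, orientation φ = atan2(β2, β1)/6 from the first half, and, for the binned variant, "aligned" meaning within ±15° of an estimated peak (30° bins); the binning rule, the function names, and the simulated signal are assumptions. The functions return raw amplitude estimates and do not reproduce the manuscript's final hexasymmetry normalization.

```python
import numpy as np


def fit_sixfold(theta, y):
    """Least-squares fit y ~ b0 + b1*cos(6*theta) + b2*sin(6*theta); returns (b1, b2)."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]


def glm_split_half(theta, y):
    """Orientation from the first half, amplitude along that orientation from the second."""
    half = len(theta) // 2
    b1, b2 = fit_sixfold(theta[:half], y[:half])
    phi = np.arctan2(b2, b1) / 6.0                  # estimated grid orientation
    X = np.column_stack([np.ones(len(theta) - half), np.cos(6 * (theta[half:] - phi))])
    beta, *_ = np.linalg.lstsq(X, y[half:], rcond=None)
    return beta[1]


def glm_binned(theta, y):
    """As glm_split_half, but with a +1 (aligned) / -1 (misaligned) regressor."""
    half = len(theta) // 2
    b1, b2 = fit_sixfold(theta[:half], y[:half])
    phi = np.arctan2(b2, b1) / 6.0
    # Assumption: aligned = within +/-15 degrees of a peak at phi + k*60 degrees.
    offset = np.mod(theta[half:] - phi + np.pi / 12, np.pi / 3)
    reg = np.where(offset < np.pi / 6, 1.0, -1.0)
    X = np.column_stack([np.ones(len(reg)), reg])
    beta, *_ = np.linalg.lstsq(X, y[half:], rcond=None)
    return beta[1]


def circular_linear_amplitude(theta, y):
    """Amplitude of the six-fold modulation taken directly from the two betas."""
    b1, b2 = fit_sixfold(theta, y)
    return np.hypot(b1, b2)


# Synthetic data: six-fold modulation of amplitude 0.5 at orientation 0.2 rad, plus noise.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, 2000)
y = 2.0 + 0.5 * np.cos(6 * (theta - 0.2)) + 0.1 * rng.normal(size=theta.size)

print(f"GLM split-half:      {glm_split_half(theta, y):.3f}")
print(f"GLM binned:          {glm_binned(theta, y):.3f}")
print(f"circular-linear amp: {circular_linear_amplitude(theta, y):.3f}")
```

      The binned estimate is smaller than the other two by construction (about 2/π of the sinusoid amplitude for ±15° bins), so absolute values from different methods are not directly interchangeable even when they agree qualitatively.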

      For each of the three previously used metrics and our new method, we estimated the resulting hexasymmetry (new Figure 5 – figure supplement 4 in the manuscript). In the Methods section “Implementation of previously used metrics” we then continue with our explanations: “Regarding the statistical evaluation, each method evaluates the size of the neural hexasymmetry differently. Specifically, the new method developed in our manuscript compares the neural hexasymmetry to path hexasymmetry to test whether neural hexasymmetry is significantly above path hexasymmetry. For the two generalized linear model (GLM) methods, we compare the hexasymmetry to zero (using the Mann-Whitney U test) to establish significance. Hexasymmetry values can be negative in these approaches, allowing the statistical comparison against 0. Negative values occur when the estimated grid orientation from the first data half does not match the grid orientation from the second data half. Regarding the statistical evaluation of the circular-linear correlation method, we calculated a z-score by comparing each empirical observation of the hexasymmetry to hexasymmetries from a set of surrogate distributions (as in Maidenbaum et al., 2018). We then calculate a p-value by comparing the distribution of z-scores versus zero using a Mann-Whitney U test. We use the z-scores instead of the hexasymmetry for the circular-linear correlation method to match the procedure used in Maidenbaum et al. (2018). We obtained the surrogate distributions by circularly shifting the vector of movement directions relative to the time-dependent vector of firing rates. For random walks, the vector is shifted by a random integer drawn from a uniform distribution between zero and the number of time points in the vector of movement directions. For the star-like walks and piecewise linear walks, the shift is a random integer multiplied by the number of time points in a linear segment. Circularly shifting the vector of movement directions scrambles the correlations between movement direction and neural activity while preserving their temporal structure.”
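      The circular-shift surrogate procedure can be sketched as below, assuming the six-fold amplitude sqrt(β1² + β2²) from a least-squares fit as the test statistic, a z-score of the observed value against the surrogate mean and SD, and synthetic data; shift_surrogate_z, the number of surrogates, and the segment length are illustrative choices. The group-level comparison of z-scores against zero is not included.

```python
import numpy as np


def sixfold_amp(theta, y):
    """Amplitude sqrt(b1^2 + b2^2) of a least-squares six-fold fit."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.hypot(beta[1], beta[2])


def shift_surrogate_z(theta, y, n_surr=200, segment_len=None, seed=0):
    """z-score of the observed amplitude against circular-shift surrogates.

    segment_len=None: shift by any integer in [0, len(theta)) (random walks).
    segment_len=k: shift by random multiples of k, keeping linear segments
    intact (star-like or piecewise linear walks).
    """
    rng = np.random.default_rng(seed)
    n = len(theta)
    observed = sixfold_amp(theta, y)
    surr = np.empty(n_surr)
    for i in range(n_surr):
        if segment_len is None:
            shift = rng.integers(0, n)
        else:
            shift = segment_len * rng.integers(0, n // segment_len)
        # Shift directions relative to the activity; each series keeps its own temporal structure.
        surr[i] = sixfold_amp(np.roll(theta, shift), y)
    return (observed - surr.mean()) / surr.std(ddof=1)


rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2 * np.pi, 1000)                   # toy movement directions
y = 0.5 * np.cos(6 * (theta - 0.2)) + 0.3 * rng.normal(size=theta.size)

z_random = shift_surrogate_z(theta, y)
z_segments = shift_surrogate_z(theta, y, segment_len=50)   # 20 hypothetical segments
print(f"z (any shift): {z_random:.1f}, z (segment shifts): {z_segments:.1f}")
```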

      The results of these simulations, i.e. the comparison of our new method to previously used metrics, are summarized in Figure 5 – figure supplement 4 and show qualitatively identical findings when using the different methods. We have added this information also to the manuscript in the third paragraph of section “Quantification of hexasymmetry of neural activity and trajectories” of the Methods: “Empirical (fMRI/iEEG) studies (e.g. Doeller et al., 2010; Kunz et al., 2015; Maidenbaum et al., 2018) addressed this problem of trajectories spuriously contributing to hexasymmetry by fitting a Generalized Linear Model (GLM) to the time discrete fMRI/iEEG activity. In contrast, our new approach to hexasymmetry in Equation (12) quantifies the contribution of the path to the neural hexasymmetry explicitly, and has the advantage that it allows an analytical treatment (see next section). Comparing our new method with previous methods for evaluating hexasymmetry led to qualitatively identical statistical effects (Figure 5 – figure supplement 4).” We have also added a pointer to this new supplementary figure in the caption of Figure 5 in the manuscript: “For a comparison between our method and previously used methods for evaluating hexasymmetry, see Figure 5 – figure supplement 4.”

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer 1

      The paper is overall convincing. However, a little more attention to data presentation and possibly the addition of at least another technique (see below) would greatly strengthen the findings.

      As we hope to demonstrate below, we have taken steps to improve our manuscript on both fronts (data presentation and experimental evidence).

      The absence of statistics catches the eye immediately. I am sure that the shown differences are statistically significant (thanks to the number of analyzed cells), but reporting the result of some statistical test would help the reader identify the relevant data in a plot. This is somewhat necessary considering that sometimes in the text something is deemed to be "significant" or "not significant", and I felt that I really needed that when looking at the plot in Fig. 3D.

      To facilitate the interpretation of figures that contain data from multiple strains (such as the one mentioned by the reviewer), we have carried out a single-step multiple comparison test that does not assume equal variances or equal sample sizes (Games-Howell) to identify mutants whose means differ significantly from each other. To avoid overcrowding the figures, we have graphically summarized the p-values of all pairwise comparisons in a small matrix within the corresponding panel, and provided 99% confidence intervals and p-values of all differences in the Supplement.
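      For readers who want to reproduce this kind of summary, the Games-Howell procedure can be sketched in a few lines. It uses Welch-type standard errors and degrees of freedom for each pair and refers the statistic to the studentized range distribution; the sketch assumes SciPy 1.7 or later for scipy.stats.studentized_range, and the strain names and N/C values are hypothetical, simulated for illustration.

```python
import itertools

import numpy as np
from scipy.stats import studentized_range


def games_howell(groups, alpha=0.01):
    """Pairwise Games-Howell comparisons.

    groups: dict name -> 1-D array of per-cell values (e.g. N/C ratios).
    Returns (a, b, mean_diff, ci_low, ci_high, p) tuples, with (1 - alpha)
    confidence intervals on mean(a) - mean(b).
    """
    k = len(groups)
    summary = {g: (np.mean(x), np.var(x, ddof=1), len(x)) for g, x in groups.items()}
    results = []
    for a, b in itertools.combinations(groups, 2):
        (ma, va, na), (mb, vb, nb) = summary[a], summary[b]
        se2 = va / na + vb / nb                          # Welch-type squared standard error
        df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
        diff = ma - mb
        q = abs(diff) / np.sqrt(se2 / 2.0)               # studentized-range statistic
        p = studentized_range.sf(q, k, df)
        half_width = studentized_range.ppf(1 - alpha, k, df) * np.sqrt(se2 / 2.0)
        results.append((a, b, diff, diff - half_width, diff + half_width, p))
    return results


rng = np.random.default_rng(0)
groups = {  # hypothetical strains and simulated N/C ratios
    "strain_A": rng.normal(1.0, 0.2, 300),
    "strain_B": rng.normal(1.05, 0.3, 250),
    "strain_C": rng.normal(1.5, 0.25, 280),
}
for a, b, diff, lo, hi, p in games_howell(groups):
    print(f"{a} vs {b}: diff = {diff:+.3f}, 99% CI [{lo:+.3f}, {hi:+.3f}], p = {p:.2g}")
```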

      Related to the previous point: for every N/C distribution analysis, a number of analyzed cells is reported. The way it is written, it seems that the replication relies solely on the cells in that specific population, i.e. each cell is treated as a replicate. At least, I could not find any statement to the contrary in the legends or in the methods. I wonder what the results (and their significance) would be if each replicate were a new assay on another population.

      Cell populations exhibit significant variability in their phenotypic characteristics. Consequently, the quantification of a specific feature (e.g., the Sfp1 nuclear/cytoplasmic ratio) across a sample of cells from a given population results in a distribution rather than a single fixed value. For each quantification, we report the number of cells that were used to construct the corresponding distribution, i.e. the sample size. To compare samples from different populations (e.g., different Sfp1 mutant strains), we run them in parallel during microscopy experiments and compare their means, as described above. Throughout our study, we have tried to ensure that we quantify a sufficiently large number of cells to overcome cell-to-cell variability and enhance the reliability of our results.

      In this context, the question of the reviewer is not entirely clear to us, as individual measurements of a sample are not replicates. However, one can replicate the entire experiment on a different day by re-growing the different strains, running microscopy, quantifying the new movies, etc. In this sense, the experiments shown in the manuscript consist of single replicates, i.e. experiments that were carried out on the same day, with all the relevant mutants and controls quantified together. However, we have monitored many of our mutants multiple times over the course of our work. For example, Fig. 1 below shows replicates of the Sfp1 N/C ratio distributions at steady state in the analog-sensitive (A) and wild-type (B) background, which were quantified several times across various experiments. Some day-to-day variability exists in the empirical distributions of the same mutant, but it is quite small.

      The scale of x axes in N/C ratio plots. Besides not being consistent throughout the figures, it originates from 1, visually enhancing the differences.

      We believe the reviewer was referring to the y-axes, as the x-axes represent time. Summarizing the N/C ratio dynamics of different Sfp1 mutants has been challenging. First, the average N/C ratios at steady-state vary considerably across different mutants, as shown in the panels that summarize steady-state N/C ratios. To compare the magnitude and features of their responses, normalization is necessary. We chose to normalize the time series of each mutant to have a mean of 1 prior to the onset of a perturbation. This allows the normalized time series to represent the percentage-wise changes in the Sfp1 N/C ratio upon perturbation.

      Using a common y-axis scale for all plots of N/C ratio dynamics is not ideal, as some responses are subtler than others. Additionally, we do not believe that N/C dynamics across different figures need to (or should) be compared to each other. However, within a figure, panels that require comparison are placed in the same row and share the same y-axis scale. We believe that this approach optimizes data visualization and facilitates important visual comparisons.
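The normalization to a pre-perturbation mean of 1 described above can be sketched as follows. This is a hypothetical helper, not the authors' pipeline, and it assumes that each time series is a list of population-averaged N/C ratios with matching time stamps and that the perturbation onset time is known; only time points strictly before the onset enter the baseline.

```python
from statistics import mean


def normalize_to_baseline(times, values, perturbation_time):
    """Scale a time series so that its mean before the perturbation equals 1.

    After scaling, a value of 0.8 means a 20% decrease relative to the
    pre-perturbation level, independently of the mutant's absolute N/C ratio.
    """
    # Baseline: samples taken strictly before the perturbation onset.
    baseline = [v for t, v in zip(times, values) if t < perturbation_time]
    if not baseline:
        raise ValueError("no time points before the perturbation")
    ref = mean(baseline)
    return [v / ref for v in values]
```

For example, a series whose pre-perturbation values average 2.0 and that later drops to 1.0 is rescaled to 1.0 and 0.5, i.e. a 50% decrease relative to baseline, which makes mutants with different steady-state N/C ratios comparable on the same relative scale.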

      Related to the previous point: it is evident from the plots that the N/C ratio is always positive, even in the most deficient of the analyzed mutants. This implies that a relevant fraction of Sfp1 is still nuclear. I thus wonder what the impact of these mutations would be on the actual function of Sfp1. For this reason, I feel that qPCR evaluation of transcripts of Sfp1 target genes is particularly needed. Since lack of Sfp1 is known to yield some of the smallest cells possible, it would also be cool to have an estimate of the size of mutants where Sfp1 is less nuclear. These analyses could confer phenotypical relevance to the data, but would also help in assessing a currently unexplored possibility, that phosphorylation events by PKA influence Sfp1 function besides its localization, i.e.: the still somehow nuclear fraction is not as functional as wt Sfp1 in promoting transcription.

      It is indeed the case that the recorded N/C ratios are larger than 1 in all strains that we have monitored. We have never observed an N/C ratio smaller than 1 using widefield microscopy, for two main reasons. First, out-of-focus light from the cytosol above and below the nucleus is added to the nuclear signal, so the nuclear signal is always non-zero, even for predominantly cytosolic proteins. Second, both in-focus and out-of-focus vacuoles are devoid of the fluorescent protein fusions that we quantify, which reduces the average brightness of the cytosol. For these reasons, even when a protein is largely cytosolic, the average N/C ratio over a cell population is no lower than around 1.5. Keeping these points in mind, one can observe that our most delocalized Sfp1 mutants have an N/C ratio of around 1.6-1.7, which is very close to the lower limit. This means that these Sfp1 mutants are largely cytosolic, and the nuclear fraction (if non-zero) is quite small.

      We agree that assessing the phenotypic relevance of Sfp1 mutations is of interest. However, this was impossible with our original strains, as we introduced each Sfp1 mutant as an extra copy in the HO locus while leaving the endogenous Sfp1 locus intact. This was done in order to avoid any phenotypic changes that might result from changes in Sfp1 activity.

      To address the suggestion of the reviewer, we therefore deleted the endogenous Sfp1 copy in strains carrying sfp1PKA2A, sfp1PKA2D and sfp113A, leaving only the mutated Sfp1 copy at the HO locus. Surprisingly, the growth rate and drug sensitivity (determined by halo assays) of these single-copy mutants did not differ much from those of the mutants carrying the functional Sfp1 copy or from the wild type (Supp. Figs. 4J and 7). This observation aligns with findings for the single-copy sfp1-1 mutant in [Lempiäinen et al. 2009], which corresponds to sfp1TOR7A in our work. [Lempiäinen et al. 2009] had suggested that Sch9 compensates for the loss of Sfp1 activity via a feedback mechanism, which could explain our results as well. If this is the case, acute depletion of wild-type Sfp1 could unveil transient changes in cell growth before the compensatory effect of Sch9 is established. Unfortunately, we were unable to efficiently degrade wild-type Sfp1 carrying a C-terminal auxin-inducible degron. Instead, we followed the same approach as [Lempiäinen et al. 2009] and deleted SCH9.

      As we describe in the last section of Results, the difference was dramatic for sfp113A mutants, which grew extremely slowly in the absence of Sch9 (the doubling time was around 4 hours, but it was hard to estimate because we could not grow the cells consistently). Interestingly, SCH9 deletion had a negative impact on sfp1PKA2D but not sfp1PKA2A cells (Supp. Fig. 7). Overall, these results demonstrate that Sch9 can compensate for loss of Sfp1 activity, which makes it challenging to study the impact of Sfp1 mutations on cellular phenotypes.

      To further understand to what extent Sch9 compensates for loss of Sfp1 phosphorylation, we carried out RNA-seq on WT cells and cells carrying a single copy of sfp113A (with the endogenous SFP1 copy removed). Although sfp113A cells grow as well as WT, RNA-seq picked up several differentially expressed genes related to amino acid biosynthesis. This surprising finding is presented in the last section of Results and in Supplementary Figures 8, 9 and 10. We explore the relevance of these results and their connection with past literature on Sfp1 and Sch9 in the Discussion section.

      I found some typos here and there, and it would greatly help to report them if in the manuscript line numbers were included.

      We apologize for the typos. We have tried to eliminate them, and we have also added line numbers to the manuscript.

      Reviewer 2

      There is no biochemical evidence presented that the putative PKA sites (S105 and S136) are genuinely phosphorylated by PKA. The fact that they match the PKA consensus motif alone does not guarantee this. In order to claim that they are looking at the effect of PKA by mutagenizing these residues, the authors have to demonstrate the PKA-dependency of S105 and S136 phosphorylation by, for example, mass spec experiments or western blotting with phospho-specific antibodies (Cell Signaling Technology #9624 for example). Also, is the band shift caused by PKA inhibition (Fig 3C) canceled by the S105A/S136A mutation?

      We took several actions to demonstrate that the putative PKA sites are indeed phosphorylated by PKA. We first tried to detect Sfp1 phosphorylation using the antibody mentioned by the reviewer, but failed, as the sensitivity of this antibody appears to be quite low. Mass spectrometry, on the other hand, did not produce the right fragments to detect the sites of interest. We therefore resorted to an in vitro kinase assay using [γ-32P]ATP together with purified PKA and Sfp1. Unfortunately, bacterial overexpression of MBP-tagged Tpk1, Tpk2 and Tpk3 (the catalytic subunits of PKA) was quite challenging and we were unable to produce soluble protein. We therefore used commercially available bovine PKA (bPKA, PKA catalytic subunit, Sigma-Aldrich 539576), which shows high homology to the yeast Tpk kinases [Toda et al. 1987]. Moreover, 87% of bPKA substrates have been shown to also be Tpk1 substrates [Ptacek et al. 2005], and bPKA has been used to identify new Tpk substrates in budding yeast [Budovskaya et al. 2005]. As we show in the revised manuscript, bovine PKA does phosphorylate Sfp1. Moreover, phosphorylation is reduced by 50% in the double S105A, S136A mutant (Fig. 1F) and becomes undetectable in the 13A mutant (Supp Fig. 6). Together with the rapid response of Sfp1 localization to acute PKA inhibition, which we had already reported, we believe that these results provide strong evidence that Sfp1 is a direct PKA substrate and that the two phosphosites we identified are functional.

      As the above in vivo experiments do not exclude S105/S136 phosphorylation by other kinases downstream of PKA, the authors need an in vitro PKA kinase assay in order to claim direct phosphorylation. These biochemical experiments are not trivial, but I think they are absolutely necessary for this story.

      One cannot exclude that S105/S136 are also phosphorylated by other kinases of the AGC family (note that [Lempiäinen et al. 2009] has already excluded Sch9). However, as we hope to have shown, PKA indeed phosphorylates Sfp1. Examining if other kinases besides PKA and TORC1 target Sfp1 is a very interesting question that should be addressed in future work.

      The authors only look at the localization of Sfp1. To assess its functionality and so physiological impact, it would be informative to measure the mRNA level of target ribosomal genes in various Sfp1 mutants they created.

      As we described in our response to Reviewer 1 above, we did perform RNA-seq on WT and cells carrying a single copy of sfp113A. We observed a notable absence of differentially expressed ribosomal genes and ribosome-related categories in the GO analysis (Supp. Figs. 8, 9 and 10). Together with our observations on SCH9 deletion (Supp. Fig. 7), these results suggest that Sch9 can largely compensate for the loss of Sfp1 activity. On the other hand, the emergence of differentially expressed amino acid biosynthesis genes is a finding that merits further investigation, as it connects with previous observations made with Sch9 deletion mutants and the [ISP+] prion form of Sfp1 (cf. Discussion).

      In the experiments using analog-sensitive PKA (Fig 1D and E, for example), they directly compare wild-type PKA versus analog-sensitive PKA, or with 1-NM-PP1 versus without 1-NM-PP1. This makes interpretation difficult, particularly because 1-NM-PP1 itself has a significant impact even in the wild-type PKA strain. The real question is the difference between wild-type Sfp1 and mutant Sfp1. In the current form, they compare Fig 1D with 1E, but these two do not look like a single side-by-side experiment. They should compare wild-type Sfp1 versus mutant Sfp1 side by side.

      Figure 1D shows that 1-NM-PP1 has a transient off-target effect on Sfp1 localization in WT cells, which could also affect Sfp1 mutants. This observation prompted us to use wild-type PKA as a control when testing the effect of 1-NM-PP1 on sfp1PKA2D in cells carrying PKAas (Figure 1E). As Fig. 1E shows, the effect of 1-NM-PP1 on sfp1PKA2D localization in PKAas cells is quite similar to the off-target effect in cells carrying sfp1PKA2D and wild-type PKA. This behavior of sfp1PKA2D is clearly different from the response of wild-type Sfp1 to PKAas inhibition, which results in sustained delocalization. We have made the latter observation repeatedly, both in this study and in our previously published work [Guerra et al. 2021].

      In Figure 3, the argument around the additive effects of PKA and TORC1 is confusing. The authors say they are additive referring to Figure 3E, but say they are not additive referring to Figure 3B. Which is true? In fact, Figure 3B appears to show an additive effect as well.

      We did not use the word "additive" in the text, because we find it difficult to interpret. Instead, we state that PKA and TORC1 appear to control Sfp1 phosphorylation independently of each other. PKA and TORC1 phosphorylation converges to the same response, affecting Sfp1 localization. It appears that loss of either kinase delocalizes Sfp1, while loss of both kinases may only have a small additional effect.

    1. Josh is, by the way, a philosopher and a neuroscientist, so this gives him special powers. He doesn't sort of sit back in a chair, smoke a pipe and think, "Now why do you have these differences?" He says, "No, I would like to look inside people's heads, because in our heads we may find clues as to where these feelings of revulsion or acceptance come from." In our brains.

      AUTHORITY ??

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      The nucleus is recognised as a core component of mechanotransduction, with many mechano-sensitive proteins shuttling between the nucleus and cytoplasm in response to mechanical stimuli. In this work, Granero-Moya et al characterise a live fluorescent marker of nucleocytoplasmic transport (NCT) and how it responds to a variety of cues. This work follows on from the authors' previous study (Andreu 2022), where they examined the response of passive and active NCT to mechanical signalling using a series of artificial constructs. One of these constructs (here named Sencyt) showed a differential localisation depending on substrate stiffness, accumulating in the nucleus on stiffer substrates (which the authors previously showed was due to differences in mechano-sensitivity of passive versus facilitated NCT). Here the authors use Sencyt as a tool to probe how different cues affect NCT and thus nuclear force-sensing in two different cell lines (one epithelial, one mesenchymal).

      They have established a 3D image segmentation pipeline to measure both the nuclear/cytoplasmic ratio of Sencyt and 3D nuclear shape parameters. As a proof of principle, they show that hypo-osmotic shock (which inflates the nucleus and would be expected to increase nuclear tension) and hyper-osmotic shock (which shrinks and deforms the nucleus) alter the Sencyt nuclear/cytoplasmic ratio as expected. They then show that inhibiting acto-myosin, which would be expected to block force transduction to the nucleus, reduces NCT, although, interestingly, without any changes to nuclear morphology. They then examine how cell density affects NCT and show that Sencyt localisation correlates only weakly with density but much more strongly with nuclear deformation (especially as measured by solidity). This is surprising considering that mechano-sensitive transcription factors such as YAP have been shown to exit the nucleus at high cell densities. Therefore, the authors directly compare Sencyt and YAP nucleo/cytoplasmic localisation and show that Sencyt behaves differently to YAP, with YAP localisation correlating strongly with cell density. This reveals an added layer of complexity in YAP regulation beyond pure changes to NCT.

      Major points

      The data presented throughout this work are high quality and rigorous. The controls used are appropriate (including the use of a freely diffusing mCherry to illustrate the specificity of the Sencyt probe in osmotic shock experiments - figure S2). Experiments are properly replicated and the statistical analysis is appropriate. The data are beautifully presented in figures and the manuscript is well written and very clear. Overall this is a high quality work.

      We thank the reviewer for the positive assessment of the manuscript.

      The discussion is careful and the conclusions are supported by the data. My only small concern is that the authors place too much emphasis on how this work is in 'multicellular systems' as opposed to their previous work in single cells (for example "Here, we demonstrate that mechanics also plays a role in multicellular systems, in response to both hypo and hyper-osmotic shocks, and to cell contractility." L212). Cell density is only controlled in figures 3 and 4, and in some of the earlier experiments cells look quite sparse (e.g. Figure 2). It's also debatable how far a monolayer of cancer cells, which lack contact inhibition of growth, is a multicellular system. Furthermore, the authors don't specifically look at cell/cell adhesion or observe major differences between the epithelial and mesenchymal lines. For this reason, the authors should tone down this discussion before publication.

      We agree with the reviewer that properly assessing cell-cell adhesion is important in the context of the work. To this end, we have stained for E-cadherin in both cell lines. As expected and as described previously, the results confirm that MCF7 cells do have clear cadherin-mediated cell-cell adhesions, with cadherin staining localized specifically at cell-cell junctions. Also as expected, C26 cells show much lower cadherin expression, without a clear pattern. Further confirming this difference, MCF7 cells show clearly distinct actin organizations on their apical and basal sides, whereas C26 cells do not. Thus, we believe that the two cell models do represent a reasonable assessment of epithelial versus mesenchymal phenotypes, in a multicellular context. The data are presented in new supplementary fig. 1, and discussed on page 3 of the manuscript (first paragraph). We have also included a paragraph in the discussion to comment on the differences between cell types (page 7, 2nd paragraph).

      Optional experimental suggestions: For me, the most compelling finding is that nuclear deformation has a greater correlation with NCT than cell density and that this is different from the behaviour of YAP. To cement the importance of nuclear deformation, the authors could induce deformation in single cells, for example by culture on very thin micropatterned lines, and assess the localisation of Sencyt and YAP. It would also be interesting to assess the role of force transduction in this context or at different densities by removing actin, which affects NCT without inducing nuclear shape changes. These functional experiments would allow the authors to draw stronger conclusions about the role of nuclear shape and deformation, but they aren't necessary for publication.

      This is a very interesting suggestion. Following the reviewer's advice, we have now carried out experiments in which we seeded cells on micropatterns of different sizes and measured both sencyt and YAP ratios. In C26 cells, we found, as expected, that increasing spreading leads to progressive nuclear deformation (as measured through nuclear solidity) and a progressive increase in both sencyt and YAP ratios. Interestingly, cell spreading in MCF7 did not affect nuclear solidity, sencyt ratios, or YAP ratios. This further confirms the relationship between nuclear deformation and nucleocytoplasmic transport, and also shows that different cell lines have different sensitivities. The lack of response of MCF7 cells is consistent with the lower sencyt response, and lower sencyt/nuclear shape correlation measured in fig. 4. It suggests that MCF7 cells may have mechanisms to shield the nucleus from deformation, something which we have reported in a different context (Kechagia et al., Nat. Mater. 2023). The new results are reported in new fig. 3 and supplementary fig. 8, and discussed on pages 5 (1st paragraph) and 6 (1st paragraph) of the Results.

      Minor points

      I'd like to see better examples of 3D reconstructions of nuclei (i.e. fig 1C but bigger) in different conditions. This is especially important in figure 3, where it would be helpful to see examples of nuclei with high or low solidity. The differences in oblateness are clear to see from the images in 3a and 3f, but solidity could be better illustrated.

      We have now added 3D reconstructions as requested, which illustrate the nuclear shape changes that take place. This is shown in figs. 1, 4 (which corresponds to figure 3 in the previous version of the manuscript), s3, and s7.

      Where Sencyt index is plotted, it would be clearer to add labels, at least to figure 1, indicating whether it is more cytoplasmic or nuclear.

      We have done this as requested in figure 1.

      Reviewer #1 (Significance (Required)):

      In this work, Granero-Moya et al characterise a new tool for measuring NCT and show that it is mechanically regulated. Given the importance of NCT in mechano-transduction, this tool will be a great asset to the mechano-biology community and will likely be adopted by multiple groups in the future. The findings about the effects of cell density on NCT and differences from YAP are interesting but could be further fleshed out. This work is likely to be of greatest interest to a specialised audience working in the fields of mechano-biology and nuclear transport.

      We thank the reviewer for the positive assessment.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study conducted by Granero Moya and colleagues describes the application of a synthetic protein which is observed to enter the nucleus in response to mechanical strains, rather than being influenced by cell density. However, the novelty of this work is minimal, since the conceptual framework and the use of this identical or similar tool have been previously reported by the same team in earlier publications.

      We respectfully disagree with the assessment of the reviewer. Please see below for a detailed response regarding novelty.

      In their experiments, they employ this GFP-based sensor, referred to as Sencyt, in cells subjected to osmotic shocks. These shocks are highly stressful and impact a range of cellular processes, including stress response pathways (MAPK and others), osmoregulatory pathways, cell cycle regulation, autophagy and death pathways, ion channel regulation and others. The second set of findings concerns cells treated with a combination of drugs affecting the actin cytoskeleton. The justification for using a combination of two specific drugs remains unclear, as the study does not adequately explain the rationale behind this choice. Additionally, there is a lack of information regarding the full range of targets these drugs affect. This raises questions about the comprehensiveness and applicability of the findings, as understanding the complete scope of the drugs' targets is crucial for interpreting the results within a minimal frame of physiological context.

      The two drugs used are paranitroblebbistatin (a photostable version of blebbistatin) and CK666. We apologize for not explaining in more detail the action of these drugs, both of which have been characterized and used extensively in the literature. Paranitroblebbistatin binds to myosin, preventing its ATPase activity and therefore impairing actomyosin contractility (https://doi.org/10.1002/anie.201403540). It acts on different myosin isoforms, including non-muscle myosin II, the main type of myosin responsible for actomyosin contractility in non-muscle cells. CK666 binds to and inhibits Arp2/3, a protein complex responsible for nucleating branched actin (https://doi.org/10.1016/j.chembiol.2013.03.019). This impairs lamellipodial formation and therefore cell spreading (see for instance https://doi.org/10.1371/journal.pone.0100943).

      The rationale for using both drugs in combination was explained on page 4 of the manuscript. In our previous work, we determined that myosin inhibition with blebbistatin is not sufficient to inhibit nuclear mechanotransduction. Indeed, in an epithelial context, we observed that due to reduced contractility, blebbistatin-treated epithelial cells in fact spread more on their substrate. This leads to more deformed (flattened) nuclei, leading to the counterintuitive result that YAP nuclear localization increases rather than decreases. If branched actin nucleation is also inhibited, this additional spreading is prevented, and the combination of drugs leads to reduced nuclear deformation and reduced YAP nuclear localization (see supplementary fig. 7 in Kechagia et al, Nat. Mater. 2023, https://doi.org/10.1038/s41563-023-01657-3). Similar results had been published previously by the group of Clare Waterman (https://doi.org/10.1074/jbc.M115.708313).

      Thus, the combination of drugs was designed to ensure that we were impairing nuclear mechanotransduction. Of course, we agree with the reviewer that all perturbations have potential side effects. Osmotic shocks will affect a range of cellular processes (as mentioned in the discussion of the manuscript), and any drug treatment can potentially have off-target effects. However, the fact that two orthogonal perturbations with different potential side effects (osmotic shocks versus actomyosin-targeting drugs) lead to the same effects in sencyt strongly suggests that the effect is mediated by mechanics, and not other factors. To reinforce this, we have now added an additional mechanical manipulation: seeding cells on micropatterned islands of different sizes. As spreading increases, cells are known to increase actomyosin contractility and nuclear deformation (https://doi.org/10.1529/biophysj.107.116863, https://doi.org/10.1073/pnas.0235407100, https://www.nature.com/articles/ncomms1668, https://doi.org/10.1073/pnas.1902035116). As expected, nuclear solidity, sencyt ratios, and YAP ratios all increased with cell spreading. Interestingly, this occurred only for C26 and not MCF7 cells, where no changes were measured in solidity, sencyt, or YAP. The lack of response of MCF7 cells is consistent with the lower sencyt response, and lower sencyt/nuclear shape correlation measured in fig. 4. It suggests that MCF7 cells may have mechanisms to shield the nucleus from deformation, something which we have reported in a different context (Kechagia et al., Nat. Mater. 2023).

      The new results are shown in figs. 3 and s8. We have also expanded the explanation of drug treatments on page 4 (3rd paragraph).

      The novelty lies in the specificity of this synthetic fusion protein for these manipulations and not for cell density. Yet, the reasons behind this selective response remain unexplained, potentially attributable to the unique characteristics or sensitivity thresholds of their synthetic probe. As a comparison, YAP localization is sensitive to both inputs, but this has also already been published (fig4). The focus is anyway on Sencyt, for which they offer simple observations and quantifications.

      The main novelty of the work lies in the characterization of the role of nucleocytoplasmic transport in mechanotransduction, in the context of multicellular systems. We and others had shown that nucleocytoplasmic transport responds to mechanical force in the context of single cells (see for instance Andreu et al. 2022 from our group, but also https://doi.org/10.1126/science.abd9776 from the Martin Beck group). However, to what extent this applies to multicellular systems was unknown. It is true that in multicellular systems, the response of YAP and other mechanosensitive transcription factors has been characterized (such as in our Elosegui-Artola 2017 paper, mostly done at the single cell level but including one figure panel on epithelial cell monolayers). The reviewer argues here and in the consultation comments with other reviewers (see below) that this demonstrated the role of nucleocytoplasmic transport in multicellular systems. However, we respectfully disagree. As also noted by reviewer 3 in the consultation, the response of YAP, and of any transcription factor, may include effects on nucleocytoplasmic transport, but will also likely include effects caused by the complex biochemical signalling pathways that regulate them. Disentangling such effects requires a sensor that only responds to nucleocytoplasmic transport, and this is precisely what Sencyt provides.

      The reviewer also states that our manuscript does not explain why sencyt responds to mechanics and not cell density. We disagree: sencyt responds to mechanics for the reasons explained in our previous work (Andreu et al., Nat. Cell Biol. 2022), and there is no reason to expect a specific response to cell density. In this regard, we don't think there are any sensitivity thresholds to detect cell density, as the probe is not designed to sense this parameter in the first place. The fact that YAP responds to both mechanics and cell density shows that the response to density cannot be merely explained by mechanics, and is rather due to signalling through other means. Of course, we agree that we do not explain the mechanism by which YAP senses cell density, but we think this lies clearly out of the scope of our manuscript.

      In terms of novelty, our work also characterizes a tool to assess nucleocytoplasmic transport live in cells. We agree with the reviewer that the specific construct had been reported in our previous paper, but it had not been characterized in detail. This is done here, enabling its use by the community as a tool to measure nucleocytoplasmic transport in any context, be it related to mechanics or not.

      When reviewing the figures presented, I find it challenging to detect marked differences, despite their quantitative data suggesting otherwise.

      We assume here that the reviewer refers to differences in sencyt nuclear localization, that is, the sencyt index. We have now checked the example images showing changes in sencyt index in figures 1 and 2. In figure 1, the example cells under hypo-osmotic shocks increase their sencyt index from 1.2 to 1.45 (C26), and the example cells under hyper-osmotic shocks decrease their sencyt index from 0.9 to 0.3 (MCF7) and from 1.4 to 0.5 (C26). In figure 2, the example cells increase their sencyt index upon drug washout from 0.2 to 1.4 (MCF7) and from 0 to 0.9 (C26). Of course, these individual values don't reflect exactly average values, but they do reflect the reported average trends and their magnitudes faithfully. Here we note that even though sencyt changes with the different treatments, it is always more nuclear than cytosolic (sencyt index >0, as it has an NLS). Thus, to the naked eye, sencyt always seems to show a "bright" nucleus, and it is hard to intuitively see changes in its localization. Further, we also note that osmotic shocks lead to overall changes in fluorescence levels due to volume changes (as GFP molecules get diluted or concentrated in hypo- or hyper-osmotic shocks, respectively). This does not affect ratiometric quantifications, as assessed with our mCherry control, but it means that changes in ratios are hard to see by eye. To help with this visualization, we have now changed the images from green to grayscale, which is better perceived by the human eye. We have also described the issue of fluorescence intensity changes in the legend of the figure.

      In addition, we found one case in which the example images did not follow the average trends: for hypo-osmotic shocks in figure 1, the example MCF7 cells barely changed their sencyt index with treatment. We apologize for choosing this non-representative image, and we have now changed the figure to show more representative cells.

      * Furthermore, the study attempts to correlate the behavior of Sencyt with the nuclear geometric parameter of solidity, a connection that seems to lack a clear basis in cell biology and could potentially lead to misconceptions. *

      Mechanical effects on nucleocytoplasmic transport are mediated by tension applied to nuclear pores, which are embedded in the nuclear membrane (nuclear envelope). Whereas nuclear envelope tension is very challenging to measure directly, it can be indirectly related to nuclear shape. Indeed, a tense membrane tends to even out irregularities and appear rounded, whereas a membrane under low tension tends to show wrinkles. Nuclear solidity is the ratio between the actual nuclear volume and the volume of its convex hull (intuitively, the smallest wrinkle-free object containing the entire nucleus), so it equals 1 for a fully convex nucleus and decreases as folds and indentations appear. It is therefore the geometric parameter that best reflects the presence of wrinkles, folds or irregularities, and thus the one expected to correlate best with membrane tension. Of course, this correlation is not perfect, and there could be many situations in which changes in membrane tension do not directly affect nuclear solidity. But we believe that solidity is the geometric parameter that should best reflect membrane tension, which is why we focus on it. Consistent with this hypothesis, solidity is the geometric parameter that correlates best with sencyt. To further clarify this, we now explain this rationale in detail on page 4 of the manuscript (1st paragraph).
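
      The definition of solidity can be made concrete with a short sketch. The manuscript computes solidity on 3D nuclear volumes; for brevity, this sketch uses the 2D analogue (outline area divided by convex-hull area), with the area from the shoelace formula and the hull from Andrew's monotone chain algorithm. The shapes are hypothetical: a square, and the same square with a rectangular notch standing in for a nuclear fold.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; vertices given in order (either orientation)."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def solidity(outline: List[Point]) -> float:
    """Outline area divided by convex-hull area (1.0 means no indentations)."""
    return polygon_area(outline) / polygon_area(convex_hull(outline))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
# Same square with a 2 x 2 notch cut into the top edge, a crude stand-in for a fold.
notched = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 2), (1, 2), (1, 4), (0, 4)]
print(solidity(square), solidity(notched))
```

      A convex outline gives exactly 1 and any indentation lowers the value, while uniformly enlarging a shape leaves its solidity unchanged. This is why solidity tracks wrinkles and folds rather than overall nuclear size.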

      * Reviewer #2 (Significance (Required)): *

      * In sum, I think the MS is of interest for a very specialistic audience. There are no clear interpretations. The work is done in one or two cellular model systems in vitro; and the general significance of these observations is of very limited impact and no novelty. *

      We strongly disagree. The study uses two cellular models, one with an epithelial and the other with a mesenchymal phenotype, and is thus highly relevant for multicellular systems. Following suggestions by reviewers 1 and 2, we have now characterized the epithelial/mesenchymal behaviour of the cell types in detail (see supp. fig. 1). The results are novel in that they demonstrate the role of nucleocytoplasmic transport in multicellular systems, something which, as argued above, had not been done before. The difference with YAP, and the disentanglement between transport and signalling, are also novel. Finally, we believe the manuscript will be impactful because of this novelty, but also because it makes sencyt available as a tool for the community. In fact, since posting this manuscript on bioRxiv, we have received many requests (directly and through Addgene) to share sencyt, which is currently being used in several labs across the world.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      In this very well-written manuscript, Pere Roca-Cusachs and colleagues investigated the response of nucleocytoplasmic transport (NCT) to mechanical stress and tested whether this response is similar in epithelial and mesenchymal cells using a combination of quantitative approaches. This study builds upon their earlier findings, which elegantly demonstrated that NCT is sensitive to mechanical forces transmitted to the nuclear membrane. Using a similar approach to their recent work, they quantitatively analyzed NCT and compared the two cell types using various treatments that impact nuclear membrane tension. The study is straightforward and experimentally sound, with an adequate number of replicates and independent experiments. While one might consider the limitations given their previous work, none have demonstrated that NCT is mechanosensitive in epithelial cells. Additionally, they provide a simple approach to measure NCT, which should be of interest in the field. However, it is unclear how the authors defined the epithelial phenotype in this work and whether they solely based this characterization on the tissue/cell's origin. Epithelia can be defined ultrastructurally with reference to their apico-basal polarity and specific cell-cell junctions (Alberts et al., 1994; Davies and Garrods, 1997). Changing cell density should affect cell/cell adhesion, but the authors provide no evidence that the cells tested in the study are attached to their neighbors on all sides and form an epithelium. While I recognize that the objective of this study is not to mimic the in vivo behavior of epithelial tissue, the authors should at least ensure that cells form a monolayer by quantitatively assessing cell-cell junctions (or they should adjust their conclusions adequately). This control is specifically important for Figure 3 and 4, whose objective is to test the impact of cell/cell contacts. But it would also be important to provide this essential control for Figure 1 and 2, as it is unclear from the images provided if MCF7 cells are forming an epithelium (and form cell/cell junctions).

      We thank the reviewer for the positive assessment of our work. We fully agree with the reviewer that properly assessing cell-cell adhesion is important in the context of the work. To this end, we have stained for E-cadherin in both cell lines. As expected and as described previously, the results confirm that MCF7 cells do have clear cadherin-mediated cell-cell adhesions, with a cadherin staining localized specifically in cell-cell junctions. Also as expected, C26 cells show much lower cadherin expression, without a clear pattern. Further confirming this difference, MCF7 cells (but not C26 cells) show a clear apico-basal polarization, with distinct actin organizations in their apical and basal sides. Thus, we believe that the two cell models do represent a reasonable assessment of epithelial versus mesenchymal phenotypes, in a multicellular context. The data are presented in new supplementary fig. 1. We have also included a paragraph in the discussion to comment on the differences between cell types (page 7, 2nd paragraph).

      * Reviewer #3 (Significance (Required)): *

      The mechanosensitivity of NCT is an important question central to many aspects of cell biology. One might consider the impact of the proposed work limited, given their previous research. However, none have demonstrated that NCT is mechanosensitive in epithelial cells, making it a crucial question that needs to be addressed. Additionally, they provide a simple approach to measure NCT, which should be of interest to a broad audience.

      We again thank the reviewer for this positive assessment.

      *Referees cross-commenting *

      * Here comments from all 3 reviewers are reported *

      * Reviewer 1: *

      * I disagree with R2's comment that there is 'no novelty' here. Although this work is going to be of greater interest to a specialised rather than general audience, it characterises in depth a simple tool to measure NCT which will be useful for the mechanobiology field. Also, using 'two cellular model systems in vitro' is very standard in the field when assessing subcellular processes like NCT. Using this approach in vivo would be very interesting but challenging and would be an entirely different study. *

      *I agree with R2's comments that the authors should better justify their combination of two actin inhibitors and R3's point on better assessing cell/cell junctions. *

      We thank the reviewer for these comments. Both issues have been addressed, as described in the response to reviewers above.

      * Reviewer 2 *

      * About Reviewer 3's comments, I believe it's a stretch to highlight the strength and novelty based on "NCT's mechanosensitivity in epithelial cells has not been demonstrated". There are thousands of papers on the Hippo pathway, which is known to be mechanosensitive, and on the regulation of YAP, which enters the nucleus in Hippo-inhibited conditions and exits to the cytoplasm in Hippo-induced cells, including downstream of mechanical signals. The phenomenon of nuclear-cytoplasmic shuttling being a common event from neurons to endothelial and multiple types of epithelial, immune, and fibroblast cells is already established through NCT of this and other endogenous proteins. This is simply an accepted fact. Then, the Nature Cell Biology 2022 paper made a very general claim, with no warning that conclusions could be cell type specific. In the Artola 2017 Cell paper they also showed NCT in mammary epithelial cells. We should definitively conclude that NCT's mechanosensitivity in epithelial cells has been well demonstrated. *

      We disagree with this assessment, for the same reasons also given by reviewer 3 below. Previous work on YAP and other transcription factors cannot be seen as a demonstration of the role of nucleocytoplasmic transport per se. The localization of any transcription factor is highly regulated by complex signalling pathways and can be affected by many factors. One of them is nucleocytoplasmic transport, but signalling events (for instance phosphorylation) can also change localization by promoting binding to cytosolic or nuclear partners, by promoting protein degradation, by masking nuclear localization signals, or through other mechanisms. To isolate the role of nucleocytoplasmic transport, a probe sensitive only to this factor is needed. This is exactly what sencyt provides. In fact, this has allowed us to answer an important open question: is the sensitivity of YAP to cell density mediated by mechanics and nucleocytoplasmic transport, or by some other factor? Our results suggest that some other factor, likely mediated by the Hippo pathway and not necessarily by mechanotransduction, explains this sensing of cell density. This is a novel finding, which was not reported in either our Elosegui-Artola 2017 paper or our Andreu 2022 paper.

      * About Reviewer 1: I find it challenging to grasp the point made in the comment. On novelty, sencyt was already shown to undergo NCT in their previous NCB 2022 study. The reviewer states that the study presents "a simple tool to measure nuclear-cytoplasmic transport (NCT) beneficial for the mechanobiology field, and evidence that this demonstrates a novel layer of regulation in hippo signaling" (also because this is observational and not a mechanistic study). The tool in question is far from simple. Its application requires transfection into cell cultures, conducting live imaging, etc. If one aims to measure NCT of endogenous proteins, straightforward immunofluorescence or live imaging of endogenous proteins (like GFP-tagged YAP, Twist, Smads, etc.) using the same experimental setup should suffice to demonstrate relevance, without necessitating any additional experiments. What, then, is the unique benefit of this proposed tool? Given it's an artificial construct combining NLS-GFP with a bacterial protein, questions arise about the effects of the forced nuclear localization signal (NLS) or the bacterial component. It is an empirical artificial construct and there is no mechanism to explain its behavior. The comparison of sencyt with YAP seems to me questionable and of limited utility. *

      As also noted by reviewer 3 below, the use of genetically encoded fluorescent sensors that require transfection is by now absolutely standard in biology, and cannot be considered "far from simple". And as stated above, imaging of endogenous transcription factors (which also requires transfection if done live) does not isolate the role of nucleocytoplasmic transport. We also disagree that "there is no mechanism to explain its behaviour". Sencyt was developed in our previous Andreu et al. 2022 paper, where its mechanism is explained in detail.

      *It's unsurprising that an artificial construct only mirrors some aspects of what is considered a genuine mechanosensitive protein. The utility of a synthetic tool lies in its ability to replicate actual phenomena, not in what it fails to do. In comparison to their NCB 2022 study, this manuscript focuses on what their reporter fails to detect. *

      We disagree that a synthetic tool is only useful if it replicates the behaviour of endogenous proteins. A synthetic tool, precisely due to its engineered, artificial nature, can be made to respond only to specific factors (in this case, nucleocytoplasmic transport). This can then be used to disentangle the role of such specific factors, as done here.

      The osmotic shock was the assay in their 2017 Cell paper. Here they demonstrate that a combination of Blebbistatin+CK (an unclear choice of drugs) is ineffective, as is cell density. Are there other specific peculiarities associated with this construct?

      Here, we note that the osmotic shock experiments in our 2017 paper were done for YAP (not for nucleocytoplasmic transport in general). Regarding the choice of drugs, please refer to our answer to the reviewer comments above for a full explanation. Also, we want to clarify that this combination is not ineffective, as it leads to clear changes in sencyt.

      * My other concern is on the minor quantitative changes reported, which seem inconsistent with the provided representative images, where significant differences are difficult to appreciate. For instance, the claim that the transfected sensor differs from an endogenous NCT protein, YAP, after cell density treatment, is hard to detect in their images. In Figure 4, comparing YAP and sencyt in C26 cells, YAP appears uniformly nuclear at high cell density, potentially more nuclear than the synthetic sensor, which is not coherent with their claim.*

      Regarding the concern of the minor changes seen in images, please refer to our full response to the reviewer comments above. Regarding the comparison between sencyt and YAP, we want to clarify that in our manuscript we do not compare the absolute values of nuclear localization between YAP and sencyt. As the reviewer notes, these are two different proteins, so which one is more nuclear does not really provide useful information. So whether YAP is more or less nuclear than sencyt is unrelated to (not incoherent with) our claim. What we state in figure 4 is that YAP responds to cell density, whereas sencyt does not. This is clear from the quantifications and also from the images.

      * From the Hippo perspective, there is really an unusual amount of nuclear YAP left in their cells. This should be almost completely cytoplasmic from prior contact inhibition studies in the Hippo field. Sencyt could be simply less sensitive than YAP in these borderline conditions. Although there's a more noticeable cytoplasmic noise in dense cells with YAP compared to sencyt, this could be attributed to several factors, including differences in protein degradation rates, which I suspect to be quicker for a synthetic protein. From a technical perspective it is complex to get strong conclusions after comparing something so unrelated with each other. One is a live GFP detection and the other is a staining by immunofluorescence. The nature of the background is also different and so conclusions from comparisons between unrelated systems are not justified. *

      In conditions of high density, average YAP ratios are close to one (zero in logarithmic scale, as reported in the figures) for MCF10A cells, so there is no nuclear localization. This is similar to what we and others have previously reported in similar conditions (e.g., Elosegui-Artola et al. 2017, Kechagia et al. 2023). In C26 cells, YAP levels at high density are somewhat higher. This is likely due to their mesenchymal nature, and therefore diminished contact inhibition by cell-cell adhesion (as assessed in detail in this revision). This further suggests that the response of YAP to cell-cell contacts differs from a purely mechanical factor, supporting our hypothesis. Regarding noise, background is subtracted in the quantifications, and potential noise from non-specific signal or autofluorescence is also cancelled because we compute fluorescence ratios between nucleus and cytoplasm (and not absolute values). Thus, we do not think noise is an issue. Further, we note again that we do not directly compare values between sencyt and YAP.

      * This suggests caution on what is heralded as the main claim here put forward. *

      * Reviewer 1: *

      *I do have some sympathy with R2's comments in the consultation. I agree that showing that NCT is mechanosensitive in an epithelium is not new. I also agree that sometimes it is difficult to see the quantitative differences by eye. This second point could be addressed by including more details of the segmentation and analysis in the supplemental material (along with some example images). *

      We thank the reviewer for the suggestions. Regarding novelty, please see above for a detailed discussion, and also the comments of reviewer 3 below (previous work studied not NCT but transcription factors, which are affected by many parameters). Regarding quantitative differences, we have now addressed this issue by showing images in grayscale rather than green, and by replacing one example cell in figure 1 that indeed did not reflect the average measured trends. We now also show examples of 3D-rendered images of the nuclei in different conditions. We have also revised the methods to explain in detail how ratios are calculated and how segmentation is performed.

      * Regarding novelty, I would be interested to know if R2 thinks that there are experiments that the authors could do to improve the work. Or do they need to simply tone down their claims? It's perfectly acceptable to publish a well characterised tool with a series of observations and it's beneficial to the community to do so.*

      * Reviewer 3 *

      * Thanks to Reviewers #1 and #2 for using this consultation option; I truly appreciate their feedback on my comments and find it extremely valuable. I agree with Reviewer #1 that the method proposed here is relatively simple. Transfecting cells and conducting live fluorescent imaging can hardly be considered difficult. I believe the construct used/designed by the authors is the main advantage as it provides a specific way to quantitatively assess NCT and not limit the analysis to a single nuclear protein (such as YAP). Reviewer #2 suggests using immunofluorescence staining of YAP or live imaging of fusion fluorescent protein (following transfection) to analyze NCT, but this approach would yield a readout not only based on NCT but also on the many other interacting partners/mechanisms that regulate the candidate localization, resulting in an unspecific readout (and similar transfection/live imaging set-up). *

      We thank the reviewer for this comment; we fully agree and have elaborated on this in our responses above.

      * Regarding the impact of the study, I agree that it is certainly not as impactful as previous publications on this topic. Although I find reviewer#2 argument on Yap irrelevant, as YAP is not the main focus of this paper. Some experiments have been done with cells of epithelial origin, but NCT mechanosensitivity has not been clearly tested in epithelial monolayer, which is the main claim of the proposed study here. The 2017 Cell paper focused on YAP transport into the nucleus (and not NCT in general) and they showed a correlation between YAP nuclear localization and traction force in MCF10A. I am not sure if one would say that "NCT mechanosensitivity has been well demonstrated in epithelial cells" based on this single panel. The impact of the proposed study is certainly not outstanding but offering a thorough analysis in epithelial cells (as monolayers and not as individual cells) and presenting a well-defined experimental approach should be of interest in the field. I agree with comments from reviewer#2 that some reported effects in graph are unclear on main images. More experimental details should hopefully clarify this aspect.*

      We fully agree with the reviewer. Regarding quantitative differences, we have now addressed this issue by showing images in grayscale rather than green, and also by replacing one example cell in figure 1 which indeed did not reflect the average measured trends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) The authors should show i) whether the variants exhibit the same surface expression as wildtype and ii) whether changes of surface expression (e.g. wt transporter expressed low and high) alters growth rates under conditions where growth depends on amino acid uptake. The authors say that the uptake of radioactive substrate and the overall fitness coincide (Figures 5 and 6), but it would be good to quantify the correlation, perhaps by using a scatterplot and linear regression.

      We thank the reviewer for the questions and proposals. A comparison of surface expression between the transporter variants has been added to the manuscript (Figure 3 - Figure supplements 1 and 2). For the AGP1 variants, surface expression of the evolved mutants and the wild-type was similar, indicating that differences in transporter surface expression do not account for the differences in growth rate. The same analysis for the PUT4 variants showed a significant difference, with the PUT4-S variant apparently expressed at higher levels than the wild-type. However, this does not appear to mask the effect of the mutation on uptake of the original substrates Ala, Gly and GABA, since for these substrates the transport activity of the evolved variant is substantially decreased despite its higher surface expression (Figure 5). Thus, the variation in surface expression between the mutant and the wild-type, which could be attributed to the small sample size and the inherent limitations of the analysis (imaging of a culture with cells in different planes), is not expected to interfere with the reported results.

      Additionally, as advised, a scatterplot with a linear regression line describing the relationship between overall fitness and uptake of 2 mM radioactive substrates was added to the manuscript (Figure 5 - Figure supplement 2). For both 2 mM Phe and 2 mM Glu, the regression model explains 60-70% of the variation in uptake rate among the different variants (R² of about 0.6-0.7), with uptake rate treated as the variable dependent on fitness.
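
      The fraction of variation "explained" by a linear regression is its coefficient of determination, R². The sketch below computes an ordinary least-squares fit and R² in plain Python, as a generic illustration of the statistic; the input values are arbitrary illustrative numbers, not the fitness or uptake measurements from the manuscript.

```python
from typing import Sequence, Tuple

def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1.0 - ss_res / ss_tot

# Arbitrary illustrative numbers (NOT data from the manuscript).
fitness = [1.0, 2.0, 3.0, 4.0]
uptake = [1.0, 3.0, 2.0, 4.0]
intercept, slope, r2 = linear_fit(fitness, uptake)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
# prints slope=0.80, intercept=0.50, R^2=0.64
```

      With a single predictor, R² equals the squared Pearson correlation, so its value is the same whichever variable is treated as dependent; only the slope and intercept change.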

      (2) The authors should further investigate to what extent the (over)expression of wildtype versus variant transporters impacts growth rates. I would recommend such experiments being done under conditions where nitrogen uptake does not depend on amino acid uptake. I could imagine that some of the fitness data are confounded by the general effects of mutations on growth rates. More concretely, I could imagine that overexpression of e.g. the AGP1-G variant is less of a burden for the yeast cells and would allow to grow them better in general. This could explain why its overall fitness is close to wt, whereas other variants exhibit diminished fitness (Fig. 4A).

      The growth curves of all transporter variant cultures in the absence of selection for amino acid uptake are presented in Figure 4 - Figure supplement 1. As proposed, the growth rates of the variants in medium with ammonium as the nitrogen source were calculated and are presented in Figure 3 - Figure supplements 1 and 2. For both the AGP1 and PUT4 variants, statistical analysis showed no significant difference between the mutants and the wild-type.
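
      The response does not state how growth rates were derived from the growth curves. A common approach, shown here only as an assumed example, takes the specific growth rate as the least-squares slope of ln(OD) against time over points in exponential phase; the function names, the choice of window and the synthetic curve are hypothetical.

```python
import math
from typing import Sequence

def specific_growth_rate(times_h: Sequence[float], od: Sequence[float]) -> float:
    """Least-squares slope of ln(OD) versus time (per hour).

    The points are assumed to lie within exponential phase; choosing that
    window is left to the user.
    """
    logs = [math.log(v) for v in od]  # raises ValueError for OD <= 0
    n = len(times_h)
    mt = sum(times_h) / n
    ml = sum(logs) / n
    num = sum((t - mt) * (lv - ml) for t, lv in zip(times_h, logs))
    den = sum((t - mt) ** 2 for t in times_h)
    return num / den

def doubling_time(mu_per_h: float) -> float:
    """Doubling time (hours) for a specific growth rate mu (per hour)."""
    return math.log(2) / mu_per_h

# Synthetic exponential curve (not measured data): OD = 0.05 * exp(0.3 * t)
times = [0, 1, 2, 3, 4, 5]
ods = [0.05 * math.exp(0.3 * t) for t in times]
mu = specific_growth_rate(times, ods)
print(f"mu={mu:.3f} per h, doubling time={doubling_time(mu):.2f} h")
```

      Fitting on the log scale turns exponential growth into a straight line, so the slope is directly the specific growth rate and can then be compared between variants with standard statistical tests.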

      (3) It is quite remarkable that the PUT4-S variant has such a dramatically enlarged substrate spectrum. In addition, the fitness losses for Alanine and GABA are rather small. This striking finding asks the question of why yeast has not evolved this much better/more efficient variant in the first place?

      We thank the reviewer for this very good question. We have now included an explanation in the Discussion; in short, one should keep in mind that we used a 10-gene deletion strain to select for the mutants. Wild-type cells cover a wide spectrum of substrates through many amino acid transporters, whose regulation is intricately tuned to achieve optimal transport under any environmental circumstance. Broadening the spectrum of a single transporter would thus not lead to increased fitness; on the contrary, it would probably disturb this fine balance.

      (4) It would be generally interesting which types of selections (transporter/amino acid combinations) were tried (maybe as part of the methods section). I could imagine that the examples that are shown in the paper are the "tip of the iceberg", and that many other trials may have failed either because the cultures died, or the identified clones would grow faster due to mutations outside of the plasmid. It would be helpful for researchers planning such experiments in the future to be made aware of potential stepping stones.

      The issues raised here are spot-on, as we did in fact test the evolution of PUT4 towards transport of amino acids other than the two mentioned in the report. Aside from the successful Asp and Glu selections, we ran parallel cultures selecting for transport of Gln, Thr, Trp, Tyr, and Cit. None of these evolution regimes led to increased growth phenotypes linked to the evolved gene, and we did not investigate these cultures further. At this point, we cannot fully explain this result, which is why we decided to omit it from the report. The L207S variant of PUT4 was later shown to indeed support growth on Gln, Thr, and Cit. Therefore, we speculate that this mutant did not evolve in the respective evolution cultures because the fitness gain on these amino acids was not large enough for it to be sufficiently enriched during the evolution trial. Given that the Δ10AA strain still harbors nine amino acid transporter genes in its genome, it is conceivable that upregulation of some of these genes causes growth on some amino acids, preventing the selection of mutations in PUT4 (e.g., by mutations outside the plasmid, as the reviewer aptly suggested). We deemed these (negative) results not appropriate for the manuscript, as our main focus was characterizing the fitness effects of single mutations, not the laboratory evolution process of obtaining the mutants.

      (5) The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. In such approaches, it is difficult to know which mutation space is finally covered/tested, and information that can be gained from loss-of-function analyses is missed. Accordingly, the outcome is somewhat anecdotal. To provide an idea of the mutational landscape accessible, the authors could perform NGS of cultures without any selective pressure, and report the distribution of missense variants in the population.

      We very much appreciate the interest in the details of the mutagenesis. Based on the information given in the original OrthoRep publications (e.g., Ravikumar et al., DOI: 10.1016/j.cell.2018.10.021; mutation rate approx. 10⁻⁵ per generation and nucleotide), we calculated the expected number of mutations per passage in our experiments. For AGP1, this is about 5000 mutational events per passage (10 mL culture volume and 1:200 dilution), and for PUT4, about 1000 mutational events per passage (2 mL culture volume and 1:100 dilution). At a gene length of about 2000 bp, we expect most single mutations to be covered already in the first or second passage (in the absence of selection). This is reflected in the result that the strongly beneficial mutation L207S in PUT4 was recovered in every selection on Asp or Glu we tested. We have included this information in the Methods section.

      That said, the present study was consciously designed to study gain-of-function mutations, as we wanted to know if and how membrane transporters can evolve new substrate specificities without losing their original functions. Our approach was chosen to reflect, as closely as possible, a natural scenario in which a microorganism encounters a new ecological niche (a new nutrient to be transported). At the same time, we included selective pressure to keep the capacity to thrive in the original niche (to assimilate an ancestral nutrient). This approach is designed to specifically select against any loss-of-function mutations, which is in line with most modern theories about the evolution of protein function (excellently reviewed in Soskine and Tawfik, DOI: 10.1038/nrg2808). We find that this approach gives a good idea of how transporters could evolve new functions in a natural setting. By engineering single mutations in the wild-type background of the transporters, we show the fitness effects of different single mutations; this finding therefore does not depend on the mutational landscape covered in the experiment.

      (6) The authors do not discuss the impact of these mutations on transport rates/kinetics, which are known to play a role in substrate selection in solute carriers (https://www.nature.com/articles/s41467-023-39711-y). Do the authors think ligand binding/recognition is more important than kinetic selection in the evolution of function?

      Indeed, the observed phenotypes can stem from both changes in transport rate and changes in substrate binding. In our opinion, both are plausible explanations for the behavior of the evolved transporter variants. We do not discuss this in the manuscript because the weak transport of the novel substrates by the wild-type transporters did not allow us to unambiguously assign one or the other. Yet, we can offer some circumstantial evidence pointing towards substrate affinity being the more important factor in evolving a new activity in transporters: the overall transport rate (for original substrates) declined in most evolved transporters. Therefore, it is somewhat less likely that an improved transport rate allowed novel substrates to be used as nutrients. However, this does not exclude that both processes occur (even side by side).

      (7) Ultimately, what are the selective pressures that drive transporter function? The authors pose this question but don't fully develop the idea. Would promiscuous variants still be selected for if the limiting nitrogen source was taken up by the cell via a different pathway (i.e. ammonium or perhaps arginine)?

      Evolution and regulation of transporters is a very complex system, and we simplify this system in our single-transporter/single-amino acid approach. In nature, the selective forces are assumed to be much smaller than in our system, and multiple selective pressures might occur at the same time (maybe even in opposite directions). Therefore, such predictions are beyond the scope of the present study. To put it shortly, yeasts (and other organisms) have evolved the capacity to transport all natural amino acids. Yet, to actually allow fine-tuned regulation of transport of each individual amino acid, narrow- and broad-range transporters have evolved, including a lot of redundancy. This means that the question posed cannot be answered by yes or no, but by “it depends”.

      (8) Amino acids are a special class of metabolites, in that they all have the same basic structure. Thus, transport systems really only need to recognize the amino and carboxyl groups with high fidelity, and can modulate the side chain binding site to increase specificity. This was demonstrated in a bacterial APC transporter (https://www.nature.com/articles/s41467-018-03066-6#Sec2). Is this why the APC fold is largely responsible for AA uptake in biology?

      Indeed, APC-type amino acid transporters typically bind the amino and carboxyl groups in the same position through backbone interactions. This might therefore be an ancestral feature of the APC superfamily and could explain why this group represents the main group of amino acid transporters.

      (9) There isn't much discussion on the location of the mutations with respect to binding site vs. gating helices. Are there hotspots of mutations within the APC, and areas where variation is poorly tolerated? It would be helpful to briefly review what is known about mutations that change amino acid specificity in the APC family. My impression is that other studies applying rational mutagenesis have also shown that single-site mutations in the binding pocket alter substrate specificity - are these analogous to the L207 in PUT4? PUT4: I64T comes up in 3 of 5 selections. Did the authors consider a closer analysis of this mutation, and if not, why?

      We agree that it would be helpful to determine hotspots of mutations in APC transporters that lead to changes in selectivity. However, we feel that the current literature does not provide enough data to support an extended analysis of such hotspots. Moreover, the natural sequences of APC transporters are not similar enough to determine which residues are responsible for a given selectivity profile. There are, however, some site-directed mutagenesis studies, as mentioned by the reviewer, and a short summary of these is included in the revised paper. Interpreting these previous studies in light of our results suggests that the sites identified by experimental evolution in our work play a significant role in substrate selectivity and transporter function within the APC superfamily.

      As to why we did not include the I64T mutation in our experiments: this mutation lies within the poorly defined N-terminus of the protein, which is not part of the transmembrane core. We therefore considered this residue unlikely to be connected to the specificity of the protein; it might instead be related to the protein’s stability in the cell, as the termini of transporters are known to be important for post-translational regulation, especially vacuolar degradation.

      (10) What do we learn about the APC fold that informs our understanding of where substrate specificity arises in this fold? Do the authors think all SLC folds are equally capable of adaption, or are some more evolutionary-ready than others? An evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions would further strengthen the manuscript. Could the authors provide a sense of how similar the 18 yeast amino acid transporters are, such as sequence alignments or a matrix of pairwise sequence identity/similarity? Are they very diverged, or is the complement of amino acid substrates covered by a rather conserved suite of transporters?

      We do not want to make bold statements about adaptive evolution in other SLC folds, but we consider it plausible that a similar approach would lead to similar conclusions in other transporters. As advised, a pairwise identity matrix was added to the manuscript (Figure 1–figure supplement 2).
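      For readers unfamiliar with how such a matrix is built, the sketch below computes pairwise percent identity from sequences that have already been aligned. It is not the pipeline used for Figure 1–figure supplement 2: the alignment step, the gap character, and the identity definition (identical residues divided by columns where neither sequence has a gap) are all assumptions, and the transporter names in the example are hypothetical placeholders.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Assumed definition: identical residues divided by the number of
    alignment columns in which neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Keep only columns where both sequences have a residue.
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x.upper() == y.upper())
    return 100.0 * same / len(compared)


def identity_matrix(aligned):
    """Symmetric matrix (nested dict) of pairwise percent identities.

    `aligned` maps a sequence name to its aligned sequence string.
    """
    names = list(aligned)
    matrix = {name: {} for name in names}
    for i, p in enumerate(names):
        for q in names[i:]:
            value = percent_identity(aligned[p], aligned[q])
            matrix[p][q] = matrix[q][p] = value
    return matrix


# Hypothetical toy alignment, not real transporter sequences.
toy = {"transporter_1": "ACDE", "transporter_2": "ACDF", "transporter_3": "AC-E"}
for name, row in identity_matrix(toy).items():
    print(name, {k: round(v, 1) for k, v in row.items()})
```

      Dividing by the full alignment length or by the shorter sequence instead gives values that are equal or lower for gapped pairs, so the chosen denominator is worth stating alongside such a matrix.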

      As to the proposed analysis focusing on natural occurrence of the mutations we found: we have indeed looked into this, but have not found evidence of such mutations. This is actually expected, as our selection regime puts “unnatural” selective pressures on a single transporter in isolation, which in reality co-evolved with a whole suite of other transporters that already have the capacity to transport all amino acids. Therefore, it is unlikely that the same mutations would happen in a natural setting. Our study is designed to capture evolution where a completely novel substrate is encountered, for which no transport mechanism has evolved yet.

      (11) Throughout: some of the bar graphs show individual data points, but others do not (Figure 3, Figure 5). These should be shown for all experiments.

      We thank the reviewer for the comment. In the revised version of the manuscript, we included individual data points in all bar graphs.

      (12) For bar graphs in which no indication of significance is shown, does this mean that p>0.05? Comparisons that are not significant (p>0.05) should be indicated as such.

      We thank the reviewer for the comment. In the revised version of the manuscript, we indicated in the legends that in cases of no significant difference (p > 0.05) between the wild-type and the evolved variants, no asterisks are shown.

      (13) Figure 5, Figure 6: Are the three confocal images just three different fields of view? It might be useful to include a zoom-in on a single representative cell, as it is hard for the reader to see to evaluate the membrane localization.

      In the revised version of the manuscript, we clarified that the three confocal images represent three different cultures, as each variant was tested in triplicates. We also included a zoom-in of a representative cell, as suggested.

      (14) In the main text, page 9, the conditions used for each experimental evolution are not clear ("nitrogen limiting mixture of amino acids (1 mM final concentration)". I think this is an important detail, since the mixtures are quite different for the more promiscuous vs. the more selective transporter, and it would be helpful if this was described more clearly in the main text.

      We thank the reviewer for the comment. We have included further clarification in the revised manuscript.

      (15) Figure 1-Supplement 1 and Figure 4 Supplement 4 - can't read the figure labels. Try labeling columns and rows rather than individual plots.

      We have taken the proposal into account and revised the figures accordingly.

      (16) Page 9: "The transporter gene was sequenced and re-introduced into Delta-10AA cells." Was the plasmid isolated, sequenced, and re-introduced, or was the gene cut-and-pasted into a new vector backbone?

      In the revised manuscript we have clarified that the gene was sequenced and then cloned into the expression vector and re-introduced into naïve Δ10AA cells.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors report a high-quality genome assembly for a member of Xenacoelomorpha, a taxon that is at the center of the last remaining great controversies in animal evolution. The taxon and the species in question have "jumped around" the animal tree of life over the past 25 years, and seemed to have found their place as a sister-group to all remaining bilaterians. This hypothesis posits that the earliest split within Bilateria includes Xenacoelomorpha on the one hand and a clade known as Nephrozoa (Protostomia + Deuterostomia) on the other, and is thus referred to as the Nephrozoa hypothesis. Nephrozoa is supported by phylogenomic evidence, by a number of synapomorphic morphological characters in the Nephrozoa (namely, the presence of nephridia) and lack of some key bilaterian characters in Xenacoelomorpha, and by the presence of unique miRNAs in Nephrozoa.

      The Nephrozoa hypothesis has been challenged several times by the authors' groups who alternatively suggest placing Xenacoelomorpha within Deuterostomia as a sister group to a clade known as Ambulacraria. This hypothesis (the Xenambulacraria hypothesis) is supported by alternative phylogenomic datasets and by the shared presence of a number of unique molecular signatures. In this contribution, the authors aim to strengthen their case by providing full genome data for Xenoturbella bocki.

      The actual sequencing and analysis are technically and methodologically excellent. Some of the analyses were done several years ago using approaches that may now seem obsolete, but there is no reason not to include them. As a detailed report of a newly sequenced genome, the manuscript meets the highest standards.

      The authors emphasize a number of key findings. One is the fact that the genome is not as simple as one might expect from a "basal" taxon, and is on par with other bilaterian genomes and even more complex than the genome of secondarily simplified bilaterians. There is an implicit expectation here that the sister group to all Bilateria would represent the primitive state. This is of course not true, and the authors are aware of this, but it sometimes feels as though they are using this implicit assumption as a straw dog argument to say that since the genome is not as simple as expected, X. bocki must be nested within Bilateria. The authors get around this by acknowledging that their finding is consistent with a "weak version of the Nephrozoa hypothesis", which is essentially the Nephrozoa phylogenetic hypothesis without implicit assumptions of simplicity.

      We were NOT suggesting that Xenacoels are ‘basal’, though others have certainly done so. We were instead testing whether their supposed simplicity is reflected in the composition of the genome.

      Another finding is a refutation of the miRNA data supporting Nephrozoa. This is an important finding although it is somewhat flogging a dead horse, since there is already a fair amount of skepticism about the validity of the miRNA data (now over 20 years old) for higher-level phylogenetics.

      The missing bilaterian microRNAs were one of the early pieces of evidence excluding the Xenacoelomorpha from Nephrozoa. Our new data are an important refutation of this line of evidence and add to the picture that this phylum is not lacking characters of Bilateria, as had been suggested (the missing microRNAs and Hox genes were explicitly interpreted in this way).

      The finding that the authors feel is most important is gene presence-absence data that recovers a topology in which X. bocki is sister to Ambulacraria. The problem is that the same tree does not support the monophyly of Xenacoelomorpha. This may be an artifact of fast evolving acoel genomes, as the authors suggest, but it still raises questions about the robustness of the data.

      In sum, the authors' results and analyses leave an open window for the Xenambulacraria hypothesis, but do not refute the Nephrozoa hypothesis. The manuscript is a valuable contribution to the debate but does not go a significant way towards its resolution.

      The manuscript has gone through several rounds of review and revision on a preprint server and is thus fairly clear of typos, inconsistencies and lack of clarity. The authors are honest and open in their interpretation of the results and their strengths.

      We thank the reviewer for their assessment of our manuscript. We have responded to some of the points they make above. As reviewer 1 raised no specific points to edit or change, we reply in detail only to reviewer 2. We would like to note that we have modified the text, and thus the focus, of our manuscript in accordance with what we think reviewer 1 is suggesting in the last two paragraphs of their review.

      Reviewer #2 (Public Review):

      The manuscript describes the genome assembly and analysis of Xenoturbella bocki, a worm that bears many morphological features ascribed to basal bilaterians. The authors aim to analyse this genome in an attempt to determine the phylogenetic position of X. bocki as a representative of Xenacoelomorpha and its associated acoelomorphs. In doing so, they want to inform the debate as to whether Xenacoelomorpha belong among, or are in fact paraphyletic to, all bilaterians.

      This paper presents a high-quality assembly of the X. bocki genome. By virtue of the phylogenetic position of this species, this genome has considerable scientific interest. This assembly appears to be highly complete and is a strength of the paper. The further characterisation of the genome is well executed and presented. Solid results from this paper include a comprehensive description of the Hox genes, miRNA and neuropeptide repertoire, as well as a description of the linkage groups and how they relate to the ancestral linkage groups.

      Where this paper is weaker is in its central claims and questions: the phylogenetic position of Xenacoelomorpha, and whether X. bocki is a slowly evolving but otherwise representative member of this clade, remain insufficiently resolved.

      The authors have achieved the goal of describing the X. bocki genome very well. By contrast, based on the presented evidence, it is unclear whether Xenacoelomorpha is truly a monophyletic group. The balance of the evidence seems to suggest that the X. bocki genome belongs within the Bilateria. However, it is unclear what is driving the position of the other acoels. Assuming that X. bocki and the other two species in that group are monophyletic, the evidence will favour the authors' conclusion (but without clearly rejecting the alternatives).

      This paper will likely further animate the debate regarding this basal species, as well as questions related to the ancestral characters of Bilateria as a whole. In particular, the results from the Hox and ParaHox clusters may provide an interesting counterpoint to the previous results based on the acoels.

      We thank the Reviewer for their extended comments on our manuscript. We would first like to point out that our work was not aiming to resolve the phylogenetic position of X. bocki. We discussed this question at length, as it was and is a major and important question in evolutionary biology; however, we believe we phrased any conclusions in this regard very cautiously, as we are well aware of the limitations of our data in resolving the conundrum.

      In this revision we have further modified our text, specifically in the Introduction and Abstract, to make it clear that we are contributing to the understanding of the evolution and biology of a fascinating organism that cannot easily be cultured in the laboratory.

      In addition, we have supplied more explanation of why Xenacoelomorpha are generally seen as a monophyletic group and which lines of evidence point to this. Again, it should be noted that colleagues who regard the Nephrozoa hypothesis as true do not doubt the monophyly of Xenacoelomorpha.

    1. Scene One: A Typical Day in English Class, Tuesday, 12:20 p.m. When I walk into English class, there are only two students in the classroom; the tables are set up in a U-shape. The room is not organized, your desk is messy, and the room has trash everywhere. There is one TV in the back of the room. The room smells like scented board markers. I walk to my seat and wait for you to get everyone settled in the classroom. After more students arrive, you ask us to read our independent reading book for about 25 minutes. Some of us do what you ask while you work on your computer. Then three students get kicked out because they didn’t do what you wanted them to do, they were talking back, or maybe you were just having a bad day. We don’t have a journal to write about our books and you do not ask us what we are reading during this time. When independent reading time is over, you tell us to take out our Hamlet books. We read Hamlet as a class for the rest of the period. While we are reading, we have to take notes about what is happening or write summaries in our Hamlet notebook. You tell us what you think about the text and what is happening in the play. Most often, we simply write what you tell us to write. This happens every single day. Class is over and you didn’t assign any homework — you rarely do.

      The disorderly environment suggests a lack of organization and may impact the learning atmosphere negatively. While reading "Hamlet" provides valuable literary exposure, the lack of student input or discussion beyond teacher-directed notes may limit critical thinking and analysis.

    2. The Letter-Writing Process with Students: I wanted to do this project not only for the experience of improving my writing but also I think that the students’ voice is not always heard entirely, even through dialogue. I feel that by doing this journal we can make a difference with our personal experience and touch the heart of someone who is willing to stand by us. I also wanted to get the attention of other students who may be feeling the same frustration I have felt.

      In the letter-writing process with students, Rashida Registe expresses her motivation for the project. She sees it as an opportunity not only to enhance her writing skills but also to amplify the voices of students, which she feels may not always be fully heard even through dialogue. Rashida believes that by sharing their personal experiences through the journal, they can make a difference and touch the hearts of those willing to support them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study. 

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and now submit a revised version that addresses the reviewers’ comments as detailed below in response to each concern. We include a more open discussion of alternative possibilities and have added experiments demonstrating that AMPAergic scaling in our mouse cortical cultures is triggered differently than GABAergic scaling. We treated the cultured neurons exactly as described for triggering GABAergic scaling (20 µM CNQX for 24 hours); however, this did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking/bursting activity. Below we explain the result further, but ultimately this demonstrates independent regulation of mEPSCs and mIPSCs, as requested by the editor/reviewer (spike reductions induced by CNQX reduced mIPSC amplitude but had no effect on mEPSC amplitude).

      Reviewer #1 (Public Review):

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).

      We certainly understand the criticism here (similar to reviewer 2’s third point). We now describe these complications in more detail in the manuscript (CTZ section of the results and end of the discussion). First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own), which apply drugs to alter activity and assess mPSC amplitude at the final time point, here we actually show CTZ’s effect on spiking activity within the culture over time. This is critical because it has informed us of the drug’s true effect on spiking, the variability associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover initial levels. It was important because it revealed that the drugs do not always influence activity in the way we assume, which provides greater context for our results. Second, we are showing all of our data and presenting it using estimation statistics, which go beyond the dichotomy of a simple yes-or-no p value (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics have become an increasingly standard approach over the last 15 years and are the preferred method for the Society for Neuroscience’s journal eNeuro. This method reports the effect size together with its confidence interval. For the 3 hr time point in Fig. 5B, the CTZ/ethanol and ethanol data points exhibit very little overlap, the effect size indicates a near doubling of spike frequency, and the confidence interval is clearly separated from 0. This was a pairwise comparison, as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but there is also clear variability.
It is important to note that these are multiunit recordings, which are not restricted to the excitatory principal neurons we target for mPSC recordings. This complication, along with the variability inherent in these cultures, could make simple comparisons difficult to interpret, and we now discuss this (end of discussion). Regardless, we do see some increase in spiking with CTZ and we clearly see increases in mIPSC amplitude, providing some support for the idea that spiking could be a critical player in GABAergic scaling, particularly in the context of all of our findings. Future work will be necessary to determine how alterations in spiking lead to changes in mIPSC amplitude, and we now discuss this (second-to-last paragraph of the discussion).
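      The bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the code behind the estimation statistics website the authors used: it resamples each group with replacement, takes the difference in means as the effect size, and reports a percentile confidence interval. The `n_comparisons` parameter is a hypothetical addition; it shows how a Bonferroni-style correction across several time points (the reviewer's concern) would widen the interval.

```python
import numpy as np


def bootstrap_diff_ci(control, treated, n_boot=5000, ci=0.95, n_comparisons=1, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(control).

    n_comparisons > 1 divides alpha by the number of comparisons
    (Bonferroni-style), e.g. four post-drug time points.
    Returns (point_estimate, (ci_low, ci_high)).
    """
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    # Resample each group independently, with replacement, n_boot times.
    boot_c = rng.choice(control, size=(n_boot, control.size)).mean(axis=1)
    boot_t = rng.choice(treated, size=(n_boot, treated.size)).mean(axis=1)
    diffs = boot_t - boot_c
    alpha = (1.0 - ci) / n_comparisons
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return treated.mean() - control.mean(), (lo, hi)


# Hypothetical spike-frequency values, not data from the study.
control = np.array([0.82, 1.13, 0.95, 1.21, 0.88, 1.04, 0.97, 1.10])
treated = control + 1.0
print(bootstrap_diff_ci(control, treated))
print(bootstrap_diff_ci(control, treated, n_comparisons=4))
```

      With four time points, `n_comparisons=4` computes a 98.75% interval instead of a 95% one; an effect whose interval still excludes 0 under that setting would survive the correction.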

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation; a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We believe that the most parsimonious explanation for our results is that spiking activity, not AMPAR activation, triggers GABAergic downscaling. GABAergic scaling is no different when comparing 24hr TTX treatment vs TTX+CTZ, and optogenetic restoration of spiking activity while continuing to block AMPAR activation was able to restore GABAergic mPSC amplitudes to control levels. It is important to emphasize that our results with TTX vs. TTX+CTZ are different for GABAergic scaling (no difference in this study) and AMPAergic scaling (CTZ diminished upward scaling in previous study – Fong et al., 2015 - PMID: 25751516) suggesting different triggers for the two forms of scaling. While we strongly believe we have demonstrated that GABAergic downscaling is dependent on spiking (not AMPAergic transmission), we now acknowledge that we cannot rule out the possibility that upward GABAergic scaling may be influenced by AMPAR activation (2nd paragraph discussion), although we have no evidence in support of this.

      Specific points:

      - The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate). 

      We largely agree with the reviewer and should not have implied that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been applied to AMPAergic scaling, which is thought to be a spike rate homeostat. We have now corrected this (introduction and discussion).

      - Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition. 

      Agreed, and we have now done this.

      - The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this previous study scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling in that cell. The more recent study mentioned by the reviewer achieved scaling by hyperpolarizing a majority of cells in the network. Importantly, this approach alters neurotransmission throughout the network, making it challenging to isolate the specific contributions of spiking vs. receptor activation. Unlike the previous study, which focused on the impact within individual cells, this newer study involves global alterations in network activity, complicating the interpretation of the role of spiking versus receptor activation in triggering scaling.

      - Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was not our point here; rather, it was that there was a positive slope, meaning that the ratios (CNQX mEPSC amplitudes/control mEPSC amplitudes) got bigger for the larger CNQX-treated mEPSCs. By contrast, a multiplicative relationship in which all mEPSCs are increased by a single factor (e.g., 2X) would produce a flat line with a slope of 0 at the multiplicative value (e.g., 2). On the left side of the plot, we do see values that rise abruptly from 1; this was partially obscured by the y-axis in this figure, and we have adjusted this. This left part of the plot is likely due to CNQX-induced increases in the amplitudes of minis that were below our 5 pA detection threshold, as suggested by the reviewer. Minis that were 4 pA could thus be 5 pA after CNQX treatment, and these are then divided by the smallest control mEPSCs, which are 5 pA (a ratio of 1). We have tried to describe this better in the resubmission (first paragraph of the results).
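      The threshold argument above can be checked with a small simulation. The sketch assumes that the ratio plot is built by sorting control and treated amplitudes and dividing them rank by rank, and it models purely multiplicative scaling of a hypothetical lognormal amplitude population (median 10 pA), keeping only events at or above the 5 pA detection threshold. Using one population for both conditions is an idealization; real control and treated minis come from different cells.

```python
import numpy as np


def rank_ordered_ratios(control, treated):
    """Sort both amplitude samples and divide treated by control, rank by rank."""
    c = np.sort(np.asarray(control, dtype=float))
    t = np.sort(np.asarray(treated, dtype=float))
    if c.size != t.size:
        raise ValueError("samples must have equal size; subsample the larger one first")
    return t / c


def simulate_thresholded_ratios(factor=2.0, threshold=5.0, n=5000, seed=0):
    """Hypothetical simulation: purely multiplicative scaling plus a detection cutoff.

    Amplitudes come from an assumed lognormal population (median 10 pA).
    Treated events are the same population multiplied by `factor`; only
    events >= `threshold` (pA) are "detected" in either condition.
    """
    rng = np.random.default_rng(seed)
    population = rng.lognormal(mean=np.log(10.0), sigma=0.5, size=n)
    control = population[population >= threshold]
    scaled = factor * population
    treated = scaled[scaled >= threshold]
    # Rank matching needs equal sample sizes: randomly subsample the larger set.
    k = min(control.size, treated.size)
    control = rng.choice(control, size=k, replace=False)
    treated = rng.choice(treated, size=k, replace=False)
    return rank_ordered_ratios(control, treated)


up = simulate_thresholded_ratios(factor=2.0)
down = simulate_thresholded_ratios(factor=0.5)
print(up[:3], np.median(up))
print(down[:3], np.median(down))
```

      With `factor=2.0` the ratios start near 1 and rise quickly toward 2, even though every underlying event was scaled by exactly 2; with `factor=0.5` they start near 1 and fall toward 0.5, mirroring the mIPSC downscaling case. Under these assumptions, a flat line at the scaling factor is only expected for events well above threshold.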

      - Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted? 

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs drop into the noise after CNQX treatment (the smallest GABA mIPSCs that do not fall into the noise are 5 pA, and these are divided by the smallest control GABA mPSCs of 5 pA, giving a ratio of 1). The rest of the distribution then approaches the scaling factor (50% in this case). On the right side of the ratio plot, the values appear to increase slightly. We are not sure why this happens, but it may be that a small percentage of mIPSCs are not purely multiplicative at 0.5; however, the biggest mPSCs can vary to a great degree from one cell to the next, and in other cases we do not see this (Figure 4B, Figure 5E). We have tried to describe this better in the resubmission (results describing Figure 2).

      - The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit. 

      We have adjusted the figures to be more consistent throughout the manuscript.

      - I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network level; given that a network-level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with network-level resolution.

      As described above, we do not have the capacity to know what the actual firing rate of a particular neuron was before and after perturbing the system, and certainly not for the specific cells we recorded from to obtain mPSC amplitudes, and so we cannot say that we have perfectly restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation is only 50-100 ms long activating a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full blown network burst where the majority of spikes occur, but this is after the light is off. In other words, the optogenetic light pulse only initiates what becomes a relatively normal network burst that fortunately allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study using optogenetic activity restoration (Fong et al., 2015) we were able to show that this was the case for individual units - the spiking of an individual unit during a burst is similar before and after CNQX/optogenetic stimulation (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015). We are not claiming that we have restored spiking to exactly the pre-drug state, but bring it back toward those levels and we see this is associated with a return of the mIPSC amplitude to near control levels. We now include a brief description of this in the manuscript (results describing Figure 3).

      - Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      - Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      As mentioned above, the multiplicative nature of scaling has been a historical proposal for AMPAergic scaling and we have now found such a relationship for GABAergic scaling. This is important for understanding how this plasticity works, but we agree that it is not necessary for a homeostat and we have adjusted the manuscript accordingly.

      - 277: do you mean AMPAR? 

      We were not clear enough here. We actually do mean GABAR. The idea was that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We have rewritten this part of the discussion to avoid any confusion (2nd paragraph discussion).

      - Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of bursts is shown in B and C.

      - Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes. 

      These figures were generated by the estimation statistics website, and text may have been resized inappropriately. We have adjusted this and standardized the axis text to the best of our ability.

      - The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture. 

      We have added this consideration at the end of the discussion. Further, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      - The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development. 

      We now discuss caveats of cortical cultures at the end of the discussion.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play. 

      We have broadened our discussion of alternative interpretations throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While I am hesitant to judge a paper based on its tone, I would personally recommend revision of some of the subjective words and statements, as the manuscript undermines its own effectiveness by making unnecessarily strong statements. The text repeatedly paints an "either A or B" picture, and if there's any general lesson in biology, it's that it's always A and B. Global, multiplicative glutamatergic scaling could quite conceivably occur alongside GABAergic scaling, as well as synapse-specific homeostatic modifications. It seems that it would be wise to acknowledge that, while the data presented here point in one direction, in vivo results in an adult brain (for example) might present an entirely different set of patterns. This will not only enhance the readability of the paper but also ensure that the scientific community can engage with the work in a constructive and collaborative manner. Again, I present this as only a constructive and supportive suggestion. I am a big fan of work from this laboratory, and I would love to see this paper in an improved form - it's an important set of ideas and I do believe that these data are rigorously collected. 

      We have attempted to provide a more comprehensive interpretation of our results. We agree that a homeostat can come in many flavors, but do believe that GABAergic scaling is a strong candidate, whereas AMPAergic scaling does not currently fit such a role. We now discuss caveats of our work and are open to other interpretations that need to be fleshed out in future work.

      Reviewer #2 (Public Review):

      Major points:

      (1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission and what the mechanism by which spiking frequency grows back after 24h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting in AMPAR blockade is due to the remaining NMDA receptor-mediated transmission. We showed this in our previous study in Suppl. Figures 2 and 6 of Fong et al., 2015 (PMID: 25751516). Our ability to optically induce normal-looking bursts of spikes was also dependent on NMDAR activation (Fong et al., 2015 and Figure 6 of Newman et al., 2015 - PMID: 26140329). Further, Dr Fong’s PhD dissertation showed that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. There are likely many factors that contribute to the recovery of activity, and one of them is likely to be the weakening of inhibitory GABAergic currents, as we had mentioned. We have now added the point about NMDARs mediating the remaining bursts to the manuscript (results associated with Figure 1). We are not sure what the reviewer has in mind in terms of “NMDA-mediated transmission takes over and triggers a different kind of network plasticity”, but we do discuss the possibility that spiking triggers GABAergic scaling through its effect on NMDAergic transmission, which we cannot rule out but for which we also have no evidence (3rd and 5th paragraph of discussion). We plan to address this in future work.

      (2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why this was not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration and form of the burst in rat cortical cultures (see Suppl. Figure 6 Fong et al., 2015 Nat Comm and Figure 6 Newman et al., 2015). Similarly, a high concentration of CNQX (40µM) prevented us from restoring spiking in mouse cultures in the current study, which is why we moved to 20µM CNQX for this study. The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. It is likely that NMDAR signaling is reduced in the presence of CNQX, since burst frequency was dramatically reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling contributes to the alterations in GABAergic mIPSCs and discuss this in the resubmission (3rd and 5th paragraph of the discussion). We had not considered this previously because prior work suggested that 24-48 hour blockade of NMDARs (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some extent, had no influence on AMPAergic scaling (Fong et al., 2015).

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was of short duration (50-100 ms) and was turned off before the vast majority of spiking that occurred in the bursts. So the light flash was a trigger that allowed a relatively normal-looking burst to occur after the light was off (see lower panel of Figure 3B optogenetic stimulation – short duration only at onset of burst – we now make this clearer in the resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      (3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns were very similar to reviewer 1’s first comments (see above). To be clear, we are going a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of the firing of the particular cells that we target for patch recordings before and after a drug. Further, we see considerable variability in the effect on firing rate from culture to culture, which we now discuss in the resubmission (final paragraph of the discussion). The fact that mIPSCs are no different after TTX treatment vs CTZ+TTX treatment suggests that AMPAergic transmission is not very influential on GABAergic downscaling. While the CTZ results are not conclusive by themselves, taken together with the optogenetic results, where restoration of spiking during AMPAR blockade reverses scaling, they are most consistent with the idea that GABAergic scaling is triggered by spiking rather than AMPAR activation, and they place GABAergic scaling as a strong candidate spike rate homeostat. Although we do feel that we have demonstrated that downward GABAergic scaling is dependent on spiking, we cannot rule out the possibility that upward GABAergic scaling could be influenced by AMPAR activation to some extent. We now acknowledge this possibility (2nd paragraph of the discussion).

      (4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures and cells are now stated in the figure legends.

      (5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We now include this discussion in the resubmission (final paragraph discussion). Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we have expanded our description of the variability in our results, including that based on cellular heterogeneity. 

      Reviewer #2 (Recommendations For The Authors):

      Minor comments –

      Fig S3: Please quantify changes in frequency

      We have done this (Supplemental Figure 5).

      Fig 2: please choose colors with higher contrast for CNQX/TTX

      We have done this.

      Fig. 3C: Why doesn't CNQX+PhotoStim reach control levels of bursting at 2h?

      The program was designed to follow and maintain total spike frequency, and so it does a better job at this than at maintaining burst frequency.
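      A hypothetical sketch of why a controller that tracks total spike count need not reproduce burst frequency. This is not the authors' program: the integrating rule, the window structure and all numbers are assumptions. The controller sees only a running spike deficit, so if an evoked burst carries more spikes than a spontaneous burst, the spike target is met with fewer bursts than in the pre-drug network.

```python
def simulate_closed_loop(n_windows, target_spikes, spontaneous_spikes,
                         spikes_per_evoked_burst):
    """Hypothetical integrating controller: keep a running spike deficit
    relative to the baseline target, and deliver one brief light pulse
    (assumed to evoke one burst) in any window that starts with a
    positive deficit.

    Returns (total_spikes, n_pulses)."""
    deficit = 0.0
    total_spikes = 0
    n_pulses = 0
    for _ in range(n_windows):
        stimulate = deficit > 0
        observed = spontaneous_spikes + (spikes_per_evoked_burst if stimulate else 0)
        n_pulses += stimulate
        total_spikes += observed
        deficit += target_spikes - observed
    return total_spikes, n_pulses


# Made-up numbers: baseline 100 spikes/window, 40 remain in CNQX,
# and each evoked burst adds 120 spikes.
print(simulate_closed_loop(4, 100, 40, 120))  # (400, 2)
```

      With these made-up numbers the controller restores the total (400 spikes over four windows, equal to the target) using only two evoked bursts. Burst count is simply not part of its objective.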

      Fig. 5A: please include a comparison between control and Ethanol

      We now do this in Figure 5C. Both are around 26 pA.

      Fig. 5C: where is the Etoh condition?

      We have made this figure more clear in terms of controls (Figure 5C & D).

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the concern of the reviewer, as we do see significant variability within our cultures and they were plated in different places, by different people, in different species (rat vs mouse). Therefore, we have attempted to redo the study of AMPAergic scaling in these mouse cortical neurons. Surprisingly, we found that 20µM CNQX did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking activity and was able to produce GABAergic downscaling. We did not carry out the optogenetic restoration of activity, because we did not trigger upscaling. The result does, however, show that the reductions in spiking/bursting that trigger GABAergic downscaling did not trigger AMPAergic upscaling, and therefore dissociates the two forms of scaling in these mouse cultures. We do not know why 20 µM CNQX did not trigger scaling in these cultures, since it does reduce spiking and AMPAR activation. In the Fong study we used 40µM CNQX because intracellular recordings from rat cortical neurons suggested this was required to completely block AMPAergic currents. Our initial studies in the current manuscript examining GABAergic scaling in mouse cortical cultures used 40µM CNQX; however, this concentration of CNQX prevented us from restoring spiking through optogenetic activation, so we reduced our concentration to 20µM CNQX, which did trigger GABAergic downscaling and allowed the restoration of spiking. We now show and discuss this result (Figure 7 and 3rd paragraph of the discussion).

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX causes identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together, our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling, and show that this differs from our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling, they always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking in ways that translate into altered calcium transients, which may be important for triggering the plasticity. Further, we acknowledge that we cannot rule out a role for NMDARs in GABAergic scaling (3rd and 5th paragraph of discussion). Based on the variability that we observe and the nature of our MEA recordings, we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole-cell recordings, and this is now discussed (final paragraph of discussion). Again, it is important to note that we are going a step beyond most homeostatic plasticity studies, which add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). However, we believe that the most parsimonious explanation of our results supports our proposal that GABAergic scaling is a strong candidate spike rate homeostat. Regardless, in the resubmission we have included a broader discussion of these possibilities, and recognize that we cannot rule out the possibility that AMPAergic transmission could contribute to upward GABAergic scaling (2nd paragraph discussion).

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered in some but not all cultures and bursts have not recovered, that GABAergic scaling has already occurred. We now state this more clearly in the resubmission (near the end of the 2nd paragraph of the discussion).

      Reviewer #3 (Recommendations For The Authors):

      The statistics are inadequately described. The full information including actual p values should be given, particularly for the non-significant trends reported.

      We have done this in Figure legends.

      The abstract and introduction give the impression that GABA and Glu HSP are independent, though most work links them as occurring simultaneously and in a coordinated fashion to achieve homeostasis.

      While it is true that many studies have triggered both forms of scaling with activity or transmission blockade, these studies have not addressed whether these forms of scaling are actually triggered in the same way mechanistically, except potentially for the one study that we mentioned (Joseph et al.). Our results suggest they are independent. We now mention the idea that these two forms of scaling have been assumed to be commonly triggered (3rd paragraph introduction).

      The data in Fig 6 is presented as if BIC treatment is a novel result, although BIC/Gabazine/PTX have been used to induce down-scaling in many previous papers. While it's good to have the results, they should be put in proper context. As suggested in the paper, testing if decreased GABAR function would lead to upscaling does not make sense given all the previous data. 

      Figure 6 shows GABAergic upscaling in response to GABAR block (bicuculline), but we are aware of only two other studies that looked at GABAergic scaling after treating with a GABAR blocker and they found upscaling but this was in hippocampal cultures, not cortical cultures (Peng et al., 2010 - PMID: 21123568, Pribiag et al., 2014 - PMID: 24753587). We now mention this in the results section describing Figure 6. While many studies have blocked GABARs and find AMPAergic downscaling, we are addressing the triggers for GABAergic scaling in Figure 6.

      Is Fig S4B mislabeled? The title says spike rate but the graph axis says burst frequency.

      The reviewer is correct and we have now adjusted this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Protein conformational changes are often critical to protein function, but obtaining structural information about conformational ensembles is a challenge. Over a number of years, the authors of the current manuscript have developed and improved an algorithm, qFit protein, that models multiple conformations into high resolution electron density maps in an automated way. The current manuscript describes the latest improvements to the program, and analyzes the performance of qFit protein in a number of test cases, including classical statistical metrics of data fit like Rfree and the gap between Rwork and Rfree, model geometry, and global and case-by-case assessment of qFit performance at different data resolution cutoffs. The authors have also updated qFit to handle cryo-EM datasets, although the analysis of its performance is more limited due to a limited number of high-resolution test cases and less standardization of deposited/processed data.

      Strengths:

      The strengths of the manuscript are the careful and extensive analysis of qFit's performance over a variety of metrics and a diversity of test cases, as well as the careful discussion of the limitations of qFit. This manuscript also serves as a very useful guide for users in evaluating if and when qFit should be applied during structural refinement.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Wankowicz et al. describes updates to qFit, an algorithm for the characterization of conformational heterogeneity of protein molecules based on X-ray diffraction or cryo-EM data. The work provides a clear description of the algorithm used by qFit. The authors then proceed to validate the performance of qFit by comparing it to deposited X-ray entries in the PDB in the 1.2-1.5 Å resolution range as quantified by Rfree, Rwork-Rfree, detailed examination of the conformations introduced by qFit, and performance on stereochemical measures (MolProbity scores). To examine the effect of experimental resolution of X-ray diffraction data, they start from an ultra high-resolution structure (SARS-CoV2 Nsp3 macrodomain) to determine how the loss of resolution (introduced artificially) degrades the ability of qFit to correctly infer the nature and presence of alternate conformations. The authors observe a gradual loss of ability to correctly infer alternate conformations as resolution degrades past 2 Å. The authors repeat this analysis for a larger set of entries in a more automated fashion and again observe that qFit works well for structures with resolutions better than 2 Å, with a rapid loss of accuracy at lower resolution. Finally, the authors examine the performance of qFit on cryo-EM data. Despite a few prominent examples, the authors find only a handful (8) of datasets for which they can confirm a resolution better than 2.0 Å. The performance of qFit on these maps is encouraging and will be of much interest because cryo-EM maps will, presumably, continue to improve and because of the rapid increase in the availability of such data for many supramolecular biological assemblies. As the authors note, practices in cryo-EM analysis are far from uniform, hampering the development and assessment of tools like qFit.

      Strengths

      qFit improves the quality of refined structures at resolutions better than 2.0 Å, in terms of reflecting true conformational heterogeneity and geometry. The algorithm is well designed and does not introduce spurious or unnecessary conformational heterogeneity. I was able to install and run the program without a problem within a computing cluster environment. The paper is well written and the validation thorough.

      I found the section on cryo-EM particularly enlightening, both because it demonstrates the potential for discovery of conformational heterogeneity from such data by qFit, and because it clearly explains the hurdles towards this becoming common practice, including lack of uniformity in reporting resolution, and differences in map and solvent treatment.

      Weaknesses

      The authors begin the results section by claiming that they made "substantial improvement" relative to the previous iteration of qFit, "both algorithmically (e.g., scoring is improved by BIC, sampling of B factors is now included) and computationally (improving the efficiency and reliability of the code)" (bottom of page 3). However, the paper does not provide a comparison to previous iterations of the software or quantitation of the effects of these specific improvements, such as whether scoring is improved by the BIC, how the application of BIC has changed since the previous paper, whether sampling of B factors helps, and whether the code is faster. It would help the reader to understand what, if any, the significance of each of these improvements was.

      Indeed, it is (embarrassingly) difficult to benchmark against our past work due to the dependencies on different python packages and the lack of software engineering. With the infrastructure we’ve laid down with this paper, made possible by an EOSS grant from CZI, that will not be a problem going forward. Not only is the code more reliable and standardized, but we have developed several scientific test sets that can be used as a basis for broad comparisons to judge whether improvements are substantial. We’ve also changed “substantial improvement” to “several modifications” to indicate the lack of comparison to past versions.

      The exclusion of structures containing ligands and multichain protein models in the validation of qFit was puzzling since both are very common in the PDB. This may convey the impression that qFit cannot handle such use cases. (Although it seems that qFit has an algorithm dedicated to modeling ligand heterogeneity and seems to be able to handle multiple chains). The paper would be more effective if it explained how a user of the software would handle scenarios with ligands and multiple chains, and why these would be excluded from analysis here.

      qFit can indeed handle both. We left out multiple chains for simplicity in constructing a dataset enriched for small proteins while still covering diversity to speed the ability to rapidly iterate and test our approaches. Improvements to qFit ligand handling will be discussed in a forthcoming work as we face similar technical debt to what we saw in proteins and are undergoing a process of introducing “several modifications” that we hope will lead to “substantial improvement” - but at the very least will accelerate further development.

      It would be helpful to add some guidance on how/whether qFit models can be further refined afterwards in Coot, Phenix, ..., or whether these models are strictly intended as the terminal step in refinement.

      We added to the abstract:

      “Importantly, unlike ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot)  and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster).”

      and introduction:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot12 unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      and results:

      “This model can then be examined and edited in Coot12 or other visualization software, and further refined using software such as phenix.refine, refmac, or buster as the modeler sees fit.”

      and discussion

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”

      Appraisal & Discussion

      Overall, the authors convincingly demonstrate that qFit provides a reliable means to detect and model conformational heterogeneity within high-resolution X-ray diffraction datasets and (based on a smaller sample) in cryo-EM density maps. This represents the state of the art in the field and will be of interest to any structural biologist or biochemist seeking to attain an understanding of the structural basis of the function of their system of interest, including potential allosteric mechanisms, an area where there are still few good solutions. That is, I expect qFit to find widespread use.

      Reviewer #3 (Public Review):

      Summary:

      The authors address a very important issue of going beyond a single-copy model obtained by the two principal experimental methods of structural biology, macromolecular crystallography and cryo electron microscopy (cryo-EM). Such a multiconformer model is based on the fact that experimental data from both these methods represent a space- and time-average of a huge number of the molecules in a sample, or even in several samples, and that the respective distributions can be multimodal. Unlike structure prediction methods, this approach is strongly based on high-resolution experimental information and requires validated single-copy high-quality models as input. Overall, the results support the authors' conclusions.

      In fact, the method addresses two problems which could be considered separately:

      - An automation of construction of multiple conformations when they can be identified visually;

      - A determination of multiple conformations when their visual identification is difficult or impossible.

      We often think about this problem similarly to the reviewer. However, in building qFit, we do not want to separate these problems - but rather use the first category (obvious visual identification) to build an approach that can accomplish part of the second category (difficult to visualize) without building “impossible”/nonexistent conformations - with a consistent approach/bias.

      The first one is a known problem, when missing alternative conformations may cost a few percent in R-factors. While these conformations are relatively easy to detect and build manually, the current procedure may save significant time being quite efficient, as the test results show.

      We agree with the reviewer's assessment here. The “floor” in terms of impact is automating a tedious part of high-resolution model building and improving model quality.

      The second problem is important from the physical point of view and was first addressed by Burling & Brunger (1994; https://doi.org/10.1002/ijch.199400022). The new procedure deals with a second-order variation in the R-factors, of about 1% or less, comparable to placing riding hydrogen atoms, modeling density deformation, or varying the bulk solvent. In such situations, it is hard to justify model improvement. Unchanged or marginally decreased Rfree values can be considered a sign that the model does not overfit the data, but hardly a strong argument in favor of the model.

      We agree with the overall sentiment of this comment. What is a significant variation in R-free is an important question that we have looked at previously (http://dx.doi.org/10.1101/448795) and others have suggested an R-sleep for further cross validation (https://pubmed.ncbi.nlm.nih.gov/17704561/). For these reasons it is important to get at the significance of the changes to model types from large and diverse test sets, as we have here and in other works, and from careful examination of the biological significance of alternative conformations with experiments designed to test their importance in mechanism.
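      To make the scale of the R-factor differences discussed here concrete, the following is a minimal sketch of how Rwork and Rfree are computed from structure-factor amplitudes. It assumes the calculated amplitudes are already on the same scale as the observed ones; refinement programs additionally model overall scaling and bulk solvent, which are omitted here. The reflections and function names are made up for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), assuming Fcalc is
    already scaled to Fobs."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den


def r_work_free(f_obs, f_calc, is_free):
    """Split reflections into work and free (test) sets and return
    (Rwork, Rfree, Rfree - Rwork)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free, r_free - r_work


# Four made-up reflections; the last two are flagged as the free set.
f_obs = [100.0, 200.0, 100.0, 100.0]
f_calc = [90.0, 210.0, 100.0, 80.0]
is_free = [False, False, True, True]
print(r_work_free(f_obs, f_calc, is_free))
```

      In this toy example Rwork is about 0.067 and Rfree is 0.10, so the gap is about 0.033. The changes the reviewer describes are of order 0.01 or less on the same scale.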

      In general, overall targets are less appropriate for this kind of problem, and local characteristics may be better indicators. Improvement of the model geometry is a good choice. Indeed, Cruickshank (1956; https://doi.org/10.1107/S0365110X56002059) showed long ago that averaged density images may lead to a shortening of covalent bonds when such maps are interpreted with a single model. However, a total absence of geometric outliers is not necessarily required for structures solved at high resolution, where the diffraction data should have more freedom to place the atoms where the experiments "see" them.

      Again, we agree: geometric outliers should not be completely absent, but it is comforting when they and model/experiment agreement both improve.
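
      The bond-shortening effect referred to above can be seen with simple arithmetic. Suppose, hypothetically, that a bond of length d is split between two conformations rotated by +θ and -θ about the first atom. The averaged position of the second atom then lies at d·cos θ from the first, so a single-conformer model fitted to the averaged density reports a shortened bond. The bond length (1.53 Å) and angles below are illustrative values only; with them the apparent lengths are about 1.507 Å at 10° and 1.438 Å at 20°.

```python
import math

def apparent_bond_length(d, theta_deg):
    """Distance from a pivot atom to the averaged position of its bonded partner.

    The partner occupies two conformers at angles +theta and -theta about the pivot
    (placed at the origin); averaging their coordinates gives a point at d*cos(theta).
    """
    theta = math.radians(theta_deg)
    conf_a = (d * math.cos(theta), d * math.sin(theta))
    conf_b = (d * math.cos(theta), -d * math.sin(theta))
    mean = ((conf_a[0] + conf_b[0]) / 2, (conf_a[1] + conf_b[1]) / 2)
    return math.hypot(*mean)

for angle in (10, 20):
    print(angle, round(apparent_bond_length(1.53, angle), 3))
```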

      The key local characteristic for multiconformer models is the closeness of the model map to the experimental one. In fact, the procedure uses such a measure, the Bayesian information criterion (BIC). Unfortunately, there is no information about how sharply it identifies the best model or how much it changes between the initial and final models; overall, the reader gets no feeling for its values. The Q-score (page 17) can be a tool for the first problem, where the multiple conformations are clearly separated, but not for the second problem, where the contributions from neighboring conformations are merged. In addition to BIC, or to even more conventional target functions such as LS or local map correlation, the extreme and mean values of the local difference maps may help to validate the models.

      We agree with the reviewer that the problem of “best” model determination is poorly posed here. We have been thinking a lot about this in the context of Bayesian methods (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278553/); however, a major stumbling block is how variable representations of alternative conformations (and compositions) are handled. The answers are more straightforward (though by no means simple) for ensemble representations, where the entire system is always represented, but with multiple copies.
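
      To give a feel for how a BIC-based choice trades fit against model size, the sketch below uses the textbook form for least-squares fits with Gaussian residuals, BIC = n ln(RSS/n) + k ln n, where n is the number of density points, RSS the residual sum of squares, and k the number of free parameters. This is a generic illustration under that assumption; it is not necessarily the exact expression or parameter counting used inside qFit, and the function names and numbers are hypothetical.

```python
import math

def bic(rss, n_points, n_params):
    """Bayesian information criterion for a least-squares fit with Gaussian residuals."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)

def prefer_extra_conformer(rss_one, rss_two, n_points, params_per_conformer):
    """True if a two-conformer model has lower BIC than a one-conformer model.

    rss_one / rss_two: residuals of the one- and two-conformer fits on the same points.
    The second conformer doubles the (hypothetical) parameter count.
    """
    bic_one = bic(rss_one, n_points, params_per_conformer)
    bic_two = bic(rss_two, n_points, 2 * params_per_conformer)
    return bic_two < bic_one
```

      With n = 1000 density points and, say, 4 parameters per conformer, the second conformer is accepted only if it lowers the RSS by roughly 2.7% (1 - exp(-4 ln 1000 / 1000)); a 1% improvement is rejected, while a 10% improvement is accepted. Reporting such margins would be one way to convey how sharply the criterion discriminates between models.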

      This method and its results are a strong argument for the need for experimental data and the information they contain, as opposed to pure structure prediction. At the same time, the absence of strong density-based proofs may limit its impact.

      We agree. Indeed, we think it will be difficult to further improve structure prediction methods without much more interaction with the experimental data.

      Strengths:

      Addressing an important problem and automating model construction for alternative conformations using high-resolution experimental data.

      Weaknesses:

      Insufficient validation of the models when no discrete alternative conformations are visible, and an essential lack of local real-space validation indicators.

      While these are not perfect real-space indicators, local real-space validation is implicit in the mixed-integer quadratic programming (MIQP) selection step and explicit when we employ Q-score metrics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A point of clarification: I don't understand why waters seem to be handled differently for cryo-EM and crystallography datasets. I am interested in the statement on page 19 that the Molprobity Clashscore gets worse for cryo-EM datasets, primarily due to clashes with waters. But the qFit algorithm includes a round of refinement to optimize placement of ordered waters, and the clashscore improves for the qFit refinement in crystallography test cases. Why/how is this different for cryo-EM?

      We agree that this was not an appropriate point. We believe that the high clash score is coming from side chains being incorrectly modeled. We have updated this in the manuscript and it will be a focus of future improvements.

      Reviewer #2 (Recommendations For The Authors):

      - It would be instructive to the reader to explain how qFit handles the chromophore in the PYP (1OTA) example. To this end, it would be helpful to include deposition of the multiconformer model of PYP. This might also be a suitable occasion for discussion of potential hurdles in the deposition of multiconformer models in the PDB (if any!). These may be real concerns causing hesitation among potential users.

      Thank you for this comment. qFit does not alter the position or connectivity of any HETATM records (like the chromophore in this structure). Handling covalent modifications like this is an area of future development.

      Regarding deposition, we have noted above that the discussion now includes:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”

      Finally, we have placed all PDBs in a Zenodo deposition (XXX) and have included that language in the manuscript. It is currently under a separate data availability section (page XXX). We will defer to the editor as to which header it should go under.

      - It may be advisable to take the description of true/false pos/negatives out of the caption of Figure 4, and include it in a box or so, since these terms are important in the main text too, and the caption becomes very cluttered.

      We think adding the description of true/false pos/negatives to the Figure panel would make it very cluttered and wordy. We would like to retain this description within the caption. We have also briefly described each in the main text.

      - page 21, line 4: some issue with citation formatting.

      We have updated these citations.

      - page 25, second paragraph: cardinality is the number of members of a set. Perhaps "minimal occupancy" is more appropriate.

      Thank you for pointing this out. This was a mistake and should have been called the occupancy threshold.

      - page 26: it's - its

      Thank you, we have made this change. 

      - Font sizes in Supplementary Figures 5-7 are too small to be readable.

      We agree and will make this change. 

      Reviewer #3 (Recommendations For The Authors):

      General remarks

      (1) As I understand it, the procedure starts by shifting residues one by one (page 4; A.1). Then, geometry reconstruction (e.g., B1) may be difficult in some cases when joining the shifted residues back together. It seems that such backbone perturbation could be done more efficiently by shifting groups of residues ("potential coupled motions"), as mentioned at the bottom of page 9. Did I miss its description?

      We would describe the algorithm as sampling the backbone residues (including minimal shifts) to ensure that neighboring residues can be linked. We agree that future iterations of qFit should include more effective backbone sampling, by exploring motion along the Cβ-Cα and C-N bond vectors and their cross product (Cβ-Cα × C-N), and by exploring correlated backbone movements.
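
      The three sampling directions named in this response can be built directly from atomic coordinates: unit vectors along Cβ-Cα and C-N, plus their normalised cross product, which is perpendicular to both. The NumPy sketch below is hypothetical and is not code from qFit; the 0.1 Å step and 0.3 Å maximum are borrowed from the Cartesian sampling described in the manuscript purely for illustration.

```python
import numpy as np

def backbone_sampling_directions(ca, cb, c, n):
    """Unit vectors along CB-CA, along C-N, and along their cross product.

    Inputs are 3-element coordinates in Å. The cross-product axis is orthogonal
    to both bond vectors, giving a third, independent sampling direction.
    """
    u1 = np.asarray(cb, dtype=float) - np.asarray(ca, dtype=float)
    u2 = np.asarray(c, dtype=float) - np.asarray(n, dtype=float)
    u3 = np.cross(u1, u2)
    return [v / np.linalg.norm(v) for v in (u1, u2, u3)]

def sample_shifts(directions, step=0.1, max_shift=0.3):
    """Yield displacements of +/- step, 2*step, ... up to max_shift along each direction."""
    n_steps = int(round(max_shift / step))
    for d in directions:
        for i in range(1, n_steps + 1):
            for sign in (1.0, -1.0):
                yield sign * i * step * d
```

      With the default values this yields 3 directions × 3 magnitudes × 2 signs = 18 candidate displacements per residue; correlated motions of neighbouring residues would require sampling such shifts jointly rather than one residue at a time.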

      (2) While the paper is well split into clear parts, some of them seem not to be in their optimal place and could be moved to "Methods" (the detailed "Overview of the qFit protein algorithm" as a whole) or to a "Data" section, which is currently missing (the first two paragraphs of "qFit improves overall fit...", page 8; "Generating the qFit test set", page 22; and "Generating synthetic data ...", page 26; i.e., the description of the test data set). To my personal taste, the description of tests with simulated data (page 15) would be better placed before that of tests with real data.

      Thank you for this comment, but we stand by our original decision to keep the general flow of the paper as it was submitted.

      (3) I wonder if the term "quadratic programming" (e.g., A3, page 5) is appropriate. It implies optimization of a quadratic function of the independent parameters, and not of "some" parameters. The crystallographic LS target, for example, is not a quadratic function of atomic coordinates, and I think this is a similar case. Whatever the answer to this remark, an example of the function and its parameters is certainly missing.

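      We think that the term quadratic programming is appropriate. We fit the coefficients (occupancies) of the candidate conformers by minimizing a quadratic loss on the difference between observed and calculated density, subject to constraints on these independent parameters. Because the conformer coordinates are held fixed during this step, the calculated density is linear in the coefficients, so the loss is quadratic in them. We agree that the quadratic function is missing from the paper, and we have now included it in the Methods section.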
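
      As an illustration of this kind of objective, assume each candidate conformer i has a precomputed calculated density ρ_i on the same grid points as the observed density ρ_obs. With those densities fixed, ||ρ_obs - Σ w_i ρ_i||² is a quadratic function of the occupancies w. The constraints used below (w_i ≥ 0, Σ w_i ≤ 1) are an assumption for illustration, and the problem is solved with plain projected gradient descent rather than the dedicated solver qFit uses; the function names are hypothetical.

```python
import numpy as np

def project_capped_simplex(v):
    """Euclidean projection of v onto the set {w : w >= 0, sum(w) <= 1}."""
    v = np.asarray(v, dtype=float)
    w = np.maximum(v, 0.0)
    if w.sum() <= 1.0:
        # Clipping negatives already satisfies the sum constraint.
        return w
    # Otherwise the sum constraint is active: project onto the probability
    # simplex with the standard sort-and-threshold method.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)

def fit_occupancies(rho_obs, rho_calc, n_iter=2000):
    """Minimise ||rho_obs - sum_i w_i * rho_calc[i]||^2 subject to w >= 0, sum(w) <= 1.

    rho_obs: observed density on grid points, shape (n_points,).
    rho_calc: calculated density of each candidate conformer, shape (n_conformers, n_points).
    """
    A = np.asarray(rho_calc, dtype=float).T        # (n_points, n_conformers)
    b = np.asarray(rho_obs, dtype=float)
    # Step 1/L, where L = 2 * ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    w = np.full(A.shape[1], 1.0 / A.shape[1])      # start from equal occupancies
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ w - b)
        w = project_capped_simplex(w - step * grad)
    return w
```

      For example, if the observed density is an exact 60:40 mixture of two well-separated Gaussian blobs, fit_occupancies recovers occupancies of about 0.6 and 0.4. A threshold on the minimum occupancy, or a limit on the number of conformers, turns this into a mixed-integer problem, which a plain projected gradient step cannot handle.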

      Technical remarks to be answered by the authors :

      (1) Page 1, Abstract, line 3. Ensemble modeling is not the only existing frontier, and saying "one of the frontiers" may be better. Also, this phrase gives the confusing impression that the authors aim to predict ensemble models, while they build them from experimental data.

      We agree with this statement and have re-worded the abstract to reflect this.

      (2) Page 2. Burling & Brunger (1994) should be cited as predecessors. On the contrary, an excellent paper by Pearce & Gros (2021) is not relevant here.

      We agree that we should mention the Burling & Brunger paper, but we think that Pearce & Gros (2021) should not be removed, as it does not discuss the method of ensemble refinement.

      (3) Page 2, bottom. "Further, when compared to ..." The preference for such an approach sounds too affirmative.

      We have amended this sentence to state:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot (Emsley et al. 2010), unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      The point we were trying to make in this sentence was that ensemble-based models are much harder to manually manipulate in Coot or other similar software compared to multiconformer models. We think that the new version of this sentence states this point more clearly.

      (4) Page 2, last paragraph. I do not see an obvious relation between references 15-17 and the phrase they are associated with.

      We disagree with this statement, and think that these references are appropriate.

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot [12], unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      (5) Page 3, paragraph 2. Cryo-EM maps should also be "high-resolution"; the phrase does not read this way.

      We agree that high-resolution should be added, and the sentence now states:

      “However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling in X-ray crystallography; and errors in particle alignment and classification, poor modeling of beam-induced motion, and imperfect detective quantum efficiency (DQE) of detectors in high-resolution cryo-EM.”

      (6) Page 3, last paragraph before "Results". The words "... in both individual cases and large structural bioinformatic projects" do not carry much meaning beyond introducing a self-reference. Also, repeating "better than 2 Å" seems unnecessary.

      We agree that this was unnecessary and have simplified the last sentence to state:

      “With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models to derive ensemble-function insights.”

      (7) Page 3. "Results". Could "experimental" be replaced by a synonym, like "trial", to avoid confusion with the meaning "using experimental data"?

      We have replaced "experimental" with "exploratory" to describe the use of qFit on cryo-EM data. The statement now reads:

      “For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more exploratory.”

      (8) Page 4, A.1. Should it be "steps of ±0.1", and should "coordinate" be "coordinate axis"? One can modify coordinates, not shift them. I do not understand how, with the given steps, the authors calculated the number of combinations ("from 9 to 81"). Could the long phrase "Alternatively, ...absent" be reduced simply to "Otherwise"?

      We have simplified and clarified the sentence on the sampling of backbone coordinates to state:

      “If anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 Å along each coordinate axis, extending to 0.3 Å, resulting in 9 (if isotropic) or 81 (if anisotropic) distinct backbone conformations for further analysis.”

      (9) Page 6, B.1, line 2. Word "linearly" is meaningless here.

      We have modified this to read:

      “Moving from N- to C- terminus along the protein,”

      (10) Page 9, line 2. It should be explained which data set is considered as the test set to calculate Rfree.

      We think this is clear and would be repetitive if we duplicated it.

      (11) Page 9, line 7. It should be "a valuable metric" and not "an"

      We agree and have updated the sentence to read:

      “Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling.”

      (12) Page 10, paragraph 3. "... as a string (Methods)". I did not find any other mention of the term "string", including in "Methods", where it is supposed to be explained. Either this should be explained (with an example?), or it should be avoided.

      We agree that the term "string" (referring to the programming data type) is unnecessary, and we have removed it from the sentence. It now reads:

      “To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer (Methods).”

      (13) Page10, lines 3-4 from bottom. Are these two alternative conformations justified?

      We are unsure what this is referring to.

      (14) Page 12, Fig. 2A. In comparison with Supplement Fig 2C, the direction of axes is changed. Could they be similar in both Figures?

      We have updated Supplementary Figure 2C to have the same direction of axes as Figure 2A.

      (15) Page 15, section's title. Choose a single verb in "demonstrate indicate".

      We have amended the title of this section to be:

      “Simulated data demonstrate qFit is appropriate for high-resolution data.”

      (16) Page 15, paragraph 2. "Structure factors from 0.8 to 3.0 Å resolution" does not convey what the authors apparently meant: "(complete?) data sets with a high-resolution limit that varied from 0.8 to 3.0 Å ...". Also, the phrase "random noise increasing" is not illustrated by Fig. 5, to which it refers.

      We have edited this sentence to now read:

      “To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors with a high-resolution limit ranging from 0.8 to 3.0 Å (in increments of 0.1 Å).”

      (17) Page 15, last paragraph. It is written in a rather formal and confusing way, while a clearer description is given in the figure legend and repeated once more in Methods. I would suggest removing this paragraph.

      We agree that this is confusing. Instead of creating a true positive/false positive/true negative/false negative matrix, we now describe each case directly: multiconformer or single conformer, and match or no match. We have edited the language in the manuscript and figure legends to reflect these changes.

      (18) Page 16. The last two paragraphs begin a new topic, and it would help to separate them from the previous ones (with a subtitle?).

      We agree that this could use a subtitle. We have included the following subtitle above this section:

      “Simulated multiconformer data illustrate the convergence of qFit.”

      (19) Page 20. "or static" and "we determined that" seem unnecessary.

      We have removed "static" and now only use "single-conformer models". However, as one of the main conclusions of this paper is that qFit can pick up on alternative conformers that were modeled manually, we have decided to keep “we determined that”.

      (20) Page 21, first paragraph. "Data" are plural; it should be "show" and "require"

      We have made these edits. The sentence now reads:

      “However, our data here show that not only does qFit need a high-resolution map to be able to distinguish signal from noise, it also requires a very well-modeled structure as input.”

      (21) Page 21, References should be indicated as [41-45], [35,46-48], [55-57]. A similar remark to [58-63] at page 22.

      We have fixed the reference layout to reflect this change.

      (22) Page 21, last paragraph. "Further reduce R-factors" (which, moreover, appears twice) is correct neither with "further", since the reduction here is rather marginal, nor as a goal; the variations of R-factors are not very significant. A more general statement like "improving fit to experimental data" (keeping density maps in mind) may be safer.

      We agree with the duplicative nature of these statements. We have amended the sentence to now read:

      “Automated detection and refinement of partial-occupancy waters should help improve the fit to experimental data, further reduce Rfree [15], and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.”

      (23) Page 22. Sub-sections of "Methods" are given in a little bit random order; "Parallelization of large maps" in the middle of the text is an example. Put them in a better order may help.

      We have moved some sections of the Methods around and made clearer headings, using underlining to highlight the subsections (Generating and running the qFit test set, qFit improved features, Analysis metrics, Generating synthetic data for resolution dependence).

      (24) Page 24. Non-convex solution is a strange term. There exist non-convex problems and functions and not solutions.

      We agree and we have changed the language to reflect that we present the algorithm with non-convex problems which it cannot solve.

      (25) Page 26, "Metrics". It would be worthwhile to describe the metrics explicitly, and not (only) to give references to the scripts.

      For all metrics, we describe a sentence or two on what each metric describes. As these metrics are well known in the structural biology field, we do not feel that we need to elaborate on them more.

      (26) Page 26. Multiplying B by occupancy does not make much sense. A better option would be to refer to the density value at the atomic center, occ·(4π/B)^1.5, which relates these two quantities.

      We agree and have updated the B-factor figures and metrics to reflect this.
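
      For reference, the quantity suggested by the reviewer follows from treating B-factor smearing of a point atom as an isotropic Gaussian, ρ(r) = occ·(4π/B)^1.5·exp(-4π²r²/B), whose value at r = 0 is occ·(4π/B)^1.5. The atom-specific scattering-factor shape is ignored here, so the values are relative. The short sketch below (function name hypothetical) shows why halving occupancy is not equivalent to doubling B.

```python
import math

def peak_density(occupancy, b_factor):
    """Relative density at the centre of a point atom blurred by an isotropic B-factor.

    Uses rho(0) = occ * (4*pi/B)**1.5, i.e. only the Gaussian smearing term.
    """
    return occupancy * (4.0 * math.pi / b_factor) ** 1.5

# Halving occupancy halves the peak; doubling B lowers it by 2**1.5 (about 2.83x).
print(peak_density(1.0, 20.0) / peak_density(0.5, 20.0))   # 2.0
print(peak_density(1.0, 20.0) / peak_density(1.0, 40.0))   # about 2.83
```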

      (27) Page 40, suppl. Fig. 5. Due to the color choice, it is difficult to distinguish the green and blue curves in the diagram.

      We have amended this by switching the colors of the curves.

      (28) Page 42, Suppl. Fig. 7. (A) How is the width of the shaded regions defined? (B) What do the blue regions stand for? The input Rfree range goes up to 0.26 and not to 0.25; there is a point at the right bound. (C) Bounds for the "orange" occupancy are inverted in the legend.

      (A) The width of the shaded region denotes the standard deviation of the values at each resolution. We have made this clearer in the caption.

      (B) The blue region denotes the confidence interval for the regression estimate, with the confidence level set to 95%. We have made this clearer in the caption.

      (C) This has now been fixed.

      The maximum R-free value is 0.2543, which we rounded down to 0.25.

      (29) Page 43. Letters E-H in the legend are erroneously substituted by B-E.

      We apologize for this mistake. It is now corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study makes a valuable empirical contribution to our understanding of visual processing in primates and deep neural networks, with a specific focus on the concept of factorization. The analyses provide solid evidence that high factorization scores are correlated with neural predictivity, yet more evidence would be needed to show that neural responses show factorization. Consequently, while several aspects require further clarification, in its current form this work is interesting to systems neuroscientists studying vision and could inspire further research that ultimately may lead to better models of or a better understanding of the brain.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates visual processing in primates and deep neural networks (DNNs), focusing on factorization in the encoding of scene parameters. It challenges the conventional view that object classification is the primary function of the ventral visual stream, suggesting instead that the visual system employs a nuanced strategy involving both factorization and invariance. The study also presents empirical findings suggesting a correlation between high factorization scores and good neural predictivity.

      Strengths:

      (1) Novel Perspective: The paper introduces a fresh viewpoint on visual processing by emphasizing the factorization of non-class information.

      (2) Methodology: The use of diverse datasets from primates and humans, alongside various computational models, strengthens the validity of the findings.

      (3) Detailed Analysis: The paper suggests metrics for factorization and invariance, contributing to future understanding and measurement of these concepts.

      Weaknesses:

      (1) Vagueness (Perceptual or Neural Invariance?): The paper uses the term 'invariance', which typically refers to perceptual stability despite stimulus variability [1], to mean the complete discarding of nuisance information in neural activity. This oversimplification overlooks the nuanced distinction between perceptual invariance (e.g., invariant object recognition) and neural invariance (e.g., no change in neural activity). It seems that by 'invariance' the authors mean 'neural' (rather than 'perceptual') invariance, which is left vague. The paper could benefit from renaming what it calls 'invariance' to 'neural invariance' and distinguishing it from 'perceptual invariance', to avoid potential confusion for future readers. The assignment of a 'compact' representation to 'invariance' in Figure 1A is misleading (although this can be addressed by clarifying the term invariance). [1] DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends in cognitive sciences. 2007 Aug 1;11(8):333-41.

      Thanks for pointing out this ambiguity. In our Introduction we now explicitly clarify that we use “invariance” to refer to neural, rather than perceptual invariance, and we point out that both factorization and (neural) invariance may be useful for obtaining behavioral/perceptual invariance.

      (2) Details on Metrics: The paper's explanation of factorization as encoding variance independently or in an uncorrelated manner needs more justification and elaboration. The definition of 'factorization' in Figure 1B seems potentially misleading, as the metric for factorization in the paper seems to be defined regardless of class information (it can be defined within a single class). Does the factorization metric as defined in the paper (orthogonality of different sources of variation) guarantee that responses for different object classes are aligned/parallel like in 1B (middle)? More clarification around this point could make the paper much richer and more interesting.

      Our factorization metric measures the degree to which two sets of scene variables are factorized from one another. In the example of Fig. 1B, we apply this definition to the factorization of class vs. non-class information. Elsewhere in the paper, we measure factorization of several other quantities unrelated to class, specifically camera viewpoint, lighting conditions, background content, and object pose. In our revised manuscript, we have clarified the exposition surrounding Fig. 1B to make clear that factorization, as we define it, can be applied to other quantities as well, and that responses need not be aligned or parallel; they simply need to occupy different sets of dimensions, whether linearly or nonlinearly arranged. Thanks for raising the need to clarify this point.
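
      To make the idea concrete, the sketch below computes one plausible version of such a score: one minus the fraction of response variance induced by varying one scene parameter that falls inside the principal subspace of responses induced by the other parameters. The 90% variance threshold for that subspace, the function names, and the toy data are assumptions for illustration; the sketch is not claimed to reproduce the manuscript's exact implementation.

```python
import numpy as np

def principal_subspace(responses, var_fraction=0.9):
    """Orthonormal basis (as columns) capturing var_fraction of the response variance.

    responses: array (n_samples, n_units); centred internally.
    """
    X = responses - responses.mean(axis=0)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    # Smallest number of components whose cumulative variance reaches var_fraction
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_fraction) + 1)
    return vt[:k].T

def factorization_score(param_responses, other_responses, var_fraction=0.9):
    """1 - fraction of param-induced variance lying in the other-parameter subspace."""
    basis = principal_subspace(other_responses, var_fraction)
    X = param_responses - param_responses.mean(axis=0)
    total = np.sum(X ** 2)
    inside = np.sum((X @ basis) ** 2)
    return 1.0 - inside / total
```

      A score near 1 means that varying the parameter moves responses along directions the other parameters barely use; a score near 0 means the two sources of variation share the same dimensions. Nothing in the computation refers to object class, consistent with the point above that the metric applies to any pair of scene variables.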

      (3) Factorization vs. Invariance: Is it fair to present invariance vs. factorization as mutually exclusive options in representational hypothesis space? Perhaps a more fair comparison would be factorization vs. object recognition, as it is possible to have different levels of neural variability (or neural invariance) underlying both factorization and object recognition tasks.

      We do not mean to imply that factorization and invariance are mutually exclusive, or that they fully characterize the space of possible representations. However, they are qualitatively distinct strategies for achieving behavioral capabilities like object recognition. In the revised manuscript we also include a comparison to object classification performance (Figures 5C & S4, black x’s) as a predictor of brain-like representations, alongside the results for factorization and invariance.

      In our revised Introduction and beginning of the Results section, we make it clearer that factorization and invariance are not mutually exclusive; indeed, our results show that both factorization and invariance for some scene variables, like lighting and background identity, are signatures of brain-like representations. Our study focuses on factorization because we believe its importance has not been studied or highlighted to the degree that invariance to “nuisance” parameters has, in concert with selectivity to object identity in individual neuron tuning functions. Moreover, the loss functions used for supervised training of neural networks for image classification would seem to encourage invariance as a representational strategy. Thus, the finding that factorization of scene parameters is an equally good, if not better, predictor of brain-like representations may motivate new objective functions for neural network training.

      (4) Potential Confounding Factors in Empirical Findings: The correlation observed in Figure 3 between factorization and neural predictivity might be influenced by data dimensionality, rather than factorization per se [2]. Incorporating discussions around this recent finding could strengthen the paper.

      [2] Elmoznino E, Bonner MF. High-performing neural network models of the visual cortex benefit from high latent dimensionality. bioRxiv. 2022 Jul 13:2022-07.

      We thank the Reviewer for pointing out this important, potential confound and the need for a direct quantification. We have now included an analysis computing how well dimensionality (measured using the participation ratio metric for natural images, as was done in [2] Elmoznino & Bonner, bioRxiv 2022) can account for model goodness-of-fit (additional pink bars in Figure 6). Factorization of scene parameters appears to add more predictive power than dimensionality on average (Figure 6, light shaded bars), and critically, factorization+classification jointly predict goodness-of-fit significantly better than dimensionality+classification for V4 and IT/HVC brain areas (Figure 6, dark shaded bars). Indeed, dimensionality+classification is only slightly more predictive than classification alone for V4 and IT/HVC, indicating some redundancy in those measures with respect to neural predictivity of models (Figure 6, compare dark shaded pink bar to dashed line).
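
      The logic of comparing joint predictors (factorization + classification versus dimensionality + classification) can be sketched as ordinary least-squares regressions of per-model neural predictivity on model-level metrics, compared by R². The data below are random placeholders for 30 hypothetical models, not values from the manuscript, and the function name is hypothetical.

```python
import numpy as np

def r_squared(y, predictors):
    """In-sample R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Placeholder metrics (random numbers, not real model or neural data).
rng = np.random.default_rng(0)
classification = rng.uniform(0.5, 0.8, 30)
factorization = rng.uniform(0.2, 0.9, 30)
dimensionality = rng.uniform(10, 60, 30)
predictivity = 0.3 * classification + 0.4 * factorization + rng.normal(0, 0.02, 30)

r2_c = r_squared(predictivity, [classification])
r2_cf = r_squared(predictivity, [classification, factorization])
r2_cd = r_squared(predictivity, [classification, dimensionality])
print(r2_c, r2_cf, r2_cd)
```

      Because in-sample R² can never decrease when a predictor is added, a fair comparison between metric pairs needs held-out data or a significance test, as implied by the "significantly better" comparison described above.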

      That said, high-dimensional representations can, in principle, better support factorization, and thus we do not regard these two representational strategies necessarily in competition. Rather, our results suggest (consistent with [2]) that dimensionality is predictive of brain-like representation to some degree, such that some (but not all) of factorization’s predictive power may indeed owe to a partial correlation with dimensionality. We elaborate in the Discussion where this point comes up and now refer to the updated Figure 6 that shows the control for dimensionality.
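
      The participation ratio referred to in this response is a standard effective-dimensionality measure, PR = (Σ λ_i)² / Σ λ_i², where λ_i are the eigenvalues of the covariance of unit responses across stimuli. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np

def participation_ratio(responses):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared eigenvalues.

    responses: array (n_stimuli, n_units).
    """
    cov = np.cov(responses, rowvar=False)
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0.0, None)   # drop tiny negative eigenvalues from round-off
    return eig.sum() ** 2 / np.sum(eig ** 2)
```

      PR ranges from 1 (all variance along one axis) to the number of units (variance spread evenly), so it summarises how many dimensions are used but not which scene variables occupy which dimensions. That is why a representation can be high-dimensional without being factorized, while high dimensionality can still make factorization easier.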

      Conclusion:

      The paper offers insightful empirical research with useful implications for understanding visual processing in primates and DNNs. The paper would benefit from a more nuanced discussion of perceptual and neural invariance, as well as a deeper discussion of the coexistence of factorization, recognition, and invariance in neural representation geometry. Additionally, addressing the potential confounding factors in the empirical findings on the correlation between factorization and neural predictivity would strengthen the paper's conclusions.

      Taken together, we hope that the changes described above address the distinction between neural and perceptual invariance, provide a more balanced understanding of the contributions of factorization, invariance, and local representational geometry, and argue against dimensionality for natural images as the driver of the main finding of the benefits from factorization of scene parameters.

      Reviewer #2 (Public Review):

      Summary:

      The dominant paradigm in the past decade for modeling the ventral visual stream's response to images has been to train deep neural networks on object classification tasks and regress neural responses from units of these networks. While object classification performance is correlated to the variance explained in the neural data, this approach has recently hit a plateau of variance explained, beyond which increases in classification performance do not yield improvements in neural predictivity. This suggests that classification performance may not be a sufficient objective for building better models of the ventral stream. Lindsey & Issa study the role of factorization in predicting neural responses to images, where factorization is the degree to which variables such as object pose and lighting are represented independently in orthogonal subspaces. They propose factorization as a candidate objective for breaking through the plateau suffered by models trained only on object classification.

      They claim that (i) maintaining these non-class variables in a factorized manner yields better neural predictivity than ignoring non-class information entirely, and (ii) factorization may be a representational strategy used by the brain.

      The first of these claims is supported by their data. The second claim does not seem well-supported, and the usefulness of their observations is not entirely clear.

      Strengths:

      This paper challenges the dominant approach to modeling neural responses in the ventral stream, which itself is valuable for diversifying the space of ideas.

      This paper uses a wide variety of datasets, spanning multiple brain areas and species. The results are consistent across the datasets, which is a great sign of robustness.

      The paper uses a large set of models from many prior works. This is impressively thorough and rigorous.

      The authors are very transparent, particularly in the supplementary material, showing results on all datasets. This is excellent practice.

      Weaknesses:

      (1) The primary weakness of this paper is a lack of clarity about what exactly the contribution is. I see two main interpretations: (1-A) as introducing a heuristic for predicting neural responses that improves over classification accuracy, and (1-B) as a model of the brain's representational strategy. These two interpretations are distinct goals, each of which is valuable. However, I don't think the paper in its current form supports either of them very well:

      (1-A) Heuristic for neural predictivity. The claim here is that by optimizing for factorization, we could improve models' neural predictivity to break through the current predictivity plateau. To frame the paper in this way, the key contribution should be a new heuristic that correlates with neural predictivity better than classification accuracy. The paper currently does not do this. The main piece of evidence that factorization may yield a more useful heuristic than classification accuracy alone comes from Figure 5. However, in Figure 5 it seems that factorization along some factors is more useful than others, and different linear combinations of factorization and classification may be best for different data. There is no single heuristic presented and defended. If the authors want to frame this paper as a new heuristic for neural predictivity, I recommend the authors present and defend a specific heuristic that others can use, e.g. [K * factorization_of_pose + classification] for some constant K, and show that (i) this correlates with neural predictivity better than classification alone, and (ii) this can be used to build models with higher neural predictivity. For (ii), they could fine-tune a state-of-the-art model to improve this heuristic and show that doing so achieves a new state-of-the-art neural predictivity. That would be convincing evidence that their contribution is useful.

      Our paper does not make any strong claim regarding the Reviewer’s point 1-A (on heuristics for neural predictivity). In the Discussion, last paragraph, we better specify that our work is merely suggestive of claim 1-A about heuristics for more neurally predictive, more brainlike models. We believe that our paper supports the Reviewer’s point 1-B (on brain representation) as we discuss below.

      We leave it to future work to determine if factorization could help optimize models to be more brainlike. This treatment may require exploration of novel model architectures and loss functions, and potentially also more thorough neural datasets that systematically vary many different forms of visual information for validating any new models.
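
      To make the heuristic proposed by the Reviewer in (1-A) concrete, here is a minimal Python sketch of how [K * factorization + classification] could be scored across a set of models, assuming one factorization score, one classification accuracy, and one neural predictivity value per model. The function names, the candidate values of K, and the toy data are hypothetical; this is not an analysis from the manuscript.

```python
import numpy as np


def combined_heuristic(factorization, classification, k):
    """Hypothetical heuristic: k * factorization + classification, one value per model."""
    return k * np.asarray(factorization, dtype=float) + np.asarray(classification, dtype=float)


def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def best_k(factorization, classification, predictivity, candidate_ks):
    """Return (k, r) for the candidate k whose heuristic correlates best with predictivity."""
    scored = [
        (k, pearson_r(combined_heuristic(factorization, classification, k), predictivity))
        for k in candidate_ks
    ]
    return max(scored, key=lambda pair: pair[1])


# Toy per-model values (synthetic, for illustration only).
rng = np.random.default_rng(0)
fact = rng.uniform(0.2, 0.8, size=40)
clf = rng.uniform(0.5, 0.9, size=40)
pred = 2.0 * fact + clf + rng.normal(0.0, 0.05, size=40)

print("classification alone r:", pearson_r(clf, pred))
print("best (k, r):", best_k(fact, clf, pred, [0.0, 0.5, 1.0, 2.0, 4.0]))
```

      Setting k = 0 recovers classification alone, so criterion (i) is a direct comparison within the same sweep. Because K is selected on the same models it is scored on, the best r is optimistic; a fair test would choose K on one set of datasets or models and evaluate it on another.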

      (1-B) Model of representation in the brain. The claim here is that factorization is a general principle of representation in the brain. However, neural predictivity is not a suitable metric for this, because (i) neural predictivity allows arbitrary linear decoders, hence is invariant to the orthogonality requirement of factorization, and (ii) neural predictivity does not match the network representation to the brain representation. A better metric is representational dissimilarity matrices. However, the RDM results in Figure S4 actually seem to show that factorization does not do a very good job of predicting neural similarity (though the comparison to classification accuracy is not shown), which suggests that factorization may not be a general principle of the brain. If the authors want to frame the paper in terms of discovering a general principle of the brain, I suggest they use a metric (or suite of metrics) of brain similarity that is sensitive to the desiderata of factorization, e.g. doesn't apply arbitrary linear transformations, and compare to classification accuracy in addition to invariance.

      We agree with the Reviewer about the shortcomings of neural predictivity for comparing representational geometries, and in our revised manuscript we have provided a more comprehensive set of results that includes RDM predictivity in new Figures 6 & 7, alongside the results for neural fit predictivity. In addition, as suggested, we added classification accuracy predictivity in Figures 5C & S4 (black x’s) for visual comparison to factorization/invariance. In Figure S4 on RDMs, it is apparent that factorization is at least as good a predictor as classification on all V4 & IT datasets from both monkeys and humans (compare x’s to filled circles in Figure S4; note that some of the points from the original Figure S4 changed as we discovered a bug in the code that specifically affected the RDM analysis for a few of the datasets).

      We find that the newly included RDM analyses in Figures 6 & 7 are consistent with the conclusions of the neural fit regression analyses: the correlations of factorization metrics with RDM matches are strong, comparable in magnitude to those of classification accuracy (Figure 6, 3rd & 4th columns, compare black dashed line to faded colored bars), and are not fully accounted for by the model’s classification accuracy alone (Figure 6, 3rd & 4th columns, higher unfaded bars for classification combined with factorization, and see corresponding example scatters in Figure 7 middle/bottom rows).

      It is encouraging that the added benefit of factorization for RDM predictivity, after accounting for classification performance, is at least as large as the improvement seen for neural fit predictivity (Figure 6, 1st & 2nd columns for encoding fits versus 3rd & 4th columns for RDM correlations).
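
      The Reviewer's point (i) in (1-B), that regression-based predictivity is blind to invertible linear transforms of model features while RDM comparisons are not, can be checked numerically. This minimal sketch assumes correlation-distance RDMs compared by the Pearson correlation of their upper triangles, and an ordinary least-squares encoding model with an intercept. The manuscript's actual regression (e.g. regularized or cross-validated fits) and RDM comparison may differ, and regularization breaks the exact invariance shown here. All data are synthetic and the function names are hypothetical.

```python
import numpy as np


def rdm(responses):
    """Correlation-distance RDM; responses has shape (n_conditions, n_units)."""
    return 1.0 - np.corrcoef(responses)


def rdm_similarity(a, b):
    """Pearson correlation between the upper triangles of the RDMs of a and b."""
    rows, cols = np.triu_indices(a.shape[0], k=1)
    return float(np.corrcoef(rdm(a)[rows, cols], rdm(b)[rows, cols])[0, 1])


def linear_r2(features, neural):
    """In-sample R^2 of an ordinary least-squares fit (with intercept) of all neural units."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, neural, rcond=None)
    residual = neural - design @ coef
    total = neural - neural.mean(axis=0)
    return float(1.0 - np.sum(residual**2) / np.sum(total**2))


rng = np.random.default_rng(0)
model = rng.normal(size=(60, 10))                      # 60 images x 10 model units (synthetic)
neural = model @ rng.normal(size=(10, 25)) + rng.normal(scale=0.5, size=(60, 25))
stretch = np.diag(np.linspace(0.1, 10.0, 10))          # invertible, anisotropic (non-orthogonal) transform
transformed = model @ stretch

print(linear_r2(model, neural), linear_r2(transformed, neural))            # equal up to rounding
print(rdm_similarity(model, neural), rdm_similarity(transformed, neural))  # differ
```

      The fitted values of the regression depend only on the column space of the features, which an invertible transform preserves, whereas the RDM depends on the geometry of the feature space itself.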

      (2) I think the comparison to invariance, which is pervasive throughout the paper, is not very informative. First, it is not surprising that invariance is more weakly correlated with neural predictivity than factorization, because invariant representations lose information compared to factorized representations. Second, there has long been extensive evidence that responses throughout the ventral stream are not invariant to the factors the authors consider, so we already knew that invariance is not a good characterization of ventral stream data.

      While we appreciate the Reviewer’s intuition that highly invariant representations are not strongly supported in the high-level visual cortex, we nevertheless thought it was valuable to put this intuition to a quantitative, detailed test. As a result, we uncovered effects that were not obvious a priori, at least to us – for example, that invariance for some scene parameters (camera view, object pose) is negatively correlated with neural predictions while invariance to others (background, lighting) is positively correlated. Thus, our work exercises the details of invariance for different types of information.

      (3) The formalization of the factorization metric is not particularly elegant, because it relies on computing top K principal components for the other-parameter space, where K is arbitrarily chosen as 10. While the authors do show that in their datasets the results are not very sensitive to K (Figure S5), that is not guaranteed to be the case in general. I suggest the authors try to come up with a formalization that doesn't have arbitrary constants. For example, one possibility that comes to mind is E[delta_a x delta_b], where 'x' is the normalized cross product, delta_a, and delta_b are deltas in representation space induced by perturbations of factors a and b, and the expectation is taken over all base points and deltas. This is just the first thing that comes to mind, and I'm sure the authors can come up with something better. The literature on disentangling metrics in machine learning may be useful for ideas on measuring factorization.

      Thanks to the Reviewer for raising this point. First, we wish to clarify a potential misunderstanding of the factorization metric: the number K of principal components we choose is not an arbitrary constant, but rather calibrated to capture a certain fraction of variance, set to 90% by default in our analyses. While this variance threshold is indeed an arbitrary hyperparameter, it has a more intuitive interpretation than the number of principal components.
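
      A minimal sketch of the PCA-based factorization score as described here, under stated assumptions: `param_responses` holds population responses (rows are images, columns are units) when only the scene parameter of interest varies, `other_responses` holds responses when the other parameters vary, and the score is one minus the fraction of parameter-induced variance that falls inside the subspace spanned by the top principal components of the other-parameter responses, with the number of components set by the 90% variance threshold. The function and variable names are hypothetical, and the exact normalization in the Methods may differ.

```python
import numpy as np


def other_parameter_basis(other_responses, variance_threshold=0.9):
    """Top principal axes (rows) explaining `variance_threshold` of the other-parameter variance."""
    centered = other_responses - other_responses.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    k = int(np.searchsorted(explained, variance_threshold)) + 1  # smallest K reaching the threshold
    return vt[:k]  # shape (k, n_units), orthonormal rows


def factorization_pca(param_responses, other_responses, variance_threshold=0.9):
    """1 - fraction of parameter-induced variance lying in the other-parameter subspace."""
    basis = other_parameter_basis(other_responses, variance_threshold)
    centered = param_responses - param_responses.mean(axis=0)
    total_variance = np.sum(centered**2)  # must be > 0, i.e. the parameter must change responses
    variance_in_other = np.sum((centered @ basis.T) ** 2)
    return float(1.0 - variance_in_other / total_variance)


# Toy example: 5 units; the parameter of interest moves unit 0, other parameters move units 1-2.
rng = np.random.default_rng(0)
param = np.zeros((20, 5))
param[:, 0] = rng.normal(size=20)
other = np.zeros((100, 5))
other[:, 1:3] = rng.normal(size=(100, 2))
print(factorization_pca(param, other))  # close to 1: the two subspaces are orthogonal
```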

      Nonetheless, the Reviewer’s comment did inspire us to consider another metric for factorization that does not depend on any arbitrary parameters. In the revised version, we now include a covariance matrix based metric which simply measures the elementwise correlation of the covariance matrices induced by varying the scene parameter of interest and the covariance matrix induced by varying the other parameters (and then subtracts this quantity from 1).
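
      The covariance-based variant can be sketched in the same setting. This sketch assumes that the "elementwise correlation" is the Pearson correlation over all entries (diagonal included) of the two unit-by-unit covariance matrices; whether the Methods formula includes the diagonal or uses another correlation measure is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def factorization_cov(param_responses, other_responses):
    """1 - Pearson correlation between the entries of the two unit-by-unit covariance matrices."""
    cov_param = np.cov(param_responses, rowvar=False)  # covariance induced by the parameter of interest
    cov_other = np.cov(other_responses, rowvar=False)  # covariance induced by the other parameters
    r = np.corrcoef(cov_param.ravel(), cov_other.ravel())[0, 1]
    return float(1.0 - r)


rng = np.random.default_rng(0)
param = np.zeros((200, 5))
param[:, 0] = rng.normal(size=200)        # parameter of interest drives unit 0 only
other = np.zeros((200, 5))
other[:, 1:3] = rng.normal(size=(200, 2))  # other parameters drive units 1-2
print(factorization_cov(param, other))
```

      With this one-minus-correlation form, identical covariance structure gives 0 and unrelated structure gives values near 1; because the covariance entries can be anticorrelated, the score can slightly exceed 1.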

      Correspondingly, we now present results for both the new covariance based measure and the original PCA based one in Figures 5C, 6, and 7. The main findings remain largely the same when using the covariance based metric (Figure 5C, compare light shaded to dark shaded filled circles; Figure 6, compare top row to bottom row; Figure 7, compare middle rows to bottom rows).

      Ultimately, we believe these two metrics are complementary and somewhat analogous to two metrics commonly used for measuring dimensionality (the number of components needed to explain a certain fraction of the variance, analogous to our original PCA based definition; the participation ratio, analogous to our covariance based definition). We have added the formula for the covariance based factorization metric along with a brief description to the Methods.
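
      The analogy to the two dimensionality measures can be made concrete in terms of the covariance eigenvalues λ_i: the threshold count is the smallest K whose top-K eigenvalues reach 90% of the total variance, while the participation ratio is (Σλ_i)² / Σλ_i², which needs no threshold. A minimal sketch with hypothetical function names:

```python
import numpy as np


def n_components_for_variance(eigenvalues, threshold=0.9):
    """Smallest number of top eigenvalues whose sum reaches `threshold` of the total."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    fraction = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(fraction, threshold)) + 1


def participation_ratio(eigenvalues):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues); threshold-free."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(lam.sum() ** 2 / np.sum(lam**2))


def covariance_eigenvalues(data):
    """Eigenvalues of the unit-by-unit covariance of data (rows are samples)."""
    return np.linalg.eigvalsh(np.cov(data, rowvar=False))


print(n_components_for_variance([1, 1, 1, 1]), participation_ratio([1, 1, 1, 1]))
print(n_components_for_variance([10, 1, 1, 1, 1]), participation_ratio([10, 1, 1, 1, 1]))
```

      Both spectra need four components to reach 90% variance, yet the participation ratio drops from 4.0 to about 1.88 when one dimension dominates, illustrating how the threshold-based and threshold-free measures can emphasize different aspects of the same spectrum.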

      (4) The authors defined the term "factorization" according to their metric. I think introducing this new term is not necessary and can be confusing because the term "factorization" is vague and used by different researchers in different ways. Perhaps a better term is "orthogonality", because that is clear and seems to be what the authors' metric is measuring.

      We agree with the Reviewer that factorization has become an overloaded term. At the same time, we think that in this context, the connotation of the term factorization effectively conveys the notion of separating out different latent sources of variance (factors) such that they can be encoded in orthogonal subspaces.

      To aid clarity, we now mention in the Introduction that factorization defined here is meant to measure orthogonalization of scene factors. Additionally, in the Discussion section, we now go into more detail comparing our metric to others previously used in the literature, including orthogonality, to help put it in context.

      (5) One general weakness of the factorization paradigm is the reliance on a choice of factors. This is a subjective choice and becomes an issue as you scale to more complex images where the choice of factors is not obvious. While this choice of factors cannot be avoided, I suggest the authors add two things: First, an analysis of how sensitive the results are to the choice of factors (e.g. transform the basis set of factors and re-run the metric); second, include some discussion about how factors may be chosen in general (e.g. based on temporal statistics of the world, independent components analysis, or something else).

      The Reviewer raises a very reasonable point about the limitation of this work. While we limited our analysis to generative scene factors that we know about and that could be manipulated, there are many potential factors to consider. It is not clear to us exactly how to implement the Reviewer’s suggestion of transforming the basis set of factors, as the factors we consider are highly nonlinear in the input space. Ultimately, we believe that finding unsupervised methods to characterize the “true” set of factors that is most useful for understanding visual representations is an important subject for future work, but outside the scope of this particular study. We have added a comment to this effect in the Discussion.

      Reviewer #3 (Public Review):

      Summary:

      Object classification serves as a vital normative principle in both the study of the primate ventral visual stream and deep learning. Different models exhibit varying classification performances and organize information differently. Consequently, a thriving research area in computational neuroscience involves identifying meaningful properties of neural representations that act as bridges connecting performance and neural implementation. In the work of Lindsey and Issa, the concept of factorization is explored, which has strong connections with emerging concepts like disentanglement [1,2,3] and abstraction [4,5]. Their primary contributions encompass two facets: (1) The proposition of a straightforward method for quantifying the degree of factorization in visual representations. (2) A comprehensive examination of this quantification through correlation analysis across deep learning models.

      To elaborate, their methodology, inspired by prior studies [6], employs visual inputs featuring a foreground object superimposed onto natural backgrounds. Four types of scene variables, such as object pose, are manipulated to induce variations. To assess the level of factorization within a model, they systematically alter one of the scene variables of interest and estimate the proportion of encoding variance attributable to the parameter under consideration.

      The central assertion of this research is that factorization represents a normative principle governing biological visual representation. The authors substantiate this claim by demonstrating an increase in factorization from macaque V4 to IT, supported by evidence from correlation analyses revealing a positive correlation between factorization and decoding performance. Furthermore, they advocate for the inclusion of factorization as part of the objective function for training artificial neural networks. To validate this proposal, the authors systematically conduct correlation analyses across a wide spectrum of deep neural networks and datasets sourced from human and monkey subjects. Specifically, their findings indicate that the degree of factorization in a deep model positively correlates with its predictability concerning neural data (i.e., goodness of fit).

      Strengths:

      The primary strength of this paper is the authors' efforts in systematically conducting analysis across different organisms and recording methods. Also, the definition of factorization is simple and intuitive to understand.

      Weaknesses:

      This work exhibits two primary weaknesses that warrant attention: (i) the definition of factorization and its comparison to previous, relevant definitions, and (ii) the chosen analysis method.

      Firstly, the definition of factorization presented in this paper is founded upon the variances of representations under different stimuli variations. However, this definition can be seen as a structural assumption rather than capturing the effective geometric properties pertinent to computation. More precisely, the definition here is primarily statistical in nature, whereas previous methodologies incorporate computational aspects such as deviation from ideal regressors [1], symmetry transformations [3], generalization [5], among others. It would greatly enhance the paper's depth and clarity if the authors devoted a section to comparing their approach with previous methodologies [1,2,3,4,5], elucidating any novel insights and advantages stemming from this new definition.

      [1] Eastwood, Cian, and Christopher KI Williams. "A framework for the quantitative evaluation of disentangled representations." International Conference on Learning Representations. 2018.

      [2] Kim, Hyunjik, and Andriy Mnih. "Disentangling by factorising." International Conference on Machine Learning. PMLR, 2018.

      [3] Higgins, Irina, et al. "Towards a definition of disentangled representations." arXiv preprint arXiv:1812.02230 (2018).

      [4] Bernardi, Silvia, et al. "The geometry of abstraction in the hippocampus and prefrontal cortex." Cell 183.4 (2020): 954-967.

      [5] Johnston, W. Jeffrey, and Stefano Fusi. "Abstract representations emerge naturally in neural networks trained to perform multiple tasks." Nature Communications 14.1 (2023): 1040.

      Thanks to the Reviewer for this suggestion. We agree that our initial submission did not sufficiently contextualize our definition of factorization with respect to other related notions in the literature. We have added additional discussion of these points to the Discussion section in the revised manuscript and have included therein the citations provided by the Reviewer (please see the third paragraph of Discussion).

      Secondly, in order to establish a meaningful connection between factorization and computation, the authors rely on a straightforward synthetic model (Figure 1c) and employ multiple correlation analyses to investigate relationships between the degree of factorization, decoding performance, and goodness of fit. Nevertheless, the results derived from the synthetic model are limited to the low training-sample regime. It remains unclear whether the biological datasets under consideration fall within this low training-sample regime or not.

      We agree that our model in Figure 1C is very simple and does not fully capture the complex interactions between task performance and features of representational geometry, like factorization. We intend it only as a proof of concept to illustrate how factorized representations can be beneficial for some downstream task use cases. While the benefits of factorized representations disappear for large numbers of samples in this simulation, we believe this is primarily a consequence of the simplicity and low dimensionality of the simulation. Real-world visual information is complex and high-dimensional, and as such the relevant sample size regime in which factorization offers task benefits may be much greater. As a first step toward this real-world setting, Figure 2 shows how decreasing the amount of factorization in neural population data in macaque V4/IT can have an effect on object identity decoding.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Missing citations: The paper could benefit from discussions & references to related papers, such as:

      Higgins I, Chang L, Langston V, Hassabis D, Summerfield C, Tsao D, Botvinick M. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nature communications. 2021 Nov 9;12(1):6456.

      We have added additional discussion of related work, including the suggested reference and others on disentanglement, to the Discussion section in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Here are several small recommendations for the authors, all much more minor than those in the public review:

      I suggest more use of equations in methods sections about Figure 1C and macaque neural data analysis.

      Thanks for this suggestion. We have added new Equation 1 for the method transforming neural data to reduce factorization of a variable while preserving other firing rate statistics.

      In Figure 1-C, the methods indicate that Gaussian noise was added. This is a very important detail, and complicates the interpretation of the figure because it adds an assumption about the structure of noise. In other words, if I understand correctly, the correct interpretation of Figure 1C is "assuming i.i.d. noise, decoding accuracy improves with factorization." The i.i.d. noise is a big assumption, and it is debated how well the brain satisfies this assumption. I suggest you either omit noise for this figure or clearly state in the main text (e.g. caption) that the figure must be interpreted under an i.i.d. noise assumption.

      We have added an explicit statement of the i.i.d. noise assumption to the Figure 1C legend.

      For Figure 2B, I suggest labeling the x-axis clearly below the axis on both panels. Currently, it is difficult to read, particularly in print.

      We have made the x-axis labels clearer and included them on both panels.

      Figure 3A is difficult to read because of the very small text. I suggest avoiding such small fonts.

      We agree that Figure 3A is difficult to read. We have broken out Figure 3 into two new Figures 3 & 4 to increase clarity and sizing of text in Figure 3A.

      Reviewer #3 (Recommendations For The Authors):

      To strengthen this work, it is advisable to incorporate more comprehensive comparisons with previous research, particularly within the machine learning (ML) community. For instance, it would be beneficial to explore and reference works focusing on disentanglement [1,2,3]. This would provide valuable context and facilitate a more robust understanding of the contributions and novel insights presented in the current study.

      We have added additional discussion of related work and other notions similar to factorization to the Discussion section in the revised manuscript.

      Additionally, improving the quality of the figures is crucial to enhance the clarity of the findings:

      • Figure 2: The caption of subfigure B could be revised for greater clarity.

      Thank you, we have substantially clarified this figure caption.

      • Figure 3: Consider a more equitable approach for computing the correlation coefficient, such as calculating it separately for different types of models. In the case of supervised models, it appears that the correlation between invariance and goodness of fit may not be negligible across various scene parameters.

      We appreciate the suggestion, but we are not confident in our ability to conclude much from analyses restricted to particular model classes, given the relatively small N and the fact that the different model classes themselves are an important source of variance in our data.

      • Figure 4: To enhance the interpretability of subfigures A and B, it may be beneficial to include p-values (indicating confidence levels).

      As we supply bootstrapped confidence intervals for our results, which provide at least as much information as p-values, and most of the effects of interest are fairly stark when comparing invariance to factorization, p-values were not needed to support our points. We added a sentence to the legend of new Figure 5 (previously Figure 4) indicating that error bars reflect standard deviations over bootstrap resampling of the models.
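
      Error bars of this kind can be computed by resampling the set of models with replacement and recomputing the correlation for each resample. The sketch below assumes one metric value and one neural-fit value per model and reports the standard deviation across resamples; the function name, the number of resamples, and the seed are illustrative choices, not the manuscript's settings.

```python
import numpy as np


def bootstrap_correlation(metric, fit, n_boot=1000, seed=0):
    """Return (Pearson r over all models, SD of r over bootstrap resamples of the models)."""
    metric = np.asarray(metric, dtype=float)
    fit = np.asarray(fit, dtype=float)
    rng = np.random.default_rng(seed)
    n_models = len(metric)
    resampled = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_models, size=n_models)  # draw models with replacement
        m, f = metric[idx], fit[idx]
        if np.std(m) == 0 or np.std(f) == 0:
            continue  # skip degenerate resamples where r is undefined
        resampled.append(np.corrcoef(m, f)[0, 1])
    r_full = float(np.corrcoef(metric, fit)[0, 1])
    return r_full, float(np.std(resampled, ddof=1))


rng = np.random.default_rng(2)
factorization_scores = rng.normal(size=30)                        # synthetic per-model metric
neural_fit = factorization_scores + rng.normal(size=30)           # synthetic per-model neural fit
print(bootstrap_correlation(factorization_scores, neural_fit, n_boot=200))
```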

      • Figure 5: For subfigure B, it could be advantageous to plot the results solely for factorization, allowing for a clear assessment of whether the high correlation observed in Classification+Factorization arises from the combined effects of both factors or predominantly from factorization alone.

      First, we note that the scatter plots for factorization alone that the Reviewer seeks are already presented earlier in the manuscript across all conditions in Figures 4A,B and Figure S2.

      While we could also include these in new Figure 7 (previously Figure 5B) as the Reviewer suggests, we believe it would distract from the message of that figure at the end of the manuscript – which is that factorization is useful as a supplement to classification in predictive matches to neural data. Nonetheless, new Figure 6 (old Figure 5A) provides a summary quantification of the information that the reviewer requests (Fig. 6, faded colored bars reflect the contribution of factorization alone).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study elucidates a detailed molecular mechanism of the initial stages of transport in a medically relevant GABA neurotransmitter transporter GAT1 and thus generates useful new insights for this protein family. In particular, it presents convincing evidence for the presence of a "staging binding site" that locally concentrates Na+ ions to increase transport activity, whilst the evidence for how Na+ binding affects the larger-scale dynamics is solid.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into molecular simulations of the Na+ binding pathway and the ionic interactions at the two known sodium binding sites, site 1 and site 2. They further identify a patch of two acidic residues in TM6 that Na+ ions seemingly occupy prior to entry into the vestibule. These results highlight the importance of studying ion-entry pathways through computational approaches, and the authors also validate some of their findings experimentally. They observe that sodium binding to site 1 is stabilized by the presence of the substrate in the S1 site. This is particularly important because, unlike in monoamine transporters, the GABA carboxylate is involved in coordinating the Na+ ion. They also observe that binding of sodium to the Na2 site stabilizes the conformation of GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters, particularly GAT1, and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to ion permeation in SLC6 transporters. This is a nicely performed study that could be improved if the authors address the following queries.

      We thank our reviewer for the overall positive evaluation.

      Weaknesses:

      (1) How conserved is the residue pair D281-E283 in other SLC6 transporters? The authors commented on the presence of these residues in SERT, but it would be nice to know how widespread these residues are in other SLC6 transporters like NET, GlyT, and DAT.

      We have created a sequence alignment of the entire human SLC6 family (Supplementary Figure 1) and found that E283 is polar or charged in all SLC6 transporters. D281 shows a higher level of conservation across the family compared to E283. D281 is negatively charged in approximately 50% of the SLC6 family members, an aspartate in all GABA transporters and a glutamate in all monoamine transporters.

      (2) Further, one would like to see the effect of individual mutations D281A and E283A on transport, surface expression, and EC50 of Na+ to gauge the effect on transport.

      We have carried out experiments to investigate the effects of the individual mutations. The results revealed intermediate effects between WT and the double mutant (D281A-E283A) and showed that the effects mostly align with the degree of conservation, as neutralisation of D281 by alanine has a stronger effect than the E283A mutation. Both single mutants had minimal effects on the sodium dependence of uptake; D281A had a stronger effect on expression, Km and Vmax than E283A. Only D281A reduced surface expression, while E283A expressed at a similar level to wild-type GAT1.
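
      Km and Vmax of uptake are conventionally obtained by fitting the Michaelis–Menten equation v = Vmax·[S] / (Km + [S]) to uptake rates measured at several substrate concentrations. A minimal sketch with SciPy's `curve_fit`, assuming noise-free, hypothetical data points (the concentrations and rates are invented for illustration and are not GAT1 measurements); the square roots of the covariance diagonal give approximate standard errors.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten(s, vmax, km):
    """Uptake rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_uptake(concentrations, rates):
    """Fit Vmax and Km; return (estimates, approximate standard errors)."""
    p0 = [max(rates), np.median(concentrations)]  # rough starting guesses
    popt, pcov = curve_fit(michaelis_menten, concentrations, rates, p0=p0)
    return popt, np.sqrt(np.diag(pcov))


gaba_um = np.array([1, 3, 10, 30, 100, 300], dtype=float)  # hypothetical concentrations
rates = michaelis_menten(gaba_um, 100.0, 12.0)             # synthetic, noise-free rates
(vmax, km), (vmax_se, km_se) = fit_uptake(gaba_um, rates)
print(vmax, km)
```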

      (3) A clear figure of the S1 site, where Na+ tends to stay prior to interacting with the Na1 site, needs to be provided. Further, it is not entirely clear how access to S1 is altered if the transporter is in an outward-occluded conformation with F294 blocking solvent access. Please comment.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility. We have also added a comment on the role of F294 as part of the outer hydrophobic gate to the discussion. In short, F294 does not occlude the passage to the S1 as long as GAT1 is outward open, and we find that GAT1 is outward open in all sodium binding simulations.

      (4) The p-value of the EC50 difference between GAT1-WT and the GAT1 double mutant needs to be mentioned. The difference in sodium-dependence EC50 seems less than twofold, and it would be useful to mention how critical the role of the recruitment site is. Since transport is not affected, the site could play a transient role in attracting ions.

      We have added p-values or standard deviation to our data.

      (5) It would be very nice to know how K+ ions are attracted by this recruitment site. This could further act as a control simulation to test the preference for Na+ ions among SLC6 members.

      We think that attraction of potassium to the recruitment site is not of relevance, as the residues are at the extracellular side and exposed to bulk, where the concentration of sodium is high (typically 130-150 mM), while the concentration of potassium is very small (3-5 mM). Exploring sodium binding by simulations for all SLC6 members could be interesting, but clearly outside the scope of this manuscript.

      (6) Some of the important figures are not very clear. For instance, there should be a zoomed-in view of the recruitment site. The current one in Fig. 1b and 1c could be made clearer. Similarly as mentioned earlier the Na residence at the S1 site away from the Na1 and Na2 sites needs to be shown with greater clarity by putting side chain information in Fig. 6d.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility.

      (7) The structural features that comprise the two principal components PC1 and PC2 should be described in greater detail.

      We have modified Figure 6 and added images that show the motions along PC1 and PC2. In addition, these are now better explained in the text.

      Reviewer #2 (Public Review):

      Summary:

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be cotransported with the substrate, GABA (and a co-transported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281A-E283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths:

      -  MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      -  A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      -  The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the very supportive assessment.

      Weaknesses:

      -  Assessing the effects of Na+ binding on the large-scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      Based on our data, the long-range effect of sodium binding on GAT1, the destabilisation of the inner gate, is causal. PCA separates conformational motions into degrees of freedom and sorts them by the magnitude of the motion. Motions of TM5a were among the two largest motions, which suggests that these are relevant motions. To directly quantify their behaviour, we measured informative distances at the inner gate of GAT1, as shown in Figure 6i,j,k, and separated the data according to the presence of sodium in NA2.

      For the following reasons, we rule out that the results are a consequence of structural inconsistencies introduced by AlphaFold2 and therefore do not reflect functionally relevant effects:

      (1) If the effects depended on the model rather than on sodium binding, they would not be correlated with the presence of sodium in the NA2 binding site.

      (2) We carried out new simulations starting from the occluded GAT1 structure (Figure 6j,k). The data show that in the occluded state the distance across the inner vestibule and the length of TM5a differ, consistent with our interpretation of the data. Because sodium binding fixes GAT1 in the outward-facing conformation, as also occurs in other SLC6 family members (Szöllősi and Stockner, 2022), the distances of the outward-open GAT1 are at the short extreme of the scale, the distances of the inward-open state of the cryo-EM structure(s) are at the other extreme, and the occluded conformation of GAT1 shows intermediate values.

      (3) We have observed the same property in SERT, for which we used experimental structures as starting structures (Gradisch et al., 2024), suggesting that this could be a general mechanism.

      (4) All available structures from the entire SLC6 family are consistent with structural effects on TM5a in response to bundle domain motions, and therefore in response to sodium binding to NA2, which stabilises the outward-open state, as well as to the transition to the inward-facing conformation.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      Simulations can always be too short and therefore fail to describe the complete underlying conformational ensemble. We added a statement in the discussion indicating this shortcoming. With respect to the tICA analysis in our manuscript, the tICA approach, by design, does not need long simulations that capture full binding and unbinding in multiple instances to construct a correct free energy landscape. Instead, the tICA method builds on Markov chain dependencies and relies only on the convergence of transitions between hundreds of conformational microstates and the fluxes between them. The free energy profile derived for the S1 site, including NA1, TMP and NA2, up to the salt bridge of the outer gate, is well converged, and we observed many transitions. In contrast, the entry pathway from the recruitment site to S1 most likely has too low a density of microstates and too few transitions to be considered converged with respect to quantifying the free energy of binding from bulk. We now explain this shortcoming.
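
      To make the convergence argument concrete, the sketch below shows how a free-energy profile is commonly estimated from transitions between discretised microstates in a Markov state model (MSM), the framework usually paired with tICA-based state decompositions. It is an illustrative sketch under our own assumptions, not the pipeline used in the manuscript: the helper name `msm_free_energy`, the simple symmetrised-count (reversible) estimator and the lag time are hypothetical choices. It also reports how many transitions leave each state, which is the quantity that limits convergence in sparsely sampled regions such as an entry pathway.

```python
import math
from typing import List, Tuple


def msm_free_energy(dtraj: List[int], n_states: int, lag: int = 1) -> Tuple[List[float], List[int]]:
    """Free energy (in kT) of each microstate from a discretised trajectory.

    Hypothetical helper for illustration. Transition counts at the given lag
    are symmetrised (enforcing detailed balance), so the stationary
    distribution is proportional to the row sums of the symmetrised matrix.
    """
    counts = [[0 for _ in range(n_states)] for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1

    # Symmetrise the count matrix (simple reversible estimate).
    sym = [[counts[i][j] + counts[j][i] for j in range(n_states)] for i in range(n_states)]
    row_sums = [sum(row) for row in sym]
    total = sum(row_sums)
    pi = [r / total for r in row_sums]

    # F_i = -ln(pi_i), shifted so the most populated state sits at 0; unvisited states are +inf.
    raw = [-math.log(p) if p > 0 else math.inf for p in pi]
    f_min = min(raw)
    free_energy = [f - f_min for f in raw]

    # Transitions leaving each state for a different state: few exits means poorly converged estimates.
    exits = [sum(counts[i][j] for j in range(n_states) if j != i) for i in range(n_states)]
    return free_energy, exits


if __name__ == "__main__":
    fe, exits = msm_free_energy([0, 0, 0, 1, 0, 0, 0, 1, 0], n_states=2)
    print(fe, exits)  # state 1 lies ln(3), about 1.10 kT, above state 0; two exits from each state
```

      In this toy trajectory both states are left twice, so the free-energy difference rests on very few transitions; with hundreds of well-connected microstates the same estimator becomes far more reliable, which is the distinction drawn above between the S1 region and the entry pathway.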

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Authors should furnish p-values in the figure legends for experimental results.

      We have added the p-values to text and figure legends.

      Reviewer #2 (Recommendations For The Authors):

      -  Deposit simulation data in a public repository (input files, trajectories (possibly subsampled)).

      We deposited the data on Zenodo (DOI: 10.5281/zenodo.10686813). As we were unable to upload the trajectories to Zenodo, we deposited the starting and end structures of the simulations.

      -  Please include a short discussion of the reliability of using an AF2 model instead of experimental structures. What is expected to be correct/which parts of the structure are potentially incorrect? What makes you think that the AF2 model is a good model of the OF conformation of GAT1?

      Unfortunately, an outward-facing structure of GAT1 is not available. We initially worked with an outward-open homology model of GAT1 based on SERT (built with MODELLER), but the structural differences between SERT and GAT1 are sufficiently large that these models did not behave well in simulations: too frequently they could not maintain a sealed inner gate and also formed a channel. In contrast to the SERT-based GAT1 model, the AlphaFold2 model of GAT1 behaved as expected, consistent with the behaviour of SERT in simulations and with general knowledge of protein dynamics from the literature. Based on structural analysis of our simulations and on the comparison to SERT, we could not identify a region of GAT1 that would potentially behave incorrectly or unexpectedly. We added a statement to the discussion on this potential limitation of the use of homology models.

      -  Fig 1a: Na+ densities are not very clear (both due to small size and the transparency). I have a hard time seeing where bulk, 2*bulk regions are --- are you showing "onion shells" of density? Perhaps investigate presenting as cuts through the full density?

      I like the labelling in terms of absolute density and multiples of bulk.

      We have created new images to improve the visualisation of the data. The data are shown as onion shells (isosurfaces), with the shells at the indicated densities. This is now clearly stated. Transparency is needed, otherwise, e.g., the inner onion shells would not be visible. A cut-through is intuitive, but we could not find a useful plane, as the densities are distributed too extensively in 3D rather than on a single plane.

      -  Fig 1h-k: would be clearer if "recruitment site" (TMP?) was indicated in the figure.

      We have created a new image for the recruiting site (Figure 1b,c) and temporary site (Figure 1g) and indicated these two sites as appropriate.

      -  Show time series of Na+ binding with a suitable order parameter (z or distances to NA1 and NA2?) to show how ions bind spontaneously. Mark the different sites. Mark pre- and post-binding parts of trajectories.

      For every simulation that shows sodium binding to NA1 or NA2, we have added time series to Supplementary Figure 2a,b,c. These quantify the distances to the recruiting site, the temporary site and the respective sodium binding site.

      -  PCA - how much of the total variance was captured by PC1 and PC2?

      The variance captured by the PCs is shown as eigenvalues in Supplementary Figure 4. PC1 captures about 19% of the variance, PC2 8%.

      -  "We found that the inner hydrophobic gate is dynamic in the absence of Na2" -- is this instability due to the AF2 model or likely realistic? E.g. was similar behaviour ever observed in simulations of the occluded state?

      In simulations of the occluded state we do not see the instabilities observed in the outward-open state in the absence of sodium (Figure 6). As these larger-scale fluctuations are not randomly distributed across all simulations starting from the AlphaFold2 models, but are confined to the systems without sodium, they are unlikely to be an artefact of the AlphaFold2 model.

      Please note, we have seen comparable behaviour in simulations of SERT starting from experimental structures (Gradisch et al., 2024), therefore suggesting a more general mechanism.

      -  Cooperativity between GABA-binding and Na+ binding to NA1: How would this lead to an experimentally measurable signature, i.e., which experiments could validate this interesting prediction?

      In experiments, cooperativity is difficult to separate from other effects: sodium binding and transport involve both NA1 and NA2, NA2 has the higher affinity according to our data, and mutations would affect not only cooperativity but also other properties.

      Conformational changes can also complicate experimental detection, as NA2 stabilises the outward-open conformation, while NA1+GABA binding triggers the transition to the inward-open state. To quantify cooperativity, it would be important to isolate the cooperative effect from all other effects, which is a challenge. Support for cooperativity has been obtained using this route (Zhou, Zomot and Kanner, 2006; Meinild and Forster, 2012). In the first paper, the authors used lithium, which binds only to NA2, although lithium is not simply an NA2-selective ligand that is otherwise identical to sodium. By comparing two GABA concentrations, the authors showed that the sodium dependence of GABA transport is left-shifted at higher GABA concentrations, which is not the case in the absence of lithium. These data are indirect, but consistent with cooperativity between GABA and NA1-bound sodium, as GABA transport mainly reflects binding of sodium to NA1. Similar approaches could be explored further, for example by varying the GABA concentration instead of the sodium concentration. Another option would be to create an outward-facing, conformationally locked GAT1 and to measure the cooperativity of sodium and GABA binding, for example with a scintillation proximity assay. Most likely, the assay would also need to be independent of NA2 binding. We are not aware of such a GABA transporter system.
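
      Why a left shift of the sodium dependence at higher GABA concentrations points to cooperativity can be illustrated with a generic two-ligand (heterotropic) equilibrium scheme. This is our simplified illustration, not a GAT1-specific model: the scheme, the coupling factor `alpha`, the helper name and the parameter values are assumptions, and NA2, lithium and the conformational steps described above are ignored.

```python
def apparent_na_kd(k_na: float, k_gaba: float, gaba: float, alpha: float) -> float:
    """Na+ concentration giving half-maximal Na+ occupancy at a fixed GABA concentration.

    Generic equilibrium scheme (illustrative assumption):
        E + Na        <-> E.Na        dissociation constant k_na
        E + GABA      <-> E.GABA      dissociation constant k_gaba
        E.GABA + Na   <-> E.GABA.Na   dissociation constant alpha * k_na
    alpha < 1: bound GABA raises Na+ affinity (positive cooperativity).
    alpha == 1: the two ligands bind independently.
    """
    g = gaba / k_gaba
    # Half-saturation of Na+ occupancy: [Na] = k_na * (1 + g) / (1 + g / alpha)
    return k_na * (1.0 + g) / (1.0 + g / alpha)


if __name__ == "__main__":
    for gaba in (0.0, 1.0, 10.0):
        independent = apparent_na_kd(k_na=10.0, k_gaba=1.0, gaba=gaba, alpha=1.0)
        cooperative = apparent_na_kd(k_na=10.0, k_gaba=1.0, gaba=gaba, alpha=0.1)
        print(f"GABA={gaba:>5}: independent {independent:.2f}, cooperative {cooperative:.2f}")
```

      With independent binding the apparent Na+ affinity does not move as GABA increases; only with coupling (alpha < 1) does the Na+ dependence shift to lower concentrations, which is the signature described above.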

      -  There are some instances of [SI Figure] or [citation needed] that should be cleaned up.

      We have corrected these instances.

      References

      Gradisch, R. et al. (2024) ‘Ligand coupling mechanism of the human serotonin transporter differentiates substrates from inhibitors’, Nature Communications, 15(1), p. 417. Available at: https://doi.org/10.1038/s41467-023-44637-6.

      Meinild, A.-K. and Forster, I.C. (2012) ‘Using lithium to probe sequential cation interactions with GAT1’, American Journal of Physiology. Cell Physiology, 302(11), pp. C1661-1675. Available at: https://doi.org/10.1152/ajpcell.00446.2011.

      Szöllősi, D. and Stockner, T. (2022) ‘Sodium Binding Stabilizes the Outward-Open State of SERT by Limiting Bundle Domain Motions’, Cells, 11(2), p. 255. Available at: https://doi.org/10.3390/cells11020255.

      Zhou, Y., Zomot, E. and Kanner, B.I. (2006) ‘Identification of a lithium interaction site in the gamma-aminobutyric acid (GABA) transporter GAT-1’, The Journal of Biological Chemistry, 281(31), pp. 22092–22099. Available at: https://doi.org/10.1074/jbc.M602319200.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer #1:

      We thank the reviewer for the further recommendations for improving our presentation. We would like to carefully address the remaining concerns of the reviewer.

      (1) I realize now that I didn't make my point clear enough, which was that as far as I know there is no reason to believe that an oscillatory state cannot be induced with synaptic depression as with spike frequency adaptation when used in the context of the authors' model. I'm fine with how the authors have distinguished their model from R&T 2015, but I think the more interesting question is whether there is any reason to believe that STD is not equally capable of doing all the things mentioned in this paper as SFA, and if not why not. I would like the authors to go out on a limb and address this, if only with a few sentences in the discussion. 

      Thank you for pointing this out again. In response to your query regarding the comparison between STD and SFA in generating bump sweeps, we have done simulations based on STD. The results showed that both STD and SFA are capable of inducing bi-directional sweeps. However, (based on our simulations) only SFA can produce uni-directional sweeps. The absence of uni-directional sweeps based on STD may be due to the subtle yet important differences between the two mechanisms. Specifically, STD modulates the neural activity by weakening the recurrent connections, which theoretically can only inhibit recurrent inputs, while SFA can attenuate all forms of excitatory inputs, including external inputs. However, since we did not exhaustively explore the entire parameter space, we cannot conclude that STD is incapable of producing uni-directional sweeps. Future simulations are required.

      Following the Reviewer’s suggestion, we added a few sentences discussing the distinctions between STD and SFA in generating theta sweeps in the CANN (lines 432 to 440 of the Discussion section):

      “Based on our simulations, both STD and SFA can produce bi-directional sweeps within a CANN model, whereas only SFA enables uni-directional sweeps in the absence of external theta inputs. This difference might be due to the lack of an exhaustive exploration of the entire parameter space. However, it might also be attributable to subtle yet important theoretical distinctions between STD and SFA. Specifically, STD attenuates neural activity through a reduction in recurrent connection strength, whereas SFA provides inhibitory input directly to the neurons, potentially impacting all excitatory inputs. These differences might explain the diverse dynamical behaviors observed in our simulations. Future experiments could clarify these distinctions by monitoring changes in synaptic strength and inhibitory channel activation during theta sweeps.”

      (2) I appreciate the inclusion of the experimental data in Fig 6a (though I don't find the left-most panel very useful). I also understand what the authors are trying to convey with plots in 6c and 6c. However, I don't find the text that was added above very helpful at all. I was hoping for a simpler demonstration of the effect, by plotting a series of sequential sweeps (cell index vs time, with color indicating firing rate, as in Fig 2d) in the case of both the slow speed and fast speed regimes. Here, vertical lines could mark the individual theta cycles and the firing of individual cells, showing the constancy of the former but change of the latter. 

      Thank you for your constructive feedback. There may have been a misunderstanding arising from our previous explanation, for which we apologize. The phenomenon we want to elucidate is not an increase in the theta frequency as detected in LFPs, but rather the slope of phase precession with respect to the animal's movement speed. Due to phase precession, the oscillation frequency of place cells as the animal traverses the field is higher than the theta frequency. A plot like Fig. 2d will not make this point clearer, since it shows the baseline theta frequency (i.e., theta sweeps as we claimed previously). A straightforward way to think about this point is the one we added previously: “…The faster the animal runs, the faster the extra half cycle can be accomplished. Consequently, the firing frequency will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency”. We hope this clarification addresses the concerns raised.
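
      The "extra half cycle" argument can be written as a back-of-the-envelope formula. This is an idealisation under our own assumptions, not a derivation from the model: a place cell is taken to precess by exactly half a theta cycle while the animal crosses a place field of length L at constant speed v, with baseline theta frequency f_θ.

```latex
T_{\text{field}} = \frac{L}{v}, \qquad
f_{\text{cell}} = \frac{f_\theta\, T_{\text{field}} + \tfrac{1}{2}}{T_{\text{field}}}
               = f_\theta + \frac{v}{2L}, \qquad
\frac{\partial f_{\text{cell}}}{\partial v} = \frac{\partial f_\theta}{\partial v} + \frac{1}{2L}.
```

      Under these assumptions, the cell's oscillation frequency grows with running speed more steeply than the baseline theta frequency, by the additional term 1/(2L); L and the half-cycle precession are illustrative quantities, not fitted parameters.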

      (3) This is still confusing to me. I just don't understand how the *phase* of the oscillating activity bump has anything to do with the movement of the animal. I would like to see a plot of the sweeps (again, cell index vs time, with color indicating the firing rate) before and after inactivation for short and long duration inactivation. Perhaps I am not understanding or appreciating how the bump recovers after inactivation and how this is related to the motion of the animal. 

      Thank you for pointing this out again. After the inactivation is removed, the activity bump naturally reappears at the input location (which has moved forward in the meantime) and then starts to sweep again as before the inactivation. Single-cell phase precession and population theta sweeps are two sides of the same coin (if all cells start at roughly the same phase in theta cycles). If the reviewer accepts this, then at the new location the activity bump sweeps again (around the new location), and therefore phase precession starts again at a further phase, since phase codes the position as the animal traverses the place field.

      (4) I am glad the authors are spending more time discussing this phenomenon, but I am unsure of their explanation: for a sweep moving at constant speed, neurons all along the path will be equally affected (inhibited), so where does the bias for suppressing the "end" neurons come from? 

      While it may appear that neurons along the path are equally inhibited as the bump sweeps over them, our model incorporates external inputs with Gaussian profiles. These inputs bias neurons closer to the input location, resulting in fewer activations in neurons further away from the input position.

      (5) Here I was hoping that the authors might comment on what they suspect happens when the animal starts (or stops) moving, and how the network shifts from tracking regime to oscillatory regime (or vice versa), as is typically seen in experimental data (see for example, Kay et al., 2020, fig 4b,c). My apologies for not making this point clearer. 

      Thank you for pointing this out. In our model, we observed that when the animal stops, the network continues to generate theta oscillations near the input location, albeit with reduced amplitude (so the network dynamics look like those in the tracking regime). However, we hypothesize that when the animal pauses for long enough (immobile but awake states), sensory input into the hippocampus also decreases, which is similar to removing external inputs in our model. In this case, the activity bump spontaneously moves away, resembling the phenomenon of replay (see also Romani & Tsodyks 2015).

      Regarding the experimental data (Kay et al.), it indeed appears that theta sweeps decoded from neural activity become less pronounced when the mouse moves at slower speeds. This observation could potentially correspond to a decrease in the amplitude of bump oscillations when external inputs associated with movement are halted but not entirely removed in our model. However, in experiments, when the mouse's movement slows down, hippocampal activity no longer oscillates at theta frequency, making it challenging to decode theta sweeps.

      We appreciate your clarification on this point and recognize the importance of further investigating how our model can accurately replicate the transition between tracking and oscillatory regimes observed in experimental data.

    1. Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      (2) Sample size:

      - The small sample size is a major concern, as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      - The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small for such an analysis. Moreover, the methods section does not make clear that this analysis was conducted only in the patient group (which the reviewer assumed from the plots), nor does it explain why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      (3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      - Statistical analyses for the MRS: The authors should consider additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA design) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or the neurochemicals. In addition, the authors did not provide any information on the alpha level or on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation (and sex?), these variables should be included as covariates, as they have been shown to be highly related to the measurements of interest (especially the EEG measurements), and the age range of the current study is large.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 and a sample size of 10 subjects. This can't be correct (it should be around p = 0.22). All statistical analyses should be checked.

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.
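
      Both numerical points above can be checked without specialised software. The sketch below, with hypothetical helper names and made-up test data, computes (a) the standard two-sided p-value for a Pearson correlation, using t = r·sqrt(n-2)/sqrt(1-r²) with n-2 degrees of freedom, and (b) an exact permutation test for a two-group mean difference, one example of the small-sample alternative suggested above. For r = -0.42 and n = 10 it gives p ≈ 0.23, in line with the estimate above and far from >0.999.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson_p_value(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for H0: rho = 0 (assumes |r| < 1 and n > 2).

    The Student-t tail is obtained by Simpson integration of the t density,
    so no SciPy is needed.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x: float) -> float:
        return const * (1.0 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule on [0, t] (steps must be even); the density is symmetric.
    h = t / steps
    area = density(0.0) + density(t)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * density(k * h)
    area *= h / 3
    return 1.0 - 2.0 * area


def permutation_test(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every relabelling of the pooled data, which is practical only
    for small groups (10 vs 10 gives C(20, 10) = 184,756 splits).
    """
    pooled = list(group_a) + list(group_b)
    n_a, total = len(group_a), sum(pooled)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        n_splits += 1
    return extreme / n_splits


if __name__ == "__main__":
    print(round(pearson_p_value(-0.42, 10), 3))  # about 0.23
```

      Exact enumeration is feasible at these sample sizes; it does not by itself handle covariates or multiple comparisons, which are raised separately above.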

      (4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      - The aperiodic intercept in particular is very sensitive to many influences (e.g. skull thickness, electrode impedance...). As crucial results (the correlation between the aperiodic intercept and MRS measures) are affected by this problem, they need to be reevaluated. It is safer to make statements about the aperiodic slope than about the intercept. In principle, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could therefore be controlled.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation can not simply be translated to the aperiodic signal (slope).

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These studies used ECoG data (mostly in animals), mostly under anesthesia. Thus, they allow only indirect inferences about what actually influences the 1/f slope in EEG measurements.

      (5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels or address line noise (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      - What was the percentage of segments that had to be rejected due to the 120 μV criterion? This should be reported separately for EO & EC and for controls and patients.

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are confounded by this choice (see the Nyquist theorem).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      - The quality of the 1/f model fits for EO, EC, and both participant groups should be reported.
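
      To make the slope/intercept terminology and the fit-quality request concrete, the sketch below fits the simplest "fixed" (no-knee) aperiodic form, log10 P(f) = b - χ·log10 f, which is the functional form FOOOF/specparam uses in its fixed aperiodic mode. It is a plain least-squares line in log-log space with an optional excluded band, not the FOOOF algorithm (which models spectral peaks alongside the aperiodic component); the helper name and the synthetic spectrum are illustrative assumptions. The returned r² is one example of the goodness-of-fit figure that could be reported.

```python
import math
from typing import Sequence, Tuple


def fit_aperiodic(freqs: Sequence[float], power: Sequence[float],
                  exclude: Tuple[float, float] = (8.0, 14.0)) -> Tuple[float, float, float]:
    """Least-squares fit of log10(P) = offset - exponent * log10(f).

    Frequencies inside `exclude` (default: the 8-14 Hz alpha band) are skipped.
    Returns (offset, exponent, r_squared) of the fit.
    """
    pts = [(math.log10(f), math.log10(p)) for f, p in zip(freqs, power)
           if not exclude[0] <= f <= exclude[1]]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    offset = my - slope * mx
    ss_res = sum((y - (offset + slope * x)) ** 2 for x, y in pts)
    ss_tot = sum((y - my) ** 2 for _, y in pts)
    return offset, -slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic spectrum: 1/f^1.5 background (offset 2) plus a narrow alpha peak at 10 Hz.
    freqs = [1.0 + 0.5 * k for k in range(39)]  # 1 to 20 Hz in 0.5 Hz steps
    power = [100.0 * f ** -1.5 * (1.0 + 3.0 * math.exp(-((f - 10.0) ** 2) / 0.5))
             for f in freqs]
    print(fit_aperiodic(freqs, power))                      # exponent close to 1.5
    print(fit_aperiodic(freqs, power, exclude=(0.0, 0.0)))  # empty window: alpha peak biases the fit
```

      With this naive fit, an unexcluded alpha peak does bias the exponent; whether an explicit exclusion changes the estimates of FOOOF's peak-aware procedure is a separate, empirical question.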

      (6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of gamma-aminobutyric acid (GABA) measurements can vary significantly depending on acquisition and modeling parameters. Did the authors address these challenges? Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at Supplementary Figure 5 (which would be better presented as ICC or Bland-Altman plots), the within-subject stability does not seem great compared with the between-subject variability. Furthermore, I don't think such a small sample qualifies as a rigorous assessment of stability.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, given that the group differences in the Glx/GABA ratio seem to be driven by a decreased Glx concentration in CC individuals rather than by an increased GABA concentration (see Figure 2)?

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?
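
      For the test-retest subset, the two agreement summaries suggested above could be computed as in the following minimal sketch: ICC(2,1) (two-way random effects, absolute agreement, single measurement, following Shrout and Fleiss) and Bland-Altman 95% limits of agreement. Choosing the absolute-agreement form is our assumption; a consistency form such as ICC(3,1) would ignore systematic offsets between sessions. The helper names and example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def icc_2_1(session1: Sequence[float], session2: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement,
    for n subjects each measured in k = 2 sessions."""
    data = [list(pair) for pair in zip(session1, session2)]
    n, k = len(data), 2
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    sess_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in subj_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in sess_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols                      # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def bland_altman(session1: Sequence[float], session2: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference (session2 - session1) and 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [b - a for a, b in zip(session1, session2)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical values for six subjects, only to exercise the functions.
    s1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    s2 = [x + 1.0 for x in s1]   # constant offset between sessions
    print(icc_2_1(s1, s2))       # 0.875: absolute agreement penalises the offset
    print(bland_altman(s1, s2))  # (1.0, 1.0, 1.0): bias of 1, no spread
```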

      (7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted, or only with great caution, for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). They result from exploratory analyses correlating every EEG parameter with every MRS parameter, which require well-powered replication before any interpretation can be offered. Furthermore, and importantly: why should this relationship hold only in CC patients and not in the SC control group?

      (8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This statement speculatively infers a causal relationship from cross-sectional data.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification reads: "We nevertheless think that our results are valid. Our findings [are] neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification for the small sample. Furthermore, the current data set is a subset of data published earlier by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

    2. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are only very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; rather, they are essential to understand future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first studies to relate MRS neurotransmitter concentrations to parameters of aperiodic EEG activity. Since current research uses aperiodic activity as a correlate of the E/I ratio, and in part of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what aperiodic neurophysiological activity might measure.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry in two conditions ('eyes open', 'eyes closed'), as well as resting-state and visually evoked EEG activity, between ten congenital cataract patients with recovered sight (CC) and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, and no difference between the functional conditions 'eyes closed' and 'eyes open'. They found a group effect on the Glx/GABA+ ratio, and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort that could, for example, provide evidence for a higher E/I ratio in CC patients without recovered sight, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We supported the validity of our results in two ways. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link results to a larger population of CC individuals; that is, we replicated the results of a larger group of 38 individuals (Ossandón et al., 2023) in our subgroup.

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). We therefore started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than in the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (for a recent example, see Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 ± 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations, such as the I/E ratio assessed in narcolepsy (Gao et al., 2024) and GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022), have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study should therefore have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we will highlight our a priori prediction of group differences for the visual cortex voxel, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).
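      For readers less familiar with the aperiodic parameters, the sketch below shows what "intercept" and "slope" refer to in the simplest case: a straight line fitted to the power spectrum in log-log space over 1-20 Hz, so that log10 P(f) = b + s·log10 f, where b is the intercept and s the slope (a more negative s is a steeper slope). This is a minimal illustration assuming no spectral knee and no oscillatory peaks; the function name `fit_aperiodic` is hypothetical, and the analysis pipeline used in the study may differ. Dedicated spectral parameterization tools additionally model periodic (e.g., alpha) peaks, which would otherwise bias a plain linear fit.

```python
import math


def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0):
    """Fit log10(power) = intercept + slope * log10(freq) by ordinary least squares.

    Only frequency bins within [fmin, fmax] are used. For a 1/f-like spectrum
    the slope is negative; a more negative slope is "steeper".
    Returns (intercept, slope).
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        if fmin <= f <= fmax and f > 0 and p > 0:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    if len(xs) < 2:
        raise ValueError("need at least two frequency bins in range")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequency bins in range must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Simulated (not recorded) spectrum: a pure power law with intercept 1.5 and
# slope -2, sampled every 0.5 Hz from 0.5 to 40 Hz, without peaks or noise.
freqs = [0.5 * k for k in range(1, 81)]
power = [10 ** 1.5 * f ** -2.0 for f in freqs]
b, s = fit_aperiodic(freqs, power)
print(round(b, 6), round(s, 6))  # 1.5 -2.0
```

      On a real EEG spectrum, a prominent alpha peak inside the 1-20 Hz window would bias both the estimated slope and intercept of this simple fit, which is why peak-aware parameterization is generally preferred.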

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g., the extant literature on congenital permanent blindness, including anophthalmic individuals; Coullon et al., 2015; Weaver et al., 2013), some uncertainty remains regarding whether the (in our case, remaining) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not have received cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, links between GABA concentration and, for example, interocular suppression in strabismus have so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction between the amblyopic and non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), suggesting that the differences in the Glx/GABA+ ratio which we observed were driven by congenital deprivation, and not by amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remain interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.
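      As one concrete option for the reviewer's point about uncorrected exploratory correlations, the Holm-Bonferroni step-down procedure controls the family-wise error rate across a set of tests and is never less powerful than a plain Bonferroni correction. The sketch below is illustrative only: the p-values are hypothetical placeholders rather than values from the study, and the function name `holm_bonferroni` is our own.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns (reject, adjusted), both in the original order of pvals.
    """
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank k = 0, 1, ...) is multiplied by (m - k).
        adj = min(1.0, (m - rank) * pvals[i])
        # Enforce monotonicity: adjusted p-values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted


# Hypothetical p-values from three exploratory correlations (not study data).
reject, adj = holm_bonferroni([0.01, 0.04, 0.03])
print(reject)  # [True, False, False]
```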

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the linear regressions below, which include age as a covariate, for the relationship between the aperiodic intercept and Glx concentration in the CC group.

      a. A linear regression was conducted within the CC group to predict the intercept during visual stimulation, based on age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2, 7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted to predict the intercept during eye opening at rest, based on age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2, 7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict either outcome, Glx appears to predict the aperiodic intercept independently of age.
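      The overall model statistics reported in (a) and (b) can be cross-checked from R² alone. For an ordinary least-squares model with k predictors plus an intercept fitted to n observations, the overall test statistic is F = (R²/k) / ((1 − R²)/(n − k − 1)) with (k, n − k − 1) degrees of freedom; with k = 2 predictors (age and Glx) and n = 10 CC participants this gives (2, 7). When the first degrees of freedom equal 2, the upper-tail p-value has the closed form (1 + 2F/d2)^(−d2/2), so no statistics library is needed. The sketch below is a consistency check under these assumptions, with hypothetical function names, and not the authors' analysis code.

```python
def overall_f_from_r2(r2, n, k):
    """Overall F statistic of an OLS model with k predictors (plus intercept) and n observations."""
    df1, df2 = k, n - k - 1
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return f_stat, df1, df2


def f_sf_df1_2(f_stat, df2):
    """Upper-tail p-value of an F(2, df2) statistic; this closed form is valid only when df1 == 2."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Values from model (a): n = 10 CC participants, k = 2 predictors (age, Glx).
f_stat, df1, df2 = overall_f_from_r2(0.82, n=10, k=2)
print(df1, df2, round(f_stat, 1))  # 2 7 15.9
print(round(f_sf_df1_2(16.1, df2), 4))  # 0.0024
```

      With R² = 0.82 the formula gives F ≈ 15.9, which agrees with the reported 16.1 once rounding of R² to two decimals is taken into account, and F = 16.1 on (2, 7) degrees of freedom gives p ≈ 0.0024, matching the reported value. The same check for model (b), with F = 18.6, gives p ≈ 0.0016.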

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions; at rest with eyes closed (EC), at rest with eyes open (EO) and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested an increased excitation/inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration. However, longitudinal neuroimaging studies are difficult to conduct, particularly in research carried out outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A weak correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls, the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz).  

      As noted in the discussion section and in response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that the patients with the highest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation) in the presence of visual input are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution, to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients with highly varying visual outcomes, and was thus ideal for investigating such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we will more clearly indicate in the discussion that these are possible post-hoc interpretations. We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern, as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), providing us confidence about their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows more precise hypotheses to be derived for future studies. It should be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal datasets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, constitute a substantial sample. Sample sizes in prior neuroimaging studies of transient blindness have most often ranged from n = 1 to n = 10. These studies nevertheless provided valuable direction for future research, and integrating results across multiple studies provides scientific insight.

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or the neurochemicals. Furthermore, the authors did not provide any information on the alpha level or on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched with respect to age, the time between surgery and measurement, and the duration of visual deprivation (and sex?), these should be included as covariates, as it has been shown that they are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
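      A permutation test of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the revised manuscript: the helper name `permutation_test_mean_diff`, the choice of the absolute difference in group means as the test statistic, and the number of permutations are all hypothetical.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper: the statistic is |mean(a) - mean(b)|, and the null
    distribution is built by randomly reassigning the pooled values to two
    groups of the original sizes.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if permuted >= observed:
            count += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)
```

      Because the null distribution is built by reshuffling the group labels of the observed values, the test makes no normality assumption, which is why it is often preferred for small groups such as n = 10 per group.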

      The alpha level for the ANOVA models, specified in the methods section, was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after Bonferroni correction for six comparisons (0.05/6 ≈ 0.008), as also specified in the methods. Note that corrected p-values are reported multiplied by 6, because most readers assume an alpha level of 0.05 (see our response regarding large p-values).
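      The equivalence between comparing Bonferroni-adjusted p-values with 0.05 and comparing raw p-values with 0.05/6 can be illustrated with a short sketch. The helper names are hypothetical, and capping adjusted values at 1 is a common convention assumed here rather than taken from the manuscript.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each raw p multiplied by the number of
    tests, capped at 1 (the cap is an assumed reporting convention)."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


# Six exploratory comparisons, as in the response: 0.05 / 6 is about 0.0083.
threshold = bonferroni_alpha(0.05, 6)
```

      An adjusted p-value below 0.05 corresponds exactly to a raw p-value below `threshold`, so both ways of reporting lead to the same decisions.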

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested at the same institutes, using the same setup. We believe that we followed the gold standard for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted separately for the eyes-open and eyes-closed conditions. The ANOVA on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
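      The reviewer's estimate and the reported corrected value can be reconciled with a short sketch, assuming a two-sided test of Pearson's r with n - 2 degrees of freedom (t = r·sqrt((n - 2)/(1 - r²))) and a Bonferroni factor of 6 capped at 1. The helper functions are hypothetical and integrate the Student t density numerically (Simpson's rule) to stay within the standard library; they are not the code used for Figure 4.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density from 0 to |t|.

    The integral is approximated with Simpson's rule; `steps` must be even.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ...
        total += weight * t_pdf(a + i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def pearson_p_value(r, n):
    """Uncorrected two-sided p-value for H0: rho = 0 with n subjects."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return two_sided_p_from_t(t, df)


p_raw = pearson_p_value(-0.42, 10)  # uncorrected
p_bonferroni = min(p_raw * 6, 1.0)  # corrected for six comparisons, capped at 1
```

      For r = -0.42 and n = 10 the uncorrected p-value is roughly 0.23, in line with the reviewer's estimate; multiplying by 6 exceeds 1, which is consistent with the corrected value being reported as > 0.999.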

      Additionally, to check all statistical analyses, we ran the manuscript through statcheck (Nuijten & Polanin, 2020), an independent automated consistency check (https://michelenuijten.shinyapps.io/statcheck-web/), and will upload the consistency report with the revised supplementary material.

      - Figure 2c, eyes-closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple studies that we have cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflecting changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects. There is no evidence suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological and MRS measures could be accounted for by a difference in, e.g., skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretation of the lower frequency range, in order to speculate as to why the two aperiodic fits might go in different directions. We will clarify our interpretation in the revised manuscript. Note that Ossandón et al. (2023) reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range. We will refer to these findings in the revised manuscript.

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies allow only an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in addition to monkey ECoG. Medel et al. (2020) (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent EEG studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity is still lacking, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors neither identified bad channels nor addressed line noise (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which were rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, thereby excluding line-noise contamination. The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary material of the revised manuscript. Further, both groups were measured in the same lab, making line noise a highly unlikely explanation for the observed group effects. Finally, the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentages of 1-second segments rejected for each resting-state condition are given below, together with the mean percentages of 6.25-second segments rejected in each group for the visual stimulation condition. These values will be added to the revised manuscript:

      Author response table 3.
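      How such rejection percentages can be computed is sketched below, assuming that a segment is rejected when the absolute amplitude of any sample exceeds 120 μV. That definition of the criterion (as opposed to, e.g., a peak-to-peak criterion), the helper names and the example values are assumptions for illustration.

```python
def split_into_segments(signal, fs, segment_s):
    """Cut a 1-D signal (list of samples) into non-overlapping segments of
    segment_s seconds at sampling rate fs; a trailing partial segment is dropped."""
    n = int(round(fs * segment_s))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def percent_rejected(segments, threshold_uv=120.0):
    """Percentage of segments in which any sample exceeds +/- threshold_uv.

    Assumes amplitudes are in microvolts and an absolute-amplitude criterion.
    """
    if not segments:
        return 0.0
    n_bad = sum(1 for seg in segments if max(abs(x) for x in seg) > threshold_uv)
    return 100.0 * n_bad / len(segments)
```

      At 60 Hz, a 1-second segment holds 60 samples and a 6.25-second segment holds 375 samples.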

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are conflated by this choice (see the Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which varied in luminance with equal power at all frequencies from 1 to 60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; Vanrullen & MacDonald, 2012); the analysis requires the same sampling rate for the presented stimulus frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
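      The Nyquist concern and the role of the anti-aliasing filter can be made concrete with a short sketch. At a sampling rate of 60 Hz the Nyquist frequency is 30 Hz, so the 1-20 Hz range analyzed here lies below it; any component above 30 Hz that is not removed before downsampling folds back into the 0-30 Hz band. The helpers `aliased_frequency` and `sampled_cosine` are hypothetical, and the 50 Hz input is only an illustrative frequency.

```python
import math


def aliased_frequency(f_signal, fs):
    """Apparent frequency (Hz) of a sinusoid at f_signal Hz sampled at fs Hz
    with no anti-aliasing filter: the frequency is folded into 0..fs/2."""
    f = f_signal % fs
    return fs - f if f > fs / 2 else f


def sampled_cosine(f_signal, fs, n_samples):
    """Samples of cos(2*pi*f*t) at t = k / fs, k = 0..n_samples-1."""
    return [math.cos(2 * math.pi * f_signal * k / fs) for k in range(n_samples)]


nyquist = 60 / 2  # 30 Hz for data sampled at 60 Hz
```

      Without filtering, `sampled_cosine(50, 60, 60)` and `sampled_cosine(10, 60, 60)` agree sample by sample, so a 50 Hz component would be indistinguishable from a 10 Hz one after downsampling to 60 Hz; this is what an anti-aliasing filter applied before downsampling is meant to prevent.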

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
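      A minimal sketch of this baseline removal, in which the whole epoch serves as the baseline, is given below; the helper names and sample values are hypothetical.

```python
def demean_epoch(epoch):
    """Baseline removal: subtract the mean over the entire epoch from each sample."""
    baseline = sum(epoch) / len(epoch)
    return [x - baseline for x in epoch]


def demean_epochs(epochs):
    """Apply demean_epoch to every epoch (e.g. 1-s or 6.25-s segments)."""
    return [demean_epoch(e) for e in epochs]
```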

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).
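      The fitting procedure described above can be sketched as follows, assuming base-10 logarithms and an ordinary least-squares line in log-log space over 1-20 Hz with 8-14 Hz excluded. The function names and the synthetic spectrum are hypothetical; this illustrates the technique rather than reproducing the authors' pipeline. For a spectrum proportional to 1/f^χ, the fitted slope equals -χ.

```python
import math


def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0, exclude=(8.0, 14.0)):
    """Least-squares line fit of log10(power) against log10(frequency).

    Frequencies outside [fmin, fmax] or inside the excluded (alpha) band are
    ignored for the fit. Returns (slope, intercept) in log-log space.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        if fmin <= f <= fmax and not (exclude[0] <= f <= exclude[1]):
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def corrected_spectrum(freqs, power, slope, intercept):
    """Subtract the fitted aperiodic line from the log10 power spectrum."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]
```

      With a synthetic spectrum 10·f^(-1.5) multiplied by 4 between 9 and 11 Hz, the fit recovers a slope of -1.5 and an intercept of 1, because the peak falls entirely within the excluded band; in the corrected spectrum the peak then stands out at log10(4) above zero.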

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox in both the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range and of lower alpha power in CC vs SC individuals remained stable. In Ossandón et al. (2023), the comparison between piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.
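      Fit quality metrics of this kind could, for example, be the coefficient of determination (R²) and the root-mean-square error between the observed log power and the fitted line within the fit range. The sketch below assumes these two metrics, which the response does not specify; the helper name is hypothetical.

```python
import math


def fit_quality(observed, fitted):
    """R-squared and root-mean-square error between observed and fitted values
    (e.g. log10 power and the log-log aperiodic fit at the same frequencies)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r_squared, rmse
```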

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of gamma-aminobutyric acid (GABA) measurements can vary significantly depending on acquisition and modeling parameters. Did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time and is still widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability does not seem to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we will remove the statement regarding stability and add the Bland-Altman plot.
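      A Bland-Altman analysis for the test-retest subset could be computed as sketched below: for each subject, the difference between sessions is plotted against the mean of the two sessions, with horizontal lines at the mean difference (bias) and at the bias ± 1.96 SD of the differences (95% limits of agreement). The helper name is hypothetical and no study values are used.

```python
from statistics import mean, stdev


def bland_altman(session1, session2):
    """Bland-Altman quantities for paired measurements from two sessions.

    Returns the per-subject means (x-axis), differences (y-axis), the bias
    (mean difference) and the 95% limits of agreement (bias +/- 1.96 SD).
    """
    diffs = [a - b for a, b in zip(session1, session2)]
    means = [(a + b) / 2 for a, b in zip(session1, session2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD; requires at least two subjects
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```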

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio..." Is this interpretation really warranted, as the group differences in the Glx/GABA ratio seem to be driven by a decreased Glx concentration in CC rather than by an increased GABA concentration (see Figure 2)?

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore, and importantly: why should this be specific to CC patients and not present in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study, which might justify future longitudinal work. To advance science, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023). We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & de Graaf, R. A. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ-aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      van Vreeswijk, C., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments, which have helped to improve the manuscript. We provide specific examples and a point-by-point response to all comments below. Based on the Reviewers’ comments, we revised our manuscript, adding a considerable amount of new data (found in Fig. 1A,B, 4E-G, 7C,D, 8C,E, S1B,C, S2C-G, S4C, and Video 1). In the main manuscript text, blue font indicates added or revised text. An additional author (Lauren N. Juga) has been added for the newly generated data in the revised manuscript.

      Reviewer #1: 

      Sekulovski et al. present an interesting and timely manuscript describing the temporal transition from epiblast to amnion. The manuscript builds on their previous work describing this process using stem cell models. 

      They suggest a multi-step process initiated by BMP induction of GATA3, followed by expression of TFAP2A, followed by ISL1/HAND1 in parallel with loss of pluripotency markers. This transition was reproduced through IF analysis of CS6/7 NHP embryos. 

      There are significant similarities between the expression profiles of trophectoderm and amnion. There are also ample manuscripts showing trophoblast induction following BMP stimulation of primed pluripotent stem cells. The authors should ensure that the amnion is indeed only amnion and not trophectoderm (or quantify the contribution to trophectoderm). As an extension, does the amnion character remain after the 48h BMP4 treatment, and is a trophectoderm-like state adopted as suggested by Ohgushi et al. 2022?  

      Thank you for this insightful comment. As pointed out, Ohgushi et al. showed that, in their culture method, amnion is first induced, and extended culturing leads to the formation of trophectoderm-like cells (Ohgushi et al., 2022).

      Importantly, we would like to note that our culture system differs substantially from that of Ohgushi et al. in several respects. First, our system uses a 3D culture method, while Ohgushi et al. employ 2D hPSC monolayers. Second, the two systems are chemically quite distinct. In our Glass-3D+BMP protocol, cells are cultured in mTeSR medium (which contains FGF2 and TGFβ1) for two days, by which time they form 3D pluripotent cysts. BMP4 is then added to the culture medium for 24 hours, followed by another 24 hours without BMP4. In stark contrast, Ohgushi et al. employ A83-01, an Activin/Nodal signaling inhibitor, and PD173074, an FGF signaling inhibitor (a protocol they call AP). This treatment leads to spontaneous activation of BMP signaling, but it also clearly inhibits the Activin/Nodal and FGF signaling pathways, which remain active in our system. As a result of these distinct chemical and geometrical culture conditions, their system produces amnion and trophectoderm, while our system produces exclusively amnion.

      Further analysis of our gene expression data provides additional support for our contention that our system produces amnion. Although the gene expression profiles of amnion and trophectoderm are quite similar, specific markers of trophectoderm have been identified, including GCM1, PSG1, PSG4 and CGB (Blakeley et al., 2015; Meistermann et al., 2021; Ohgushi et al., 2022; Okae et al., 2018; Petropoulos et al., 2016; Yabe et al., 2016). Importantly, while all of these markers are abundantly expressed in the Ohgushi et al. system, none of them are detectable in bulk RNA sequencing of our Glass-3D+BMP hPSC-amnion cells. Indeed, SDC1, a marker that Ohgushi et al. claim distinguishes trophoblast from amnion, actually decreases (more than 8-fold) as pluripotent cysts transition to amnion in Glass-3D+BMP. Finally, Ohgushi et al. report that ISL1, a key marker of the specified amnion population, is initially increased in their system but is reduced to a basal level over time. In contrast, in Glass-3D+BMP hPSC-amnion, ISL1 expression continuously increases with time, and ISL1 protein expression is seen uniformly throughout the amnion cysts. This uniform expression is also seen in CS6/7 cynomolgus macaque amnion. Together, these results support our conclusion that the Glass-3D+BMP system leads to the formation of amniotic cells, and not trophectoderm cells.
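      The marker argument above reduces to two checks: whether any trophectoderm-specific marker is detectable, and whether SDC1 falls by more than 8-fold (a log2 fold change below -3). The Python sketch below illustrates only that logic; the detection threshold and pseudocount are hypothetical parameters, not values stated by the authors, and the gene symbols are used as plain dictionary keys (CGB in particular stands for a family of genes).

```python
import math

# Trophectoderm markers named in the response; "CGB" is treated as a single key here.
TE_MARKERS = ("GCM1", "PSG1", "PSG4", "CGB")


def detected_te_markers(fpkm, threshold=1.0):
    """Return the trophectoderm markers whose FPKM exceeds `threshold`.

    `fpkm` maps gene symbol -> expression value; genes missing from it are
    treated as 0. The default threshold is a hypothetical detection cut-off.
    """
    return [gene for gene in TE_MARKERS if fpkm.get(gene, 0.0) > threshold]


def log2_fold_change(before, after, pseudocount=1.0):
    """log2((after + pc) / (before + pc)); the pseudocount avoids division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def decreased_more_than(before, after, fold=8.0, pseudocount=1.0):
    """True if expression dropped by more than `fold` (log2 FC < -log2(fold))."""
    return log2_fold_change(before, after, pseudocount) < -math.log2(fold)
```

      Applied to a bulk FPKM table, an empty list from detected_te_markers and a True from decreased_more_than for SDC1 would correspond to the two observations described above.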

      The functional data does not support a direct function of GATA3 prior to TFAP2A and the authors suggest compensatory mechanisms from other GATAs. If so, which GATAs are expressed in this system, with and without GATA3 targeting? Would it not be equally likely that the other early genes could be the key drivers of amnion initiation, such as ID2? 

      We appreciate this helpful comment. We agree that our data do not provide sufficient evidence for a role of GATA3 in early amniogenesis. We also agree that other early genes could be key drivers, and we apologize for including speculation that focuses only on GATA2. GATA2 was selected because GATA2 and GATA3 are the only abundantly expressed GATA factors in our system. The statement suggesting a potentially redundant role of GATA2 has now been removed from the manuscript (Line#355 of the original manuscript).

      The targeting of TFAP2A displays a very interesting phenotype which suggests that amnion and streak share an initial trajectory but where TFAP2A is necessary to adopt amnion fate. It would again be important to ensure that this alternative fate is indeed in streak and not misannotated alternative lineages, including trophoblast. 

      Is TBXT induced in this setting as well as in the wt situation during amnion induction? This should be displayed as in Figure 3D and would be nice to be complemented by NHP IF analysis.

      We will address these two closely related comments together.

      TFAP2A-KO cysts contain ISL1+ squamous cells as well as SOX2+ pluripotent cells, suggesting that, while initial focal amniogenesis occurs, the subsequent spreading event does not. Interestingly, our new data show that TFAP2A-KO cysts contain cells with high TBXT expression (Fig. 8E, Line#373-374). This result suggests that, in the absence of TFAP2A, once amnion lineage progression is halted, a more primitive streak-like (TBXT-high) lineage emerges. It is important to note that TBXT expression is not seen in the trophectoderm population of the cynomolgus macaque peri-gastrula (Sasaki et al., 2016; Yang et al., 2021).

      As suggested, we now include a TBXT expression time course during hPSC-amnion formation in Fig. S2D of the revised manuscript. These data show weak TBXT expression (at the transcript level) starting at the 24-hr timepoint. However, a clear TBXT protein signal could not be detected using IF (Fig. S2C), likely because TBXT expression is very low (Line#264-265). Although the increase is statistically significant compared to the 12-hr timepoint, TBXT expression is only 31 ± 0.8 FPKM (mean ± standard deviation) at 24-hr and 48 ± 6 FPKM at 48-hr. These values are low compared to, for example, TFAP2A, which shows 572 ± 23 FPKM at 12-hr and 1169 ± 27 FPKM at 24-hr, timepoints at which TFAP2A is readily detected using IF. While weak nuclear TFAP2A is seen using IF at 6-hr (187 ± 7 FPKM), no clear TFAP2A is detected at 3-hr (74 ± 7 FPKM). Another example is ISL1, which shows 758 ± 55 FPKM at 24-hr and 1505 ± 26 FPKM at 48-hr, when ISL1 can be detected using IF. Importantly, we were not able to detect ISL1 protein expression using IF at 12-hr, when its expression level is 12 ± 18 FPKM. Lastly, we now show that, in the cynomolgus macaque peri-gastrula, while pSMAD1/5+ primitive streak-derived disseminating cells show abundant TBXT expression, no clear TBXT expression is seen in the amnion territory (Fig. S2G, Line#291-293). 
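      The comparison above can be made explicit by treating IF detectability as a function of bulk FPKM. The Python sketch below uses only the mean FPKM values quoted in this response and assumes, as a simplification, that a single FPKM level separates IF-negative from IF-positive signals across antibodies; in practice antibody sensitivity differs between targets, so the resulting window is indicative only.

```python
# (gene, hours post-BMP, mean FPKM, clear IF signal?) as quoted in this response
OBSERVATIONS = [
    ("TFAP2A", 3, 74, False),
    ("TFAP2A", 6, 187, True),  # weak nuclear signal
    ("TFAP2A", 12, 572, True),
    ("TFAP2A", 24, 1169, True),
    ("ISL1", 12, 12, False),
    ("ISL1", 24, 758, True),
    ("ISL1", 48, 1505, True),
]

# TBXT mean FPKM by hours post-BMP; no clear IF signal was observed
TBXT_FPKM = {24: 31, 48: 48}


def detection_window(observations):
    """Return (highest FPKM without IF signal, lowest FPKM with IF signal)."""
    negative = [fpkm for _, _, fpkm, seen in observations if not seen]
    positive = [fpkm for _, _, fpkm, seen in observations if seen]
    return max(negative), min(positive)


def classify(fpkm, window):
    """Place an FPKM value relative to the empirical detection window."""
    highest_negative, lowest_positive = window
    if fpkm <= highest_negative:
        return "within IF-negative range"
    if fpkm >= lowest_positive:
        return "within IF-positive range"
    return "between ranges"


window = detection_window(OBSERVATIONS)
tbxt_calls = {hours: classify(fpkm, window) for hours, fpkm in TBXT_FPKM.items()}
print(window, tbxt_calls)
```

      Both TBXT values fall at or below the 74 FPKM at which TFAP2A was not yet detected by IF, consistent with the absence of a clear TBXT protein signal.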

      Together, these results show that, although a TBXT-low state clearly emerges during hPSC-amnion development, TBXT levels remain low throughout amnion differentiation of wild-type hPSC cultured in Glass-3D+BMP. However, in the absence of TFAP2A, a TBXT-high state is seen, suggesting that TFAP2A is critical for suppressing this TBXT-high state in fate spreading cells, perhaps by preventing BMP-responding cells from acquiring embryonic lineages (e.g., mesodermal and/or primordial germ cells).

      The authors should address why they get different results from Castillo-Venzor et al. 2023 DOI: 10.26508/lsa.202201706  

      Thank you for this helpful suggestion; we now include a section addressing this point in the Discussion (Line#410-432). In short, we propose several possibilities. First, the culture conditions are highly distinct. Castillo-Venzor et al. (Castillo-Venzor et al., 2023) use initial “pre-mesoderm” conditioning with Activin and CHIR, followed by treatment of floating embryoid bodies with a growth factor cocktail (BMP, SCF, EGF and LIF). In contrast, our system (Glass-3D+BMP) employs BMP stimulation of pluripotent cysts. Thus, we suspect that, in the PGCLC differentiation condition, cells are primed toward the pre-mesodermal lineage. Moreover, we propose that amnion fate spreading may not occur in the PGCLC system, perhaps due to differences in geometry (aggregates versus cysts) or to differing lineage commitment programs. That is, while initial amniogenesis is seen in the PGCLC system, most cells may already be committed to the PGC-like or mesodermal lineages by the time amnion fate spreading could occur. Alternatively, because several cell types (PGC-like, mesodermal and amniotic) co-exist in the culture used by Castillo-Venzor et al., PGC-like and/or mesodermal cells may compensate for the loss of TFAP2A.

      Reviewer #2: 

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated. 

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.  

      We are happy that Reviewer #2 agrees that our Glass-3D+BMP model is important for investigating additional roles of amniogenesis, as well as roles of amnion as a signaling hub, due to the purity of the amniotic cell population, and a high degree of synchrony of differentiation.

      We respectfully disagree that the absence of other embryonic tissues in the model is a weakness; rather, we believe it is a strength, because this single-lineage amnion model allows us to directly (and independently) investigate the mechanisms underlying amnion lineage progression. For example, as noted above in our response to Reviewer #1, our hPSC-amnion model allowed us to see a very specific and interesting phenotype in the absence of TFAP2A (reduced amnion formation and emergence of an alternative lineage), whereas previous findings by Castillo-Venzor et al. concluded that amniogenesis is not affected by loss of TFAP2A. We noted that the culture method used by Castillo-Venzor et al. contains several cell types (amniotic, mesodermal and PGC-like), and that amniogenesis may be intact in that model due to compensation by these other cell types. That is, while cell-cell interactions can indeed be studied in culture systems with several cell types, the presence of multiple cell types and their additional signaling inputs can also confound some aspects of mechanistic investigation. We now include a paragraph in the Discussion of the revised manuscript (Line#410-432) in which we detail these ideas and suggest that, because of its cell purity, our Glass-3D+BMP model enables robust mechanistic examination, specifically during amnion formation.

      We address Reviewer #2’s point about bulk vs. single cell transcriptomic similarity analysis in the Reviewer’s specific point #4 below. We do, however, want to note here that we have performed the same analysis using a 14-day-old cynomolgus macaque peri-gastrula single cell RNA sequencing dataset generated by Yang et al. (Yang et al., 2021), and obtained a lineage trajectory (Fig. 4F, Line#265-268) similar to that seen when the Tyser et al. dataset (Tyser et al., 2021) was used (Fig. 4C).

      Importantly, while cynomolgus macaque early embryo samples are limited, we now include additional staining (Fig. S2G). 

      Reviewer #2 (Recommendations For The Authors): 

      Provide more confirmation of key findings in more than one stem cell line. 

      We now confirm key findings in the H7 human embryonic stem cell line (Fig. S1C).

      Provide stronger evidence e.g. scRNA-seq to support the existence of intermediate cells or tone down the conclusions.  

      We agree that this is a very important point. In our recent study (Sekulovski et al., 2023), we performed single cell RNA sequencing of Gel-3D, another hPSC-amnion model. In that study, we comprehensively described the transcriptome associated with the “intermediate” cell types, and described CLDN10 as a marker of these cell types. Moreover, we now include additional data showing the molecular characteristics of the TBXT-low intermediate cells during amniogenesis in hPSC-amnion (Fig. S2C, S2D) and in the d14 cynomolgus macaque peri-gastrula (Fig. 4G, a replot of the single cell RNA-seq data from Yang et al., 2021; Line#264-268).

      Provide more data on the expression of DLX5 in the model. 

      We now provide a DLX5 staining time course in Fig. 7C. We find that, similar to ISL1, prominent DLX5 staining is seen in the focal cells at 24-hr post-BMP. Interestingly, at 48-hr, some cells show high levels of DLX5 while others show low levels; this will be of interest for future investigation.

      (1) L159 - the authors should repeat more of the key results in at least one other hPSC line, to ensure reproducibility of the method. Figure S1 contains minimal information (one timepoint, three genes, one biological replicate) on a single different hPSC line. 

      We now include additional validation analysis using the H7 human ESC line (Fig. S1).

      (2) Figure 1- it is a little difficult to appreciate cyst formation from images taken at one level in the stack, can the authors perhaps show a 3D rendering or video to display morphogenesis better? 

      We now provide all optical sections of cysts shown in Movie 1.

      (3) Figure 1-did the authors carry out podocalyxin staining? This is a standard marker for lumenogenesis.  

      We now provide PODXL staining (Fig. 1A,1B).

      (4) L248 onwards and Figure 4-I am a little skeptical concerning conclusions drawn from an overlay of bulk RNA-seq onto scRNA-seq UMAP plots. I think the authors need to provide some strong justification for this approach. I would be particularly careful about concluding that cells depicted in Fig 4D represent an intermediate close to primitive streak and even more careful about claiming any lineage relationship between T-positive "primitive streak like intermediates" and the trajectory of cells in the model. UMAP is a dimension-reduction technique for the visualization of clusters in high-dimensional data. It is not a lineage-tracing methodology. It would have been preferable for the authors to present their own scRNA-seq data from the model.  

      We apologize that it was not clear that our approach for assessing similarity between bulk and single cell RNA-seq data is largely based on a published method, projectLSI (Granja et al., Nature Biotechnology 2019 (Granja et al., 2019)). Please refer to our Methods section for details of the implementation and how we modified it for better visualization (addressed in Line#667-676 of the original manuscript, now in Line#718-730). The performance of projectLSI was extensively evaluated in the original article. Furthermore, as pointed out, UMAP is indeed a dimension reduction method that has been widely used in single cell RNA-seq research. In addition to visualizing clusters, trajectory analysis, such as RNA-velocity (which is used in this study), is another successful and widely adopted application of UMAP to gauge fate progression. Therefore, we believe that UMAP can be effectively used as a lineage prediction methodology, and that our bulk-to-single cell transcriptomic similarity analysis leveraging projectLSI is well justified at both the conceptual and technical levels.
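      For readers unfamiliar with projectLSI, the Python/NumPy sketch below illustrates the general idea of projecting a bulk profile into a reference single-cell latent semantic indexing (LSI) space: a TF-IDF transform learned on the reference, a truncated SVD, subsampling of the bulk profile into sparse pseudo-cells, and projection with the reference loadings. It is a simplified illustration, not the authors' implementation or Granja et al.'s code; the idf formula, the 1e4 scaling factor, the pseudo-cell count and depth, and the k-nearest-neighbour label summary (used here in place of a UMAP embedding) are all assumptions.

```python
from collections import Counter

import numpy as np


def tfidf(counts, idf=None):
    """TF-IDF transform of a features x samples count matrix.

    When `idf` is None it is learned from `counts` (the single-cell reference);
    otherwise the reference idf is reused so projected samples share its scale.
    """
    depth = counts.sum(axis=0, keepdims=True)
    tf = counts / np.maximum(depth, 1)  # per-sample depth normalisation
    if idf is None:
        n_samples = counts.shape[1]
        n_detected = (counts > 0).sum(axis=1, keepdims=True)
        idf = np.log1p(n_samples / (1 + n_detected))  # down-weight ubiquitous features
    return np.log1p(tf * 1e4) * idf, idf


def fit_lsi(sc_counts, n_dims):
    """Fit LSI on the single-cell reference: TF-IDF, then a truncated SVD."""
    x, idf = tfidf(sc_counts)
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    basis = u[:, :n_dims]  # features x n_dims loadings
    ref_coords = (basis.T @ x).T  # cells x n_dims
    return basis, idf, ref_coords


def project_bulk(bulk, basis, idf, ref_coords, ref_labels,
                 k=5, n_pseudo=50, depth=200, seed=0):
    """Project a bulk profile into the reference LSI space.

    The bulk profile is subsampled into sparse pseudo-cells so that its
    sparsity resembles single-cell data; each pseudo-cell then votes with the
    labels of its k nearest reference cells. Returns label -> vote fraction.
    """
    rng = np.random.default_rng(seed)
    probs = bulk / bulk.sum()
    pseudo = rng.multinomial(depth, probs, size=n_pseudo).T  # features x pseudo-cells
    y, _ = tfidf(pseudo, idf=idf)
    coords = (basis.T @ y).T
    votes = Counter()
    for point in coords:
        nearest = np.argsort(np.linalg.norm(ref_coords - point, axis=1))[:k]
        votes.update(ref_labels[i] for i in nearest)
    total = sum(votes.values())
    return {label: n / total for label, n in votes.items()}
```

      Reusing the reference idf and SVD loadings is the key design choice: the bulk sample is placed in a space defined entirely by the single-cell data, so its position can be read against the annotated reference cells rather than against a space it helped to build.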

      As illustrated in Fig. 5A, we performed RNA-velocity analysis of the Tyser et al. dataset, and our result clearly predicts a differentiation trajectory from Epiblast to a part of the TBXT-low population shown in Fig. 4D, and then to Ectoderm/Amnion cells. Consistent with this bioinformatic result, we now show that some cells show weak TBXT expression (at the transcript level) at the 24-hr post-BMP timepoint in control hPSC-amnion (Fig. S2D, Line#264-265). Importantly, our conclusion is drawn from a trajectory based on our time course (0, 0.5, 1, 3, 6, 12, 24, and 48 hours post-BMP treatment), which shows a clear transition from epiblast cells to TBXT-low cells and finally to the ectoderm/amnion population. Moreover, using the transcriptomic similarity analysis, we found that the loss of TFAP2A leads to the emergence of more primitive streak-like transcriptional characteristics (Fig. 8D). Indeed, using IF, we now show that several fate spreading cells in the TFAP2A-KO cysts are TBXT-high (Fig. 8E, Line#373-374). Thus, the new data provide additional evidence that this bulk/single cell transcriptomic similarity analysis has been implemented successfully.
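      RNA-velocity methods infer the direction of expression change from the balance of unspliced and spliced transcripts. As a rough illustration of the steady-state model used by velocyto, the Python/NumPy sketch below fits a per-gene degradation ratio gamma from cells at the extremes of spliced expression and computes velocity as u - gamma * s. It omits the normalization, neighbour smoothing and embedding projection that full tools such as velocyto or scVelo perform, and the 5%/95% quantile choice is an assumption; it does not reflect the settings used in this study.

```python
import numpy as np


def fit_gamma(u, s, quantile=0.05):
    """Per-gene steady-state ratio gamma, fitted as u ≈ gamma * s.

    u, s: cells x genes arrays of unspliced and spliced abundances.
    Only cells in the lowest and highest `quantile` of spliced expression for
    each gene are used, on the assumption that they are near steady state.
    The fit is a least-squares line through the origin.
    """
    gamma = np.zeros(s.shape[1])
    for g in range(s.shape[1]):
        low, high = np.quantile(s[:, g], [quantile, 1 - quantile])
        extreme = (s[:, g] <= low) | (s[:, g] >= high)
        x, y = s[extreme, g], u[extreme, g]
        denom = float(np.dot(x, x))
        gamma[g] = np.dot(x, y) / denom if denom > 0 else 0.0
    return gamma


def velocity(u, s, gamma):
    """Steady-state RNA velocity: ds/dt ≈ u - gamma * s (per cell and gene)."""
    return u - gamma * s
```

      A positive velocity for a gene in a cell predicts that its spliced expression is increasing; aggregating such vectors over a low-dimensional embedding yields the arrows from which trajectories like the one in Fig. 5A are read.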

      Together, our bioinformatic and localization analyses show that the Glass-3D+BMP system recapitulates the trajectory found in our RNA-velocity analysis of the Tyser et al. dataset, further supporting the validity of this differentiation trajectory. To avoid confusion, however, we now omit the phrase “primitive streak-like” when describing the TBXT-low cells because, while they may show some TBXT expression, they are likely intermediate, fate-transitioning cells. Indeed, a recent study by Ton et al. (Ton et al., 2023) showed that the Tyser et al. Primitive Streak cells consist of a mix of several lineage-progressing cell types (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC). Therefore, these cells are now specifically described as the “TBXT-low” state, and TBXT-high cells are described as the primitive streak-like state.

      (5) L276 Tyser data do come from a primate model; the authors mean NHP.  

      We now specifically state that the validation is performed in a non-human primate model (Line#280).

      (6) Figure 5-though the immunostaining of the CS6/7 monkey embryos is excellent, the authors should not overinterpret these images. What is shown is not a time course, and one can only infer that a particular pattern of gene expression exists in a spatial sense from these images. In the model (Figure 2), the epiblast markers gradually fade and overlap for a time with emergent amnion markers, but in Figure 5 the transition between epiblast and amnion in the embryo seems pretty sharp, at least in terms of gene expression. There may be a few cells in D that show overlap of SOX2 and TFAP2A, but if the authors want to claim that a transition zone exists, they need to produce stronger evidence. Figure 7 is more convincing but see the next point. 

      Thank you for this insightful comment. We now address the nature of the transitioning boundary cell population extensively in our other recent study (Sekulovski et al., 2023).

      (7) Figure 7 further confuses the issue. A zone at either end of the epiblast is clearly positive for Sox2 and the two amnion markers, clearer than in Figure 5, but why does the marker DLX5 overlap with SOX2 in the embryo (7d) but not the model (7C)? Arguments regarding intermediate cell populations would be greatly strengthened by scRNA-seq data on the model system. 

      In our original manuscript, our DLX5 staining was performed at 48-hr post-BMP, at which SOX2 expression is absent in all cells. Our new analysis at the 24-hr timepoint now shows that DLX5 is expressed in SOX2+ cells (this is now presented in Fig. 7C).

      As stated in the point #6, our recent study comprehensively describes the transcriptomic and spatial characteristics of the transitioning boundary cell population (Sekulovski et al., 2023).

      (8) L357 TFAP2C KO does not resemble intermediate cysts in Figure 2. In Figure 2, both SOX2 and amnion markers are co-expressed in the same cells. In 8C, SOX2 and ISL1 are mutually exclusive.  

      We agree with this comment, and have now removed the statement pointing out this resemblance (Line#359 of the original manuscript).

      (9) Figure 8d-the same caveats noted above regarding the interpretation of superposition of bulk RNA-seq data with scRNA-seq UMAP analysis apply here.  

      Please refer to our explanation in point#4.

      Reviewer #3: 

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporal trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works, remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors are turned on in sequence, and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they share, during early development.  

      We are pleased that Reviewer #3 appreciates that our model can be used effectively to identify a previously unrecognized amniotic gene activation cascade, providing a comprehensive time-course transcriptomic resource.

      As detailed below, we address specific concerns raised by Reviewer #3. We now provide additional mechanistic insights into amnion fate progression, and include additional transcriptomic comparisons with a cynomolgus macaque single cell RNA sequencing dataset.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors generated KO cell lines lacking GATA3 and TFAP2A, respectively. Their results showed some disrupted amnion differentiation only in TFAP2A-KO. Therefore, these data do not provide sufficient evidence to support whether these transcription factors are crucial for amnion fate specification. Perhaps an experiment could be done with overexpression of these markers and testing if they could force hPSC to adopt amnion-like fate.  

      Thank you for this insightful comment. We generated cell lines that enable inducible expression of GATA3 or TFAP2A, and induced transgene expression from d2 (when BMP treatment is normally initiated) until d4. However, this inducible expression did not lead to amniogenesis, and the cysts maintained pluripotency. Because these results are difficult to interpret, they are not included in the revised manuscript.

      As detailed extensively in the manuscript, within each cyst, amniogenesis is initially seen focally, then spreads laterally resulting in fully squamous amnion cysts. This is also seen in our previously published Gel-3D amnion model (extensively described in (Shao et al., 2017)). In the absence of TFAP2A, we showed that the focal amniogenesis is observed, but spreading is not seen, suggesting that TFAP2A controls amnion fate progression. Therefore, while TFAP2A is not critical for the amnion fate specification in the focal cells, our results show that TFAP2A indeed helps to promote amniotic specification of cells neighboring the focal amniotic cells. Moreover, in the revised manuscript, we now show that TFAP2A transgene expression in the TFAP2A-KO background restores formation of fully squamous hPSC-amnion, further establishing the role of TFAP2A in amnion fate progression (Fig. 8C of the revised manuscript, Line#362-364).

      (2) The transcriptomic analysis made by the authors provides some comparison between BMP-induced amnion-like cells in vitro and the amnion-like cells from the CS7 human embryo in vivo. However, the human embryo dataset contains only a limited number of cells and might not provide a sufficient basis for a decisive assessment of the true identity of the amnion-like cells obtained in vitro. It might help if the authors could integrate their bulk sequencing data with other primate embryo datasets.

      Thank you for this helpful comment. We have now performed our transcriptional similarity analysis using early (day 14) cynomolgus macaque embryo datasets generated by Yang et al. (2021), and found that the bulk time-course transcriptome of our hPSC-amnion model overlaps with the cynomolgus macaque amniotic lineage progression (Fig. 4F, Line#265-268). We also now provide the expression of key markers (GATA3, TFAP2A, ISL1, TBXT, DLX5) within the Yang et al. dataset (Fig. 4G, S2F).

      (3) Following the point above, the authors used transcriptomic analysis to identify several intermediate states of cells during amnion differentiation and claimed that there is a primitive-streak-like intermediate. However, this might be an overstatement. During stem cell culture and differentiation, intermediate states showing a mixture of biomarkers are very common and do not imply that such intermediates have any biological meaning. However, stating that amnion differentiation passes through primitive-streak-like intermediates might imply a connection between these two lineages for which solid support is lacking. Instead, a more interesting question might be how amnion and primitive streak differentiation, despite some transcriptomic similarity, diverge from each other during early development. What factors make this difference? The authors might further analyze the RNA-seq data to provide some insights.

      Thank you very much for the insightful comments. 

      We understand Reviewer #3’s concern that the intermediate state we see may not recapitulate a primitive streak-like state. However, in our original manuscript, we described these cells as “Primitive Streak-like” because they were annotated as Primitive Streak in the dataset by Tyser et al. Interestingly, a recent study by Ton et al. showed that the Tyser et al. Primitive Streak cells actually consist of a mixture of different cell lineages (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC) (Ton et al., 2023). Therefore, we agree that it was an overstatement to call them “Primitive Streak-like”, and, to avoid confusion, we now label the TBXT-low sub-population found in the Tyser et al. Primitive Streak population as the “TBXT-low state” throughout the manuscript.

      Our data indicate that TFAP2A may play a role in controlling the lineage decision between amnion and primitive streak cells that abundantly express TBXT (TBXT-high). In the original manuscript, we included data showing that 48-hr TFAP2A-KO cysts show transcriptomic characteristics similar to some Primitive Streak cells (Fig. 8D). Intriguingly, our new data show that, in the absence of TFAP2A, some TBXT-high cells are indeed seen (Fig. 8E, Line#373-374). Together, these results provide evidence for a role of TFAP2A in promoting the amniotic lineage, perhaps by suppressing the TBXT-high state. This point is now addressed in the Discussion (Line#401-409).

      Additional new data:

      Using Western blot, we now show that GATA3 is absent in the GATA3-KO lines (Fig. S4C). We noticed that this was lacking in the original manuscript.

      We now show that inducible expression of TFAP2A in the TFAP2A-KO cysts leads to control-like cysts (Fig. 8C, Line#362-364).

      Additional changes:

      Typos were fixed in Fig. 5I – “boundary” and “disseminating” were not spelled correctly.

      Line#350 – we originally noted “GATA3 expression precedes TFAP2A expression by approximately 12 hours”. This was incorrect, and is changed to 9 hours in the revised manuscript. We apologize for this mistake.

      REFERENCES

      Blakeley, P., Fogarty, N.M., del Valle, I., Wamaitha, S.E., Hu, T.X., Elder, K., Snell, P., Christie, L., Robson, P., and Niakan, K.K. (2015). Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151-3165.

      Castillo-Venzor, A., Penfold, C.A., Morgan, M.D., Tang, W.W., Kobayashi, T., Wong, F.C., Bergmann, S., Slatery, E., Boroviak, T.E., Marioni, J.C., et al. (2023). Origin and segregation of the human germline. Life Sci Alliance 6.

      Granja, J.M., Klemm, S., McGinnis, L.M., Kathiria, A.S., Mezger, A., Corces, M.R., Parks, B., Gars, E., Liedtke, M., Zheng, G.X.Y., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature Biotechnology 37, 1458-1465.

      Meistermann, D., Bruneau, A., Loubersac, S., Reignier, A., Firmin, J., Francois-Campion, V., Kilens, S., Lelievre, Y., Lammers, J., Feyeux, M., et al. (2021). Integrated pseudotime analysis of human pre-implantation embryo single-cell transcriptomes reveals the dynamics of lineage specification. Cell Stem Cell 28, 1625-1640 e1626.

      Ohgushi, M., Taniyama, N., Vandenbon, A., and Eiraku, M. (2022). Delamination of trophoblast-like syncytia from the amniotic ectodermal analogue in human primed embryonic stem cell-based differentiation model. Cell Reports 39, 110973.

      Okae, H., Toh, H., Sato, T., Hiura, H., Takahashi, S., Shirane, K., Kabayama, Y., Suyama, M., Sasaki, H., and Arima, T. (2018). Derivation of Human Trophoblast Stem Cells. Cell Stem Cell 22, 50-63 e56.

      Petropoulos, S., Edsgard, D., Reinius, B., Deng, Q., Panula, S.P., Codeluppi, S., Plaza Reyes, A., Linnarsson, S., Sandberg, R., and Lanner, F. (2016). Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 165, 1012-1026.

      Sasaki, K., Nakamura, T., Okamoto, I., Yabuta, Y., Iwatani, C., Tsuchiya, H., Seita, Y., Nakamura, S., Shiraki, N., Takakuwa, T., et al. (2016). The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion. Developmental Cell 39, 169-185.

      Sekulovski, N., Juga, L.L., Cortez, C.L., Czerwinski, M., Whorton, A.E., Spence, J.R., Schmidt, J.K., Golos, T.G., Gumucio, D.L., Lin, C.-W., et al. (2023). Identification of amnion progenitor-like cells at the amnion-epiblast boundary in the primate peri-gastrula. bioRxiv doi: 10.1101/2023.09.07.556553.

      Shao, Y., Taniguchi, K., Townshend, R.F., Miki, T., Gumucio, D.L., and Fu, J. (2017). A pluripotent stem cell-based model for post-implantation human amniotic sac development. Nature Communications 8, 208.

      Ton, M.N., Keitley, D., Theeuwes, B., Guibentif, C., Ahnfelt-Ronne, J., Andreassen, T.K., Calero-Nieto, F.J., Imaz-Rosshandler, I., Pijuan-Sala, B., Nichols, J., et al. (2023). An atlas of rabbit development as a model for single-cell comparative genomics. Nature Cell Biology 25, 1061-1072.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S., Vallier, L., Scialdone, A., and Srinivas, S. (2021). Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285-289.

      Yabe, S., Alexenko, A.P., Amita, M., Yang, Y., Schust, D.J., Sadovsky, Y., Ezashi, T., and Roberts, R.M. (2016). Comparison of syncytiotrophoblast generated from human embryonic stem cells and from term placentas. Proceedings of the National Academy of Sciences of the United States of America 113, E2598-2607.

      Yang, R., Goedel, A., Kang, Y., Si, C., Chu, C., Zheng, Y., Chen, Z., Gruber, P.J., Xiao, Y., Zhou, C., et al. (2021). Amnion signals are essential for mesoderm formation in primates. Nature Communications 12, 5126.

    1. From Wikipedia, the free encyclopedia

      Gutenberg Bible

      Earliest major book printed in Europe

      [Image: The copy of the Gutenberg Bible held at the Richelieu - Bibliothèques, musée, galeries.]

      The Gutenberg Bible, also known as the 42-line Bible, the Mazarin Bible or the B42, was the earliest major book printed in Europe using mass-produced metal movable type. It marked the start of the "Gutenberg Revolution" and the age of printed books in the West. The book is valued and revered for its high aesthetic and artistic qualities[1] and its historical significance. The Gutenberg Bible is an edition of the Latin Vulgate printed in the 1450s by Johannes Gutenberg in Mainz, in present-day Germany.
      Forty-nine copies (or substantial portions of copies) have survived. They are thought to be among the world's most valuable books, although no complete copy has been sold since 1978.[2][3] In March 1455, the future Pope Pius II wrote that he had seen pages from the Gutenberg Bible displayed in Frankfurt to promote the edition, and that either 158 or 180 copies had been printed. The 36-line Bible, said to be the second printed Bible, is also sometimes referred to as a Gutenberg Bible, but may be the work of another printer.[4]

      Text

      [Image: Gutenberg Bible in the Beinecke Rare Book & Manuscript Library at Yale University in New Haven, Connecticut]

      The Gutenberg Bible, an edition of the Vulgate, contains the Latin version of the Hebrew Old Testament and the Greek New Testament. It is mainly the work of St Jerome, who began his work on the translation in AD 380, with emendations from the Parisian Bible tradition, and further divergences.[5]

      Printing history

      [Image: Gutenberg Bible of the New York Public Library; purchased by James Lenox in 1847, it was the first Gutenberg Bible to be acquired by a United States citizen.]

      While it is unlikely that any of Gutenberg's early publications would bear his name, the initial expense of press equipment and materials and of the work to be done before the Bible was ready for sale suggests that he may have started with more lucrative texts, including several religious documents, a German poem, and some editions of Aelius Donatus's Ars Minor, a popular Latin grammar school book.[6][7][8] Preparation of the Bible probably began soon after 1450, and the first finished copies were available in 1454 or 1455.[9] It is not known exactly how long the Bible took to print.
      The first precisely datable printing is Gutenberg's 31-line Indulgence, which certainly existed by 22 October 1454.[10] Gutenberg made three significant changes during the printing process.[11]

      [Image: Spine of the Lenox copy]

      Some time later, after more sheets had been printed, the number of lines per page was increased from 40 to 42, presumably to save paper. Therefore, pages 1 to 9 and pages 256 to 265, presumably the first ones printed, have 40 lines each. Page 10 has 41, and from there on the 42 lines appear. The increase in line number was achieved by decreasing the interline spacing, rather than increasing the printed area of the page. Finally, the print run was increased, necessitating resetting those pages which had already been printed. The new sheets were all reset to 42 lines per page. Consequently, there are two distinct settings in folios 1–32 and 129–158 of volume I and folios 1–16 and 162 of volume II.[11][12]

      The most reliable information about the Bible's date comes from a letter. In March 1455, the future Pope Pius II wrote that he had seen pages from the Gutenberg Bible, being displayed to promote the edition, in Frankfurt.[13] It is not known how many copies were printed, with the 1455 letter citing sources for both 158 and 180 copies. Scholars today think that examination of surviving copies suggests that somewhere between 160 and 185 copies were printed, with about three-quarters on paper and the others on vellum.[14][15]

      The production process: Das Werk der Bücher

      [Image: A vellum copy of the Gutenberg Bible owned by the U.S. Library of Congress, on display at the Thomas Jefferson Building in Washington, D.C.]

      In a legal paper, written after completion of the Bible, Johannes Gutenberg refers to the process as Das Werk der Bücher ("the work of the books").
      He had introduced the printing press to Europe and created the technology to make printing with movable type finally efficient enough to facilitate the mass production of entire books.[16] Many book-lovers have commented on the high standards achieved in the production of the Gutenberg Bible, some describing it as one of the most beautiful books ever printed. The quality of the ink and other materials, and of the printing itself, has been noted.[1]

      Pages

      [Image: First page of the first volume: the epistle of St Jerome to Paulinus, from the University of Texas copy. The page has 40 lines.]

      The paper size is 'double folio', with two pages printed on each side (four pages per sheet). After printing, the paper was folded once to the size of a single page. Typically, five of these folded sheets (ten leaves, or twenty printed pages) were combined into a single physical section, called a quinternion, that could then be bound into a book. Some sections, however, had as few as four leaves or as many as twelve leaves.[17]

      [Image: Gutenberg Bible on display at the U.S. Library of Congress]

      The 42-line Bible was printed on the size of paper known as 'Royal'.[18] A full sheet of Royal paper measures 42 cm × 60 cm (17 in × 24 in) and a single untrimmed folio leaf measures 42 cm × 30 cm (17 in × 12 in).[19] There have been attempts to claim that the book was printed on larger paper measuring 44.5 cm × 30.7 cm (17.5 in × 12.1 in),[20] but this assertion is contradicted by the dimensions of existing copies. For example, the leaves of the copy in the Bodleian Library, Oxford, measure 40 cm × 28.6 cm (15.7 in × 11.3 in).[21] This is typical of other folio Bibles printed on Royal paper in the fifteenth century.[22] Most fifteenth-century printing papers have a width-to-height ratio of 1:1.4 (e.g. 30:42 cm), which, mathematically, is a ratio of 1 to the square root of 2, or simply $1:\sqrt{2}$.
      Many suggest that this ratio was chosen to match the so-called Golden Ratio, $\tfrac{1+\sqrt{5}}{2}$, of 1:1.6; in fact the ratios are, plainly, not at all similar (equating to a difference of about 12 per cent). The ratio of 1:1.4 was a long-established one for medieval paper sizes.[23] A single complete copy of the Gutenberg Bible has 1,288 pages (4×322 = 1288) (usually bound in two volumes); with four pages per folio-sheet, 322 sheets of paper are required per copy.[24] The Bible's paper consists of linen fibers and is thought to have been imported from Caselle in Piedmont, Italy, based on the watermarks present throughout the volume.[25]

      Ink

      we have

      FORK LYFT

      https://philosophybreak.com/articles/if-a-tree-falls-in-the-forest-and-theres-no-one-around-to-hear-it-does-it-make-a-sound/#:~:text=So%2C%20the%20answer%20to%20this%20age-old%20question%20seems,lonesome%20falling%20tree%20does%20not%20make%20a%20sound.

      If a tree falls in the forest, and there's no one around to hear it, does it make a sound?

      The age-old question of whether a falling tree makes a sound when there's no one around to hear it exploits the tension between perception and reality. This article explores possible answers and their consequences.

      By Jack Maden  |  September 2022

      If a tree falls in the forest, and there's no one around to hear it, does it make a sound? Well, if by 'sound' we mean vibrating air, then yes, when the tree falls, it vibrates the air around it.

      However, if by 'sound' we mean the conscious noise we hear when our sensory apparatus interacts with the vibrating air, then if no one is around to hear the tree when it falls, there'd be no sensory apparatus for the vibrating air to interact with, and thus no conscious noise would be heard.

      So, the answer to this age-old question seems to be simple: it depends on how we define 'sound'. If we define it as 'vibrating air', the falling tree makes a sound. If we define it as a conscious experience, the lonesome falling tree does not make a sound.

      There, problem solved.

      The point of asking this question, however, is not so that it can be answered quickly and put aside.

      Rather, its point is to draw out the rather strange tension between our two very different definitions of the word 'sound'.

      On the one hand, we classify sound as a mechanistic process that exists without us, 'out there' in the world. On the other, we regard it as a private conscious experience, its existence entirely dependent on us.

      And when you dwell on this latter definition, you realize it applies to more than sounds. Everything we experience (everything we see, hear, smell, touch, and taste) depends on our sensory apparatus, on us. Without us, our experiences would not exist.

      As the great 16th-century astronomer Galileo Galilei put it:

      Tastes, odors, colors, and so on... reside only in consciousness. If the living creature were removed, all these qualities would be wiped away and annihilated.

      Take away our senses, and the world of our experience would be replaced by a colorless, soundless, odorless, tasteless nothingness. Without us, what remains?

      The reason our original question (When a tree falls in the forest, and there's no one around to hear it, does it make a sound?) is such a teaser is that it hits on a deeper question. Namely:

      If there was no conscious life, would the physical universe exist?

      Our kneejerk reaction to this question might be, 'of course it would'. But let's think about it again: if there was nothing conscious, then nothing would be experienced. There would be nothing resembling anything we call 'existence'. No colors, no sounds, no smells, no tastes, no touch, no sense of time, no sense of space.

      Is consciousness more fundamental than matter?

      Reflecting on this strange state of affairs, numerous great thinkers have concluded that consciousness must be more fundamental than the 'stuff' that consciousness experiences.

      For instance, in his 1710 work, A Treatise Concerning the Principles of Human Knowledge, the philosopher George Berkeley discusses the absurdity of a world existing independently of our conscious minds:

      It is indeed an opinion strangely prevailing amongst people that houses, mountains, rivers, and in a word all sensible objects, have an existence natural or real, distinct from their being perceived by the understanding... for what are the forementioned objects but things we perceive by sense? And what do we perceive besides our own ideas or sensations? And is it not plainly repugnant that any one of these or any combination of them should exist unperceived?

      On this view, it is absurd to say a lonesome falling tree makes a sound. For Berkeley, it is absurd to say the tree, without a conscious mind there perceiving it, even exists. (You can learn more about his mind-bending arguments for this position in our short explainer piece on Berkeley's subjective idealism, his theory that the world is in our minds).

      But to conclude this brief reflection on the tension between perception and reality, consider a comment from the Nobel Prize-winning quantum physicist Max Planck in a 1931 interview (italics added):

      I regard consciousness as fundamental. I regard matter as derivative from consciousness. We cannot get behind consciousness. Everything that we talk about, everything that we regard as existing, postulates consciousness.

      What do you think? Can we get behind consciousness?

      About the Author

      Jack Maden, Founder, Philosophy Break

      Having received great value from studying philosophy for 15+ years (picking up a master's degree along the way), I founded Philosophy Break in 2018 as an online social enterprise dedicated to making the subject's wisdom accessible to all. Learn more about me and the project here.

      © Philosophy Break Ltd, 2024

      https://www.poetryfoundation.org/poems/44272/the-road-not-taken

      The Road Not Taken

      BY ROBERT FROST

      Two roads diverged in a yellow wood,
      And sorry I could not travel both
      And be one traveler, long I stood
      And looked down one as far as I could
      To where it bent in the undergrowth;

      Then took the other, as just as fair,
      And having perhaps the better claim,
      Because it was grassy and wanted wear;
      Though as for that the passing there
      Had worn them really about the same,

      And both that morning equally lay
      In leaves no step had trodden black.
      Oh, I kept the first for another day!
      Yet knowing how way leads on to way,
      I doubted if I should ever come back.

      I shall be telling this with a sigh
      Somewhere ages and ages hence:
      Two roads diverged in a wood, and I---
      I took the one less traveled by,
      And that has made all the difference.

    1. Consequently, our freedom has become a psychological problem, it has isolated us from the connections necessary for our survival and development (Fromm, 1941). The danger with this situation, according to Fromm, is that when an entire society is suffering from feelings of isolation and disconnection with the natural order (from nature itself, in Fromm’s view), the members of that society may seek connection with a societal structure that destroys their freedom and, thus, integrates their self into the whole (albeit in a dysfunctional way).

      Fromm's theory stated that freedom was a problem for humans because of the separation from nature it caused. Separated from our basic instincts, humans struggle to find their identity. In doing so they encounter crises, which lead to mental un-wellness. As a society, we have proven time and again that this disconnection from the natural order of things is beyond our ability to cope. Fromm uses Stalin and Hitler as examples, but more recently I think we can see this happening with former president Trump and his followers. We also see this in members of cults (I am not equating Trump supporters with cultists). I can see Fromm's theory playing out in many situations where humans feel powerless and seek power through connection with a belief, or with a person supporting certain beliefs.

  3. May 2024
    1. We can achieve that goal by several practical devices. For example, under the American or Latin American presidential arrangements, we can allow both the president and the Congress to dissolve an impasse by calling early elections. The early elections would always have to be bilateral: the branch exercising the constitutional prerogative would share the electoral risk.

      I suspect such innovations would yield some benefit, but not nearly as much as we might imagine.

      I think this is a classic case of the primacy of being. Yes, structure (i.e. institutions) can make a bad case worse, or an OK case better, but it is limited in how much it can move things. The US is mostly hamstrung by deep cultural differences at a moment of paradigmatic change. Yes, the Electoral College or the Senate's composition may make things worse, but that is minor compared to what is going on at the cultural foundations.

      The most interesting structural innovations are therefore those that would hasten cultural evolution.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses

      (1) The authors face a technical challenge (which they acknowledge): they use two numbers (mean and variance) to characterize synaptic variability, whereas in the brain there are three numbers (number of vesicles, release probability, and quantal size). Turning biological constraints into constraints on the variance, as is done in the paper, seems somewhat arbitrary. This by no means invalidates the results, but it means that future experimental tests of their model will be somewhat nuanced.

      Agreed. There are two points to make here.

      First, the mean and variance are far more experimentally accessible than n, p and q. The EPSP mean and variance are measured directly in paired-patch experiments, whereas obtaining n, p and q requires either far more extensive experimentation or strong assumptions. For instance, the data from Ko et al. (2013) give the EPSP mean and variance, but not (directly) n, p and q. Thus, in some ways, predictions about means and variances are easier to test than predictions about n, p and q.
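      For concreteness, here is a minimal sketch assuming the standard binomial quantal model (which may not be the exact mapping used in the paper): a connection with n release sites, release probability p and quantal size q has EPSP mean n·p·q and variance n·p·(1 − p)·q². The helper name `quantal_moments` is hypothetical. Because two measured moments cannot pin down three parameters, distinct (n, p, q) triples can produce identical means and variances, which is why recovering n, p and q from paired recordings needs extra assumptions:

```python
def quantal_moments(n: int, p: float, q: float) -> tuple[float, float]:
    """EPSP mean and variance under an assumed binomial quantal model.

    n: number of release sites, p: release probability, q: quantal size.
    """
    mean = n * p * q
    variance = n * p * (1 - p) * q ** 2
    return mean, variance


# Two different (n, p, q) triples that yield the same mean and variance.
a = quantal_moments(10, 0.5, 1.0)
b = quantal_moments(5, 2 / 3, 1.5)
print(a)  # (5.0, 2.5)
print(b)  # approximately (5.0, 2.5), up to floating-point rounding
```

      Under this model the ratio variance/mean equals (1 − p)·q, so any p with q = (variance/mean)/(1 − p) and an integer n = mean/(p·q) fits the same two measurements.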

      That said, we agree that in the absence of an extensive empirical accounting of the energetic costs at the synapse, there is inevitably some arbitrariness as we derive our energetic costs. That was why we considered four potential functional forms for the connection between the variance and energetic cost, which covered a wide range of sensible forms for this energetic cost. Our results were robust to this wide range of functional forms, indicating that the patterns we describe are not specifically due to the particular functional form, but arise in many settings where there is an energetic cost for reliable synaptic transmission.

      (2) The prediction that the learning rate should increase with variability relies on an optimization scheme in which the learning rate is scaled by the inverse of the magnitude of the gradients (Eq. 7). This seems like an extra assumption; the energy efficiency framework by itself does not predict that the learning rate should increase with variability. Further work will be needed to disentangle the assumption about the optimization scheme from the energy efficiency framework.

      Agreed. The assumption that learning rates scale with synapse importance is separate. However, it is highly plausible, as almost all modern state-of-the-art deep learning training runs use such an optimization scheme, since in practice it learns far faster than older schemes. We have added a sentence to the main text (line 221) indicating that this is ultimately an assumption.
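      Eq. 7 is not reproduced here, so as a hedged illustration only, the sketch below uses a generic RMSProp-style update, one common member of the family of optimizers that divide each parameter's step by a running estimate of its gradient magnitude. The function name and hyperparameters (`lr`, `beta`, `eps`) are illustrative assumptions, not the authors' settings. A parameter with consistently small gradients ends up with a larger effective learning rate:

```python
import math


def rmsprop_step(w, g, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp-style update on plain Python lists (illustrative, not Eq. 7).

    Returns the updated weights, the updated running mean of squared
    gradients, and the per-parameter effective learning rate.
    """
    new_w, new_v, eff_lr = [], [], []
    for wi, gi, vi in zip(w, g, v):
        vi = beta * vi + (1 - beta) * gi * gi  # running mean of g^2
        step = lr / (math.sqrt(vi) + eps)      # scaled by inverse gradient magnitude
        new_w.append(wi - step * gi)
        new_v.append(vi)
        eff_lr.append(step)
    return new_w, new_v, eff_lr


# A "small-gradient" parameter and a "large-gradient" parameter.
w, v = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    w, v, eff_lr = rmsprop_step(w, [0.1, 10.0], v)
print(eff_lr)  # roughly [0.1, 0.001], i.e. lr / |g| once v has converged
```

      Adam adds momentum and bias correction on top of this, but the inverse-magnitude scaling of the step, which is the property the prediction relies on, is the same.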

      Major

      (1) The correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is a bit loose. Indeed, the entropy term scales as $-\log\sigma$, while the reliability cost scales as $\sigma^{-\rho}$. While the authors do make the point that $\sigma^{-\rho}$ upper-bounds $-\log\sigma$ (up to some constant), those two cost terms are different. This raises two important questions:

      a. Is this difference important, i.e. are there scenarios for which the two frameworks would have different predictions due to their different cost functions?

      b. Alternatively, is there a way to make the two frameworks identical (e.g. by choosing a proposal distribution Q(w) different from a Gaussian distribution (and tuneable by a free parameter that could be related to ρ) and therefore giving rise to an entropy term consistent with the reliability cost of the energy efficiency framework)?

      To answer b first, there is no natural way to make the two frameworks identical (unless we assume the reliability cost is proportional to log σsyn, and we don’t think there is a biophysical mechanism that would give rise to such a cost). Now, to answer a: in Fig. 7 we extensively assessed the differences between the energy-efficient σsyn and the Bayesian σpost. In Fig. 7b,c, we find that σsyn and σpost are positively correlated in all models. This positive correlation indicates that the qualitative predictions made by the two frameworks (Bayesian inference and energy efficiency) are likely to be very similar. Importantly, though, there are systematic differences, highlighted by Fig. 7a,b. Specifically, the energy-efficient σsyn tends to vary less than the Bayesian σpost. This appears in Fig. 7b, which shows the relationship between σsyn (on the y-axis) and σpost (on the x-axis): this plot has a slope smaller than one for all our models of the biophysical cost. The pattern also appears in the covariance ellipses in Fig. 7a: the Bayesian covariance ellipses tend to be long and thin, while the energy-efficient covariance ellipses are rounder. Critically, though, both sets of covariance ellipses show the same pattern, with more noise along less important directions (as measured by the Hessian).

      We have added a sentence (line 273) noting that the search for a theoretical link is motivated by our observations in Fig. 7 of a strong, but not perfect link between the pattern of variability predicted by Bayesian and energy-efficient synapses.
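      To make the bound mentioned by the reviewer concrete, here is a minimal numerical sketch. It assumes the constant is absorbed by shifting and rescaling the power law to $(\sigma^{-\rho} - 1)/\rho$ (our illustrative choice; the paper's exact constant may differ). Because $e^{y} \ge 1 + y$ with $y = -\rho \log\sigma$, this quantity is never below $-\log\sigma$, and the gap shrinks as $\rho \to 0$, so the bound is tightest at small ρ. Function names and the σ grid are hypothetical.

```python
import math

def scaled_reliability_cost(sigma, rho):
    """Power-law reliability cost, shifted and rescaled: (sigma**-rho - 1) / rho."""
    return (sigma ** (-rho) - 1.0) / rho

def entropy_cost(sigma):
    """Entropy-like cost -log(sigma) from the variational-inference view."""
    return -math.log(sigma)

sigmas = [0.1, 0.5, 1.0, 2.0, 10.0]
max_gap = {}
for rho in [1.0, 0.5, 0.1, 0.01]:
    # Since exp(y) >= 1 + y, sigma**-rho >= 1 - rho*log(sigma), so every gap is >= 0.
    gaps = [scaled_reliability_cost(s, rho) - entropy_cost(s) for s in sigmas]
    max_gap[rho] = max(gaps)

for rho, gap in max_gap.items():
    print(f"rho={rho}: largest gap over the sigma grid = {gap:.4f}")
```

      The gap is exactly zero at σ = 1 for every ρ and grows away from it, which is one way to see why the two frameworks agree qualitatively but not quantitatively.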

      (2) Even though I appreciate the effort of the authors to look for experimental evidence, I still find that the experimental support (displayed in Fig. 6) is moderate for three reasons.

      a. First, the experimental and simulation results are not displayed in a consistent way. Indeed, Fig. 6a displays the relative weight change $|\Delta w|/w$ as a function of the normalised variability $\sigma^2/|\mu|$ in experiments, whereas the simulation results in Fig. 5c display the variance $\sigma^2$ as a function of the learning rate. Also, Fig. 6b displays the normalised variability $\sigma^2/|\mu|$ as a function of the input rate, whereas Fig. 5b displays the variance $\sigma^2$ as a function of the input rate. As a consequence, the comparison between experimental and simulation results is difficult.

      b. Secondly, the actual power-law exponents in the experiments (see Fig. 6a resp. 6b) should be compared to the power-law exponents obtained in simulation (see Fig. 5c resp. Fig. 5b). The difficulty here lies in the fact that the power-law exponents obtained in the simulations directly depend on the (free) parameter ρ. So far, the authors have deliberately avoided committing to a specific ρ, and have instead argued that different biophysical mechanisms lead to different reliability exponents ρ. Therefore, since there are many possible exponents ρ (and consequently many possible power-law exponents in the simulation results in Fig. 5), it is likely that one of them will match the experimental data. For the argument to be stronger, one would need to argue which synaptic mechanism is dominating, and therefore come up with a single prediction that can be falsified experimentally (see also point 4 below).

      c. Finally, the experimental data presented in Fig. 6 are still “clouds of points”. A coefficient of $r = 0.52$ (in Fig. 6a) is moderate evidence, while the coefficient of $r = -0.26$ (in Fig. 6b) is weak evidence.

      The key thing to remember is that our paper is not about whether synapses are “really” Bayesian or energy efficient (or both/neither). Instead, the key point of our paper, as expressed in the title, is to show that the experimental predictions of Bayesian synapses are very similar to the predictions from energy-efficient synapses, and therefore that energy-efficient synapses are very difficult to distinguish experimentally from Bayesian synapses. In that context, the two plots in Fig. 6 are not really intended to present evidence in favour of energy-efficient or Bayesian synapses. In fact, Fig. 6 isn’t meant to constitute a contribution of the paper at all; instead, it merely illustrates the kinds of experimental results that have been (Aitchison et al. 2021), or might be (Schug et al. 2021), used to support Bayesian synapses. As such, Fig. 6 serves as a jumping-off point for discussing how very similar results might equally arise from Bayesian and energy-efficiency viewpoints.

      We have modified our description of Fig. 6 to further re-emphasise that the panels in Fig. 6 are not our contribution, but are taken directly from Schug et al. 2021 and Aitchison et al. 2021 (we have also modified Fig. 6 to be precisely what was plotted in Schug et al. 2021, again to re-emphasise this point). Further, we have modified the presentation to emphasise that these plots serve merely as jumping-off points to discuss the kinds of predictions that we might consider for Bayesian and energy-efficient synapses.

      This is important, because we would argue that the “strength of support” should be assessed for our key claim, made in the title, that “Signatures of Bayesian inference emerge from energy efficient synapses”.

      a) To emphasise that these are previously published results, we have chosen axes to match those used in the original work (Aitchison et al. 2021) and (Schug et al. 2021).

      b) We agree that a close match between power-law exponents would constitute strong evidence for energy efficiency / Bayesian inference, and might even allow us to distinguish them. We did consider such a comparison, but found it difficult for two reasons. First, while the confidence intervals on the slopes exclude zero, they are quite broad. Second, while the slopes in a one-layer network are consistent and match theory (Appendix 5), the slopes in deeper networks are far less consistent. This is likely due to a number of factors, such as details of the optimization algorithm and initialization. Critically, if details of the optimization algorithm matter in simulation, they may also matter in the brain. Therefore, it is not clear to us that a comparison of the actual slopes can be relied upon.

      To reiterate, the point of our article is not to make judgements about the strength of evidence in previously published work, but to argue that Bayesian and energy-efficient synapses are difficult to distinguish experimentally because they produce similar predictions. That said, it is very difficult to make blanket statements about the strength of evidence for an effect based merely on a correlation coefficient. It is perfectly possible to have a moderate correlation coefficient along with very strong evidence of an effect (e.g. very small p-values), for example if there is a lot of data. Likewise, it is possible to have a very large correlation coefficient along with weak evidence of an effect (e.g. if we only have three or four datapoints that happen to lie on a straight line). A small correlation coefficient instead reflects a small effect size, specifically a small effect relative to the “noise”, which usually arises from unmeasured factors of variation. Here, we know there are many unmeasured factors of variation, so even if synapses really are Bayesian / energy-efficient, the best we can hope for is low correlation coefficients.
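      Comparing exponents, as discussed in b), amounts to fitting a straight line on log-log axes. The minimal sketch below uses hypothetical, noise-free synthetic data with a known exponent of −0.5 (not the experimental or simulated data of Figs. 5 and 6) and shows an ordinary least-squares slope recovering that exponent. With real, noisy data, the width of the confidence interval on this slope is what limits the comparison. The function name is hypothetical.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x): the exponent k in y ~ C * x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Synthetic, noise-free data with a known exponent (illustration only).
rates = [0.5, 1.0, 2.0, 4.0, 8.0]
variances = [3.0 * r ** -0.5 for r in rates]
k = fit_power_law_exponent(rates, variances)
print(k)  # recovers the exponent used to generate the data
```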
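      The distinction between correlation strength and strength of evidence can be sketched with the standard t test for a Pearson correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n-2$ degrees of freedom. The coefficient magnitudes below echo those quoted by the reviewer, but the sample sizes are hypothetical and are not those behind Fig. 6. With 500 points, $r = -0.26$ gives $|t| \approx 6$, far beyond any conventional threshold, whereas $r = 0.95$ from only four points gives $t \approx 4.30$, essentially at the two-sided 5% critical value for 2 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: zero correlation, t = r*sqrt(n-2)/sqrt(1-r**2), with n-2 d.o.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical sample sizes, chosen only to illustrate the argument.
weak_r_many_points = correlation_t_statistic(-0.26, 500)   # |t| about 6.0 on 498 d.o.f.
strong_r_few_points = correlation_t_statistic(0.95, 4)     # t about 4.30 on 2 d.o.f.
print(weak_r_many_points, strong_r_few_points)
```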

      As mentioned in the public review, a weakness in the paper is the derivation of the constraints on $\sigma_i$ given the biophysical costs, for two reasons.

      a. First, it seemed a bit arbitrary whether you hold n fixed or p fixed.

      b. Second, at central synapses, n is usually small, possibly even usually 1: REF(Synaptic vesicles transiently dock to refill release sites, Nature Neuroscience 23:1329-1338, 2020); REF(The ubiquitous nature of multivesicular release Trends Neurosci. 38:428-438, 2015). Fixing n would radically change your cost function. Possibly you can get around this because when two neurons are connected there are multiple contacts (and so, effectively, reasonably large n). It seems like this is worth discussing.

      a) Ultimately, we believe that the “real” biological cost function is very complex, and most likely cannot be written down in a simple functional form. Further, we certainly do not have the experimental evidence now, and are unlikely to have experimental evidence for a considerable period into the future to pin down this cost function precisely. In that context, we are forced to resort to two strategies. First, using simplifying assumptions to derive a functional form for the cost (such as holding n or p fixed). Second, considering a wide range of functional forms for the cost, and ensuring our argument works for all of them.

      b) We appreciate the suggestion that the number of connections could be used as a surrogate where synapses have only a single release site. As you suggest, we can propose an alternative model for this case in which n represents the number of connections between neurons. We have added this alternative interpretation to our introduction of the quantal model under the heading “Biophysical costs”. For a fixed PSP mean, we could have either many connections with small vesicles or fewer connections with larger vesicles. Similarly, for the actin cost, we would certainly require more actin if the number of connections were increased.
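      The trade-off in b) can be sketched under the standard binomial quantal model, assumed here to be the form used in the paper: mean $\mu = npq$ and variance $\sigma^2 = np(1-p)q^2$. Holding μ fixed forces $q = \mu/(np)$, so $\sigma^2 = \mu^2(1-p)/(np)$. With p fixed, spreading the same mean over more connections (larger n, smaller q) reduces the variance as $1/n$. The function names and parameter values are hypothetical.

```python
def quantal_mean_variance(n, p, q):
    """Binomial quantal model: n release sites (or, on the alternative reading,
    n connections), release probability p, quantal size q."""
    mean = n * p * q
    var = n * p * (1 - p) * q ** 2
    return mean, var

def variance_at_fixed_mean(mu, n, p):
    """Choose q so that n*p*q == mu, then return the resulting variance."""
    q = mu / (n * p)
    return quantal_mean_variance(n, p, q)[1]

mu, p = 1.0, 0.5
variances = {n: variance_at_fixed_mean(mu, n, p) for n in (1, 2, 4, 8)}
print(variances)  # here mu**2 * (1 - p) / (n * p) simplifies to 1 / n
```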

      Minor

      (1) A few additional references could further strengthen some claims of the paper:

      Davis, Graeme W., and Martin Muller. “Homeostatic Control of Presynaptic Neurotransmitter Release." Annual Review of Physiology 77, no. 1 (February 10, 2015): 251-70. https://doi.org/10.1146/annurev-physiol-021014-071740. This paper provides elegant experimental support for the claim (in line 538 now 583) that µ is kept constant and q acts as a compensatory variable.

      Jegminat, Jannes, Simone Carlo Surace, and Jean-Pascal Pfister. “Learning as Filtering: Implications for Spike-Based Plasticity." Edited by Blake A Richards. PLOS Computational Biology 18, no. 2 (February 23, 2022): e1009721. https://doi.org/10.1371/journal.pcbi.1009721.

      This paper also showed that a lower uncertainty implies a lower learning rate (see e.g. in line 232), but in the context of spiking neurons.

      Figure 1 of the first suggested paper indeed shows that quantal size is a candidate for homeostatic scaling (fixing µ). This review also references much further evidence of quantal scaling, including evidence for both presynaptic and postsynaptic scaling of q, leaving room for speculation on whether vesicle radius or postsynaptic receptor number is the source of a compensatory q. On line 583 we have added a few lines pointing to the suggested review paper.

      The second reference demonstrates Bayesian plasticity in the context of STDP, proposing learning rates tuned to the covariance in spike timing. We have added this as extra support for assuming an optimisation scheme that tunes learning rates to synapse importance and synapse variability (line 232).

      In the numerical simulations, the reliability cost is implemented with a single power-law expression (reliability cost ). However, in principle, all the reliability costs will play in conjunction, i.e. reliability cost . While I do recognise that it may be difficult to estimate the biophysical values of the various ci, it might be still relevant to comment on this.

      Agreed. Limitations in the literature meant that we could only make a cursory review of the relative scale of each cost, using estimates by Attwell (2001) and Engl (2015). On line 135 we have added a paragraph explaining the rationale for considering each cost independently.
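      The reviewer's combined expression did not survive extraction. The sketch below assumes it is a weighted sum of power laws, $\sum_i c_i \sigma^{-\rho_i}$, and uses hypothetical coefficients $c_i$ and exponents $\rho_i$ (not estimates from Attwell or Engl). It shows one property of such a sum: at small σ the term with the largest exponent dominates, while at large σ the shallowest term does. This is only an illustration of how a single power law can approximate the sum within one regime, not the paper's analysis.

```python
def combined_reliability_cost(sigma, terms):
    """Sum of power-law reliability costs, sum_i c_i * sigma**(-rho_i)."""
    return sum(c * sigma ** (-rho) for c, rho in terms)

def dominant_term(sigma, terms):
    """Index of the term contributing most to the cost at this sigma."""
    contributions = [c * sigma ** (-rho) for c, rho in terms]
    return contributions.index(max(contributions))

# Hypothetical (c_i, rho_i) pairs, for illustration only.
terms = [(1.0, 0.5), (0.1, 2.0)]

for sigma in (0.1, 1.0, 10.0):
    print(sigma, combined_reliability_cost(sigma, terms), dominant_term(sigma, terms))
```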

      (3) In Eq. 8: $\sigma^2$ doesn’t depend on variability in $q$, which would add another term; barring algebra mistakes, it’s . It seems worth mentioning why you didn’t include it. Can you argue that it’s a small effect?

      Agreed. Ultimately, we dropped this term because we expected it to be small relative to the variability in vesicle release, and because it would be difficult to quantify. In practice, synaptic variability is believed to arise mostly from variability in vesicle release. The primary evidence for this comes from histograms of EPSP amplitudes, which show a classic multi-peak structure, with peaks corresponding to the release of one, two, three, etc. vesicles. Examples of these plots include:

      - “The end-plate potential in mammalian muscle”, Boyd and Martin (1956); Fig. 8.

      - “Structure and function of a neocortical synapse”, Holler-Rickauer et al. (2019); Extended Figure 5.
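      The reviewer's expression for the extra term was lost in extraction. One standard way to write it, assuming each released vesicle contributes an independent amplitude with mean $q$ and standard deviation $\sigma_q$, independent of how many vesicles are released, follows from the law of total variance: $\sigma^2 = np(1-p)q^2 + np\,\sigma_q^2$. This is an assumption, not necessarily the reviewer's expression. The sketch checks it by Monte Carlo with hypothetical parameters; with a quantal coefficient of variation of 0.2 the extra term adds about 7% to the release-driven variance in this example.

```python
import random

def analytic_variance(n, p, q_mean, q_sd):
    """Variance of the summed response when K ~ Binomial(n, p) vesicles are released
    and each contributes an i.i.d. amplitude with mean q_mean and s.d. q_sd.
    Law of total variance: Var = E[K]*q_sd**2 + Var[K]*q_mean**2."""
    return n * p * q_sd ** 2 + n * p * (1 - p) * q_mean ** 2

def simulated_variance(n, p, q_mean, q_sd, trials=50_000, seed=0):
    """Monte Carlo estimate of the same variance."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))          # vesicles released this trial
        samples.append(sum(rng.gauss(q_mean, q_sd) for _ in range(k)))
    mean = sum(samples) / trials
    return sum((x - mean) ** 2 for x in samples) / (trials - 1)

# Hypothetical parameters: 5 sites, p = 0.4, quantal coefficient of variation 0.2.
n, p, q_mean, q_sd = 5, 0.4, 1.0, 0.2
release_only = n * p * (1 - p) * q_mean ** 2     # 1.2: the binomial term alone
total = analytic_variance(n, p, q_mean, q_sd)     # 1.28: adds n*p*q_sd**2 = 0.08
estimate = simulated_variance(n, p, q_mean, q_sd)
print(release_only, total, estimate)
```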

      (3) On pg. 7 now pg. 8, when the Hessian is introduced, why not say what it is? Or at least the diagonal elements, for which you just sum up the squared activity. That will make it much less mysterious. Or are we relying too much on the linear model given in App 2? If so, you should tell us how the Hessian was calculated in general. Probably in an appendix.

      With the intention of maintaining the interest of a wide audience, we made the decision to avoid a mathematical definition of the Hessian, opting instead for a written definition, i.e. line 192: “$H_{ii}$; the second derivatives of the objective with respect to $w_i$.”, and later a schematic (Fig. 4) showing how the second derivative can be understood as a measure of curvature and synapse importance. Nonetheless, this review point has made us aware that the estimated Hessian values plotted in Fig. 5a were insufficiently explained, so we have added a reference on line 197 to the appendix section where we show how we estimated the diagonal values of the Hessian.

      (4) Fig. 5: assuming we understand things correctly, Hessian ∝ $|x|^2$. Why also plot $\sigma^2$ versus $|x|$? Or are we getting the Hessian wrong?

      The Hessian is proportional to . If you assume that time steps are small and neurons spike, then , and . It is difficult to say what time step is relevant in practice.
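      For a linear neuron with squared loss, the diagonal of the Hessian is the sum over trials of the squared input activity, $H_{ii} = \sum_t x_{t,i}^2$, as the reviewer suggests. The minimal sketch below assumes that model (the paper's Appendix 2 may set things up differently) and checks the analytic diagonal against a central finite-difference estimate. The data are random and purely illustrative, and the function names are hypothetical.

```python
import random

def squared_loss(w, X, y):
    """0.5 * sum over trials of (y_t - w . x_t)**2 for a linear neuron."""
    return 0.5 * sum((yt - sum(wi * xi for wi, xi in zip(w, xt))) ** 2
                     for xt, yt in zip(X, y))

def diag_hessian_fd(w, X, y, i, h=1e-3):
    """Central finite-difference estimate of d^2 loss / d w_i^2."""
    wp, wm = list(w), list(w)
    wp[i] += h
    wm[i] -= h
    return (squared_loss(wp, X, y) - 2 * squared_loss(w, X, y) + squared_loss(wm, X, y)) / h ** 2

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]   # 20 trials, 3 inputs
y = [rng.gauss(0, 1) for _ in range(20)]
w = [0.1, -0.2, 0.3]

fd = [diag_hessian_fd(w, X, y, i) for i in range(3)]
analytic = [sum(xt[i] ** 2 for xt in X) for i in range(3)]     # sum of squared activity
print(fd, analytic)
```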
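      The expressions in this reply were lost in extraction. One reading consistent with the surrounding text, offered here as an assumption, is that for a linear neuron the diagonal Hessian is a sum of squared inputs, and that if inputs are binary spike indicators in small time bins then $x^2 = x$, so the diagonal Hessian scales with the spike count (and hence the firing rate) rather than with its square. The sketch below illustrates only that identity, using a Bernoulli approximation to Poisson spiking with hypothetical rates and bin width.

```python
import random

def spike_train(rate_hz, duration_s, dt, rng):
    """Bernoulli approximation to a Poisson spike train: at most one spike per bin of width dt."""
    n_bins = int(round(duration_s / dt))
    return [1 if rng.random() < rate_hz * dt else 0 for _ in range(n_bins)]

rng = random.Random(0)
dt = 0.001  # 1 ms bins (hypothetical)
trains = {rate: spike_train(rate, 100.0, dt, rng) for rate in (5.0, 20.0)}

sum_sq = {rate: sum(x * x for x in t) for rate, t in trains.items()}   # Hessian-like sum of x**2
counts = {rate: sum(t) for rate, t in trains.items()}                  # spike counts
print(sum_sq, counts)  # for binary x, x**2 == x, so the two sums coincide
```

      Whether this binning is the right description depends on the time step, which is exactly the uncertainty noted in the reply.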

      (5) To get Fig. 6a, did you start with Fig. Appendix 1-figure 4 from Schug et al, and then use , drop the q, and put 1 − p on the x-axis? Either way, you should provide details about where this came from. It could be in Methods.

      We have modified Fig. 6 to use the same axes as in the original papers.

      (6) Lines 190-3: “The relationship between input firing rate and synaptic variability was first observed by Aitchison et al. (2021) using data from Ko et al. (2013) (Fig. 6a). The relationship between learning rate and synaptic variability was first observed by Schug et al. (2021), using data from Sjostrom et al. (2003) as processed by Costa et al. (2017) (Fig. 6b).” We believe 6a and 6b should be interchanged in that sentence.

      Thank you. We have switched the text appropriately.

      (7) What is posterior variance? This seems kind of important.

      This refers to the “posterior variance” obtained using a Bayesian interpretation of the problem of obtaining good synaptic weights (Aitchison et al. 2021). In our particular setting, we estimate posterior variances by setting up the problem as variational inference: see Appendices 4 and 5, which are now referred to in line 390.

      (8) Lines 244-5: “we derived the relationships between the optimized noise, σi and the posterior variable, σpost as a function of ρ (Fig. 7b;) and as a function of c (Fig. 7c)." You should tell the reader where you derived this. Which is Eq. 68c now 54c. Except you didn’t actually derive it; you just wrote it down. And since we don’t know what posterior variance is, we couldn’t figure it out.

      If H is the Hessian of the log-likelihood, and if the prior is negligible relative to the likelihood, then we get Eq. 69c. We have added a note on this point to the text.
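      A minimal one-dimensional check of this statement, assuming a linear-Gaussian model $y = wx + \varepsilon$ with known noise standard deviation and a Gaussian prior on $w$ (an illustrative choice; the variational setup in Appendices 4 and 5 is more general). In this conjugate case the posterior precision is exactly the Hessian of the negative log-likelihood (so H here is positive) plus the prior precision, so as the prior becomes broad the posterior variance approaches $1/H$. All names and numbers are hypothetical.

```python
def neg_log_lik_hessian(xs, noise_sd):
    """Second derivative of -log p(y | w, x) with respect to w (independent of y and w)."""
    return sum(x * x for x in xs) / noise_sd ** 2

def posterior_variance(xs, noise_sd, prior_sd):
    """Exact posterior variance of w for y = w*x + Gaussian noise, prior N(0, prior_sd**2)."""
    precision = neg_log_lik_hessian(xs, noise_sd) + 1.0 / prior_sd ** 2
    return 1.0 / precision

xs = [0.5, 1.0, -1.5, 2.0]
H = neg_log_lik_hessian(xs, noise_sd=0.5)           # 7.5 / 0.25 = 30
broad = posterior_variance(xs, 0.5, prior_sd=1e3)   # prior negligible: close to 1/H
narrow = posterior_variance(xs, 0.5, prior_sd=1.0)  # informative prior shrinks the variance
print(1.0 / H, broad, narrow)
```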

      (9) We believe Fig. 7a shows an example pair of synapses. Is this typical? And what about Figs. 7b and c. Also an example pair? Or averages? It would be helpful to make all this clear to the reader.

      Fig. 7a shows an illustrative pair of synapses, chosen to best display the relative patterns of variability under energy efficient and Bayesian synapses. We have noted this point in the legend for Fig. 7. Fig. 7bc show analytic relationships between energy efficient and Bayesian synapses, so each line shows a whole continuum of synapses (we have deleted the misleading points at the ends of the lines in Fig. 7bc).

      (10)  The y-axis of Fig 6a refers to the synaptic weight as w while the x-axis refers to the mean synaptic weight as mu. Shouldn’t it be harmonised? It would be particularly nice if both were divided by µ, because then the link to Fig. 5c would be more clear.

      We have changed the y-axis label of Fig. 6a from w to µ. Regarding the normalised variance, we did try this, but our Gaussian posteriors allowed the mean to become small in our simulations, giving a very high normalised variance. To remedy this we would likely need to assume a log-normal posterior, but this was out of scope for the present work.

      (11) Line 250 (now line 281): “Finally, in the Appendix". Please tell us which Appendix. Also, why not point out here that the bound is tightest at small ρ?

      We have added a reference to the section of the appendix containing the derivation of the biological cost as a bound on the ELBO. We have also referenced the equation that gives the limit of the biological cost as ρ tends to zero.

      (12) When symbols appear that previously appeared more than about two paragraphs ago, please tell us where they came from. For instance, we spent a lot of time hunting for $\eta_i$. And below we’ll complain about undefined symbols, which might mean we just missed them; if you told us where they were, that problem would be eliminated.

      We have added extra references for the symbols in the text following Eq. 69.

      (13) Line 564, typo (we think): should be $\sigma^{-2}$.

      Good spot. This has been fixed.

      (14)  A bit out of order, but we don’t think you ever say explicitly that r is the radius of a vesicle. You do indicate it in Fig. 1, but you should say it in the main text as well.

      We have added a note on this to the legend in Fig. 1.

      (15) Eq. 14: presumably there’s a cost only if the vesicle is outside the synapse? Probably worth saying, since it’s not clear from the mechanism.

      Looking at Pulido and Ryan (2021) carefully, it is clear that they are referring to a cost for vesicles inside the presynaptic side of the synapse. (Importantly, vesicles don’t really exist outside the synapse; during the release process, the vesicle membrane becomes part of the cell membrane, and the contents of the vesicle are ejected into the synaptic cleft.)

      (16) App. 2: why solve for mu, and why compute the trace of the Hessian? Not that it hurts, but things are sort of complicated, and the fewer side points the better.

      Agreed, we have removed the solution for μ, and the trace, and generally rewritten Appendix 2 to clarify definitions, the Hessian etc.

      (17) Eq. 35: we believe you need a minus sign on one side of the equation. And we don’t believe you defined p(d|w). Also, are you assuming g = ∂ log p(d|w)/∂w? This should be stated, along with its implications. And presumably, it’s not really true; people just postulate that p(d|w) ∝ exp(−log loss)?

      We have replaced p(d|w) with p(y, x|w), and we replaced “overall cost” with log P(y|w, x). Yes, we are also postulating that p(y|w, x) ∝ exp(−log loss), though in our case that does make sense, as it corresponds to a squared loss.

      As regards the minus sign, in the original manuscript we had the second derivative of the cost. There is no minus sign for the cost, as the Hessian of the cost at the mode is positive semi-definite. However, once we write the expression in terms of a log-likelihood, we do need a minus sign (as the Hessian of the log-likelihood at a mode is negative semi-definite).

      (18) Eq. 47 now Eq. 44: first mention of CBi;i?

      We have added a note describing CB around these equations.

      (19) The “where" doesn’t make sense for Eqs. 49 and 50; those are new definitions.

      We have modified the introduction of these equations to avoid the problematic “where”.

      (20) Eq. 57 and 58 are really one equation. More importantly: where does Eq. 58 come from? Is this the H that was defined previously? Either way, you should make that clear.

      We have removed the problematic additional equation line number, and added a reference to where H comes from.

      (21) In Eq. 59 now Eq. 60 aren’t you taking the trace of a scalar? Seems like you could skip this.

      We have deleted this derivation, as it repeats material from the new Appendix 2.

      (22) Eq. 66 is exactly the same as Eq. 32. Which is a bit disconcerting. Are they different derivations of the same quantity? You should comment on this.

      We have deleted much of the material in Appendix 5 because, as you note, it repeats material from Appendix 2 (which has been rewritten and considerably clarified).

      (23) Eq. 68 (now 54), left column: please derive. We got:

      gai = gradient for weight i on trial

      where the second equality came from Eq. 20. Thus

      Is that correct? If so, it’s a lot to expect of the reader. Either way, a derivation would be helpful.

      We agree it was unnecessary and overly complex, so we have deleted it.

      (24) App 5–Figure 2: presumably the data for panel b came from Fig. 6a, with the learning rate set to Δw/w? And the data for panel c from Fig. 6b? This (or the correct statement, if this is wrong) should be mentioned.

      Yes, the data for panel c came from Fig. 6b. We have deleted the data in panel b, as there are some subtleties in interpretation of the learning rates in these settings.

      (25) line 952 now 946: typo, “and the from".

      Corrected to “and from".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia, that is, people who have reduced mental imagery abilities, some of whom describe having no imagery while others describe having vague and dim imagery. The study investigated this question using an fMRI paradigm in which 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between the hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      We thank the reviewer for highlighting our contributions and suggesting that the relationship between visual imagery and autobiographical memory recall is an exciting future avenue.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?' - a question which is ungrammatical, but potentially reflects a typo in the manuscript) could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities).

      Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

      The main goal of our study was to examine autobiographical memory recall. Therefore, we used the gold-standard Autobiographical Interview, or AI (Levine et al. 2002), and an fMRI paradigm to explore autobiographical memory recall in as standardised, precise, and objective a way as possible.

      In addition to these experimentally rigorous tasks, we employed some loosely formulated questions intended to let participants reflect on how they perceive their own abilities to recall autobiographical memories, navigate spatially, and use their imagination. We agree with the reviewer that these questions are vague and do not meet the experimental standard required for an investigation into spatial cognition or imagination in aphantasia. Nonetheless, we believe that these questions provide important additional insights into what participants think about their own cognitive abilities. To put these questions into perspective, we argue in the discussion that spatial cognition and other cognitive functions should be investigated in more depth in individuals with aphantasia in the future.

      As an additional note, all tasks were conducted in German, so the ungrammatical wording arose only in our English translation of the debriefing question, which we have corrected in our revision. We thank the reviewer for bringing this to our attention.

      Strengths:

      A great strength of this study is that it introduces an fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggests that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      We much appreciate the acknowledgment of our work on autobiographical memory employing both the Autobiographical Interview and fMRI. Furthermore, we hope that our work inspires future research in the way the reviewer outlines and in the way we describe in our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in activity in the visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between the visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      We thank the reviewer for highlighting the importance of our findings.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Once again, we thank the reviewer for highlighting the quality of our methods and our results.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation.

      We agree with the reviewer that our control task differs from autobiographical memory in many ways. For this first investigation of the neural correlates of autobiographical memory in aphantasia, this is precisely why we chose the mental arithmetic (MA) task: previous studies show that MA is, as far as possible, not dependent on hippocampal memory processes (Addis et al., 2007; McCormick et al., 2015, 2017; Leelaarporn et al., 2024). The main goal of the current study was to establish whether there are any differences between individuals with aphantasia and controls. In the next investigation, we can build on these findings to disentangle in more detail what this difference reflects.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

      This highly positive conclusion is much appreciated.

      References

      Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45(7), 1363-1377.

      Kriegeskorte, N., Simmons, W., Bellgowan, P., et al. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12, 535–540. https://doi.org/10.1038/nn.2303

      Leelaarporn, P., Dalton, M. A., Stirnberg, R., Stöcker, T., Spottke, A., Schneider, A., & McCormick, C. (2024). Hippocampal subfields and their neocortical interactions during autobiographical memory. Imaging Neuroscience.

      Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677.

      McCormick, C., St-Laurent, M., Ty, A., Valiante, T. A., & McAndrews, M. P. (2015). Functional and effective hippocampal–neocortical connectivity during construction and elaboration of autobiographical memory retrieval. Cerebral Cortex, 25(5), 1297-1305.

      McCormick, C., Moscovitch, M., Valiante, T. A., Cohn, M., & McAndrews, M. P. (2018). Different neural routes to autobiographical memory recall in healthy people and individuals with left medial temporal lobe epilepsy. Neuropsychologia, 110, 26-36.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting article that makes a substantial contribution to the field of the study of aphantasia as well as the neural mechanisms of autobiographical memory. I would strongly recommend this manuscript to be accepted (with these minor revisions), as it makes a substantial and well-evidenced contribution to the research, and it opens up many interesting avenues for researchers to explore. I was especially excited to see that the Autobiographical Interview had been paired with an fMRI paradigm, something which this field of research highly benefits from, as there are yet so few fMRI studies into aphantasia. I understand that it is the authors' decision whether to accept or reject any of the revisions I recommend here, but I would like to stress that I encourage accepting the recommended revisions, especially as there are some minor inaccuracies in the manuscript as it currently stands. Finally, I would like to stress that, though I am based in cognitive science, I am not trained in fMRI techniques and am therefore not in a position to comment on the methodology pertaining to this part of the study. I encourage the Editors to seek a second reviewer's opinion on this.

      Thank you for the positive evaluation of our manuscript as well as your comments. We have revised our manuscript according to your important suggestions as further explained below.

      Line 33: "aphantasia prohibits people from experiencing visual imagery". This characterisation of aphantasia is too strong, especially as the authors use 32 as a cut-off point on the VVIQ, which represents weak and dim imagery. I would recommend using language like 'people with aphantasia have reduced visual imagery abilities', as this more accurately captures the group of people studied. Please revise throughout the manuscript. Please consult Blomkvist and Marks (2023) on this point who have discussed this problem in the aphantasia literature.

      We agree that aphantasics may experience reduced visual imagery abilities. We have revised our wording throughout the manuscript.

      Line 49: The authors conclude that their results 'indicate that visual mental imagery is essential for detail-rich, vivid AM', but this seems to be a bit too strong, for example since AM can be detail-rich with external (rather than internal) detail, and a person could potentially use mnemonic tricks such as keeping a detail-rich diary in order to boost their memory. That visual imagery is 'essential' implies that it is the only way to achieve detail-rich vivid AM, and this does not seem to be supported by the findings. I would recommend rephrasing it as 'visual mental imagery plays an important role in detail-rich, vivid AM' or 'visual mental imagery mediated detail-rich vivid AM'.

      We altered the sentence in Line 49 using one of the recommended phrases:

      ‘Our results indicate that visual mental imagery plays an important role in detail-rich, vivid AM, and that this type of cognitive function is supported by the functional connection between the hippocampus and the visual-perceptual cortex.’

      Line 69: Blomkvist and Marks (2023) have warned against calling aphantasia a 'condition' and this moreover seems to fit with the authors' previous research (Monzel, 2022). Please consider instead calling aphantasia an 'individual difference' in mental imagery abilities.

      Thank you for the suggestion. We have revised our wording throughout the manuscript, avoiding the term ‘condition’.

      Line 72: Add reference for emotional strength which has also been researched (Wicken et al. 2021, https://doi.org/10.1016/j.cortex.2020.11.014).

      We have added the suggested reference in Line 75:

      ‘Indeed, a handful of previous studies report convergent evidence that aphantasics report fewer sensory AM details than controls (Bainbridge et al., 2021; Dawes et al., 2020, 2022; Milton et al., 2020; Zeman et al., 2020), and these memories may also be less emotional (Monzel et al., 2023; Wicken et al., 2021).’

      Lines 72-73: 'absence of voluntary imagery' - too strong as many people with aphantasia report having weak/dim mental imagery on the VVIQ.

      We agree that aphantasics may experience reduced visual imagery. We have revised this notion throughout the manuscript.

      Line 74: Add reference to Bainbridge study which found a difference between recall of object vs spatial memory. This would be relevant here.

      We have added the suggested reference in Line 76:

      ‘Spatial accuracy, on the other hand, was not found to be impaired (Bainbridge et al., 2021).’

      Lines 94-97: The authors mention 'a prominent theory' but it is unclear which theory is referred to here. The article cited by Pearson (2019) does not suggest the possibility that aphantasia is due to altered connectivity between the hippocampus and visual-perceptual cortices. It suggests that aphantasia is due to impairment in the ventral stream, and in fact says that the hippocampus is unlikely to be affected due to spared spatial abilities in people with aphantasia. Specifically, Pearson claims: "Accordingly, memory areas of the brain that process spatial properties, including the hippocampus, may not be the underlying cause of aphantasia." (page 631). The authors further come back to this point in the discussion section (see comment below), saying that the hypothesis attributed to Pearson is supported by their study. I do not disagree with the point that the hypothesis is supported by the data, but it is unclear to me why the hypothesis is attributed to Pearson.

      Thank you for pointing out this inaccuracy. We have edited the text to spell out our entire train of thought (see Lines 96-102):

      ‘A prominent theory posits that because of this hyperactivity, small signals elicited during the construction of mental imagery may not be detected (Pearson, 2019; Keogh et al., 2020). Pearson further speculates that since spatial abilities seem to be spared, the hippocampus may not be the underlying cause of aphantasia. In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Line 97: Blomkvist reference should be 2022 (when first published online).

      The article ‘Aphantasia: In search of a theory’ by Blomkvist was first published on 1st July 2022, and a correction was added on 13th March 2023. We had therefore cited the corrected version in this manuscript. However, we agree that the date of first publication should be used and have edited the reference accordingly.

      Line 116: 'one aphantasic' could be seen as offensive. I would suggest 'one aphantasic participant'.

      We have altered the paragraph according to your suggestion.

      Line 138: In line with the recommendations put forward by Blomkvist and Marks (2023), I would suggest removing the word 'diagnosed', as this medicalises aphantasia in a way that is not consistent with its not being a kind of mental disorder (Monzel et al., 2022). I would say that aphantasia is instead operationalised as a score between 16-32. However, note that Blomkvist (2022) and Blomkvist and Marks (2023, https://doi.org/10.1016/j.cortex.2023.09.004 ) point out that there is also a lot of inconsistency in this score and how it is used in different studies. In your manuscript, I would recommend removing all wording that indicates that people with aphantasia have no experience of mental imagery, as you have operationalised for a score up to 32 which indicates vague and dim imagery. Describing vague and dim imagery as no imagery/absence of imagery is inconsistent (but common practice in the literature).

      Thank you for your suggestion. We have revised the entire manuscript to eliminate any ambiguous meanings regarding the definition of aphantasia. Moreover, we replaced the word ‘diagnosed’ with ‘identified’ in Line 146.

      Line 153: maybe 'correlated with imagery strength' rather than 'measures imagery strength'?

      We have altered the sentence according to your suggestion in Line 160:

      ‘Previous studies have shown that performance on the binocular rivalry task correlates with mental imagery strength.’

      Line 162: "For participants who were younger than 34 years, the middle-age memory was replaced by another early adulthood memory". Is there precedence for this? Please add one sentence to explain/justify for the reader why a memory from this time period was chosen.

      To keep the data set homogeneous, with five episodic autobiographical memories from five different life periods per participant, we asked participants who were younger than 34 years at the time of the interview to provide a second early adulthood memory in place of a middle age memory, as they had not yet reached middle age. This follows Levine et al. (2002), in which younger adults (< 34 years) selected two events from the early adulthood period. For all participants, the last time period was covered by a memory from the previous year. We have added an explanation to this section in Line 170:

      ‘In order to acquire five AMs from every participant, the middle age memory was replaced by a second early adulthood memory for participants who were younger than 34 years (see Levine et al., 2002). For all participants, the last time period was covered by a memory from the previous year.’

      Line 169: "During the general probe, the interviewer asked the participant encouragingly to promote any additional details." Consider a different word choice, 'promote' sounds odd.

      We have altered the sentence according to your suggestion in Line 180:

      ‘During the general probe, the interviewer encouraged the participant to provide any additional details.’

      Line 196-198: the phrasing of these questions could have biased participants toward reporting it being more difficult. Did the authors control for this possibility in any way? The phrasing ‘How easy is it for you to [x]?’ might also be considered in a future study.

      Thank you for pointing this out. These debriefing questions were intended as open questions to prompt participants to talk about their experiences; they were not meant as rigorous scientific measures. Framing them positively is a good idea for future research.

      We have edited the manuscript in Lines 394-396:

      ‘The debriefing questions were employed as a way for participants to reflect on their own cognitive abilities. Of note, these were not meant to represent or replace necessary future experiments.’

      Line 197: This question is ungrammatical. Is this a typo, or was this how the question was actually posed? What language was the study conducted in?

      All interviews within this study were conducted in German, so the questions listed in this manuscript were translated from German into English. We have added this information to the Materials and Methods section in Line 169 and rephrased the questions in Lines 208-210:

      ‘All interviews were conducted in German.’

      ‘(1) Typically, how difficult is it for you to recall autobiographical memories?

      (2) Typically, how difficult is it for you to orient yourself spatially? 

      (3) Typically, how difficult is it for you to use your imagination?’

      Line 211: The authors write that participants were asked to "re-experience the chosen AM and elaborate as many details as possible in their mind's eye" was this the instruction used? I think stating the explicit instruction here would be relevant for the reader. If this is the word choice, it is also interesting as the autobiographical interview does not normally specify to re-experience details 'in one's mind's eye'.

      The instructions given to the participants were to choose an AM and re-experience/elaborate on it in their mind with as many details as possible, without describing it out loud. We have clarified this in Lines 221-223.

      ‘For the rest of the trial duration, participants were asked to re-experience the chosen AM and try to recall as many details as possible without speaking out loud.’

      Line 213: Were ‘vivid’ and ‘faint’ the only two options? Why was a 5-point scale (like the VVIQ scale) not used to better be able to compare?

      During the scanning session, participants used a button box with two buttons: 'vivid' (index finger) and 'faint' (middle finger). We did not use a 5-point scale to avoid confusion over button assignments during scanning. We have clarified this in Line 224:

      ‘We chose a simple two-button response in order to keep the task as easy as possible.’

      Line 347: Do the authors mean the same thing by 'imagery strength' and 'imagery vividness'? This would be good to clarify as it is not clear that these words mean the same thing.

      Imagery strength is often used to describe the results of the Binocular Rivalry Task, whereas vividness of mental imagery is often used to describe the results of the VVIQ. Although the two measures are correlated, the VVIQ measures vividness, whereas the dimension captured by the Binocular Rivalry Task is not clearly defined. We have added this information in a footnote on page 10.

      Lines 353 - 356: When the authors first say that aphantasics described fewer memory details than controls, does this refer to external + internal details? Please clarify.

      Lines 353-360: The authors first say that aphantasics report "internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94)" (line 355). But then they say: "a 2-way interaction was found between the type of memory details and group, F(1, 27)= 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b)" (line 358). This seems to first say that aphantasics didn't report fewer details than controls, but then that they did report fewer internal details than controls. Please clarify if this is correct.

      Line 383: Results from controls are not reported in this section.

      We first reported the main effects of the different factors: aphantasics reported fewer details than controls (collapsed across memory period and type of memory details), internal details were reported more often than external details (collapsed across group and memory period), and more details were reported for recent than remote memories (collapsed across group and type of memory details). We then reported the simple effects for aphantasics and controls separately. To clarify this further, we added the following passage in Line 360:

      ‘Regarding the AI, we found significant main effects of memory period, F(1, 27) = 11.88, p = .002, ηp² = .31, type of memory details, F(1, 27) = 189.03, p < .001, ηp² = .88, and group, F(1, 27) = 9.98, p = .004, ηp² = .27. When the other conditions were collapsed, aphantasics (M = 26.29, SD = 9.58) described fewer memory details than controls (M = 38.36, SD = 10.99). For aphantasics and controls combined, more details were reported for recent (M = 35.17, SD = 14.19) than remote memories (M = 29.06, SD = 11.12), and internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94). More importantly, a 2-way interaction was found between type of memory details and group, F(1, 27) = 54.09, p < .001, ηp² = .67, indicating that aphantasics reported significantly fewer internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly fewer external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b).’

      The results for aphantasics and controls are reported separately in Lines 368-372.
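      As an illustration of how the effect sizes in the passage above relate to the test statistics, partial eta squared can be recovered from each F value and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error). The sketch below assumes this standard conversion and the F(1, 27) values quoted above; the function name is hypothetical and not part of the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Hypothetical helper: partial eta squared from an F statistic, F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and reported effect sizes from the AI analysis, each with df = (1, 27).
reported = {
    "memory period": (11.88, 0.31),
    "type of memory details": (189.03, 0.88),
    "group": (9.98, 0.27),
    "type of memory details x group": (54.09, 0.67),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 27)
    print(f"{effect}: recomputed {eta:.2f}, reported {eta_reported:.2f}")
```

      Each recomputed value matches the reported ηp² when rounded to two decimals, which is a quick consistency check rather than a re-analysis of the data.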

      Line 386: The question does not specify that it's asking about using imagination in daily life, even though this is what results report. I'm not sure that the question implies the use of imagination in daily life, so I would recommend removing this reference here.

      We have removed “in daily life”, since this was not part of the original debriefing question.

      Line 394: Could this slowness in response reflect uncertainty about the vividness?

      Since the reason for this slowness is not known, we have refrained from discussing it at length. However, we added this possibility as a short insertion in Line 406:

      ‘Moreover, aphantasics responded more slowly (M = 1.34 s, SD = 0.38 s) than controls (M = 1.00 s, SD = 0.29 s) when they were asked whether their retrieved memories were vivid or faint, t(28) = 2.78, p = .009, possibly reflecting uncertainty in their response.’

      Line 443: Graph E, significance not indicated on the graph.

      After preprocessing, the fMRI data were statistically analyzed using the GLM contrast AM versus MA. The resulting images were then thresholded at p < 0.001, so the highlighted voxels in Fig. 3A-D show only voxels in which we already know there is a statistical difference between our conditions. Graph E only illustrates the descriptive means and variance of the significant effects shown in Fig. 3C and D; testing these values again would be circular, because the voxels were selected by the same contrast. This display is useful because the reader can assess the difference between the two conditions and the two groups at a glance. For a general discussion of this topic, please see Kriegeskorte et al. (2009) on circular analysis in fMRI.

      Line 521-522: The authors claim that Pearson (2019) forwards the hypothesis that heightened activity of visual-perceptual cortices hinders aphantasics from detecting small imagery-related signals. However, I find no statement of this hypothesis in Pearson (2019). It is unclear to me why this hypothesis is attributed to Pearson (2019). Please remove this reference or provide a correct citation for where the hypothesis is stated. Further, it is not clear from what is written how the results support this hypothesis as this is rather brief - please elaborate on this.

      We attributed this hypothesis to Pearson (2019) according to his Fig. 4, which states: ‘A strong top-down signal and low noise (bottom left) gives the strongest mental image (square), whereas a high level of neural noise and a weak top-down imagery signal would produce the weakest imagery experience (top right).’

      We have edited our manuscript to reflect Pearson better in Lines 543-550:

      ‘In a prominent review, Pearson synthesizes evidence about the neural mechanism of imagery strength (Pearson, 2019). Indeed, activity metrics in the visual cortex predict imagery strength (Cui et al., 2007; Dijkstra et al., 2017). Interestingly, lower resting activity and excitability result in stronger imagery, and reducing cortical activity in the visual cortex via transcranial direct current stimulation (tDCS) increases visual imagery strength (Keogh et al., 2020). Thus, one potential mechanism of aphantasia-related AM deficits is that the heightened activity of the visual-perceptual cortices observed in our and previous work prevents aphantasics from detecting weaker imagery-related signals.’

      Line 575: Consider citing Blomkvist (2022) who has argued that aphantasia is an episodic memory condition

      We added the suggested reference in Line 601.

      Line 585: Consider citing Bainbridge et al (2021) https://doi.org/10.1016/j.cortex.2020.11.014

      We have added the suggested reference in Line 612.

      Line 581: It might be relevant here to also discuss non-visual details, which have indeed been investigated in your present study. E.g. the lower emotional details, temporal details, place details, etc.

      We have edited our discussion to reflect the non-visual details better in Line 605:

      ‘In fact, previous studies and the current study show that aphantasics and individuals with hippocampal damage report fewer internal details across several memory detail subcategories, such as emotional and temporal details (Rosenbaum et al., 2008; St-Laurent et al., 2009; Steinvorth et al., 2005), and that these deficits can be observed regardless of the recency of the memory (Miller et al., 2020). These similarities suggest that aphantasics are not merely missing the visual-perceptual details of specific AMs but have a profound deficit in AM retrieval.’

      Place details are discussed on page 37 onwards.

      Line 605: I agree with this interesting suggestion for future research. It would also be relevant to reference Bainbridge (2021) here who tested spatial cognition in a drawing task and found that aphantasic participants correctly recalled spatial layouts of rooms but reported fewer objects than controls. It might also be worth pointing out that the present study does not actually test for accuracy in spatial cognition, so it could be the case that people with aphantasia feel confident that they can navigate well, but they might in fact not. Future studies relying on objective measures should test this possibility.

      We have added the suggested reference in Line 625.

      Lines 609-614: Is there any evidence that complex decision-making and complex empathy tasks depend on constructed scenes with visual-perceptual details? This hypothesis seems a bit far-fetched without any supporting evidence. In fact, it seems unlikely to be supported as we also know that people with aphantasia generally live normal lives, and often have careers that we can assume involve complex decision-making (see Zeman 2020 who report aphantasics who work as computer scientists, managers, etc). I would recommend that the authors provide evidence of the role of mental imagery in complex decision-making and complex empathy tasks, mediated by scene construction, to support this hypothesis as viable to test for future research. It is also unclear how this point connects to the argument made by Bergmann and Ortiz-Tudela (2023). In fact, Bergmann and Ortiz-Tudela seem to make the same argument as Pearson (2019) does - that aphantasia results from impairments in the ventral stream, but that the dorsal stream is unaffected. However, Blomkvist (2022) argues that this view is too simplistic to be able to account for the variety of deficits that we see in aphantasia. I would recommend either engaging more fully with this debate or cutting it, as it currently is too vague for a reader to follow.

      We have decided to leave the discussion about scene construction and its connection to complex decision making and empathy out of the current manuscript. We have included the argument of Bergmann & Ortiz-Tudela (2023) in the Introduction (Line 101):

      ‘In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Reviewer #2 (Recommendations For The Authors):

      In general, I really enjoyed reading this paper.

      Thank you very much for the positive evaluation of our manuscript as well as your comments.

      There were only a few things that I had some concerns about. For example, it was unclear to me whether the whole-brain analysis (Figures 3 and 4) was corrected for multiple comparisons or why only a small volume correction was applied for the functional connectivity analysis. If these results are borderline significant, this should be made more explicit in the manuscript. I don't think this is a major issue as the investigation of both the hippocampus and visual cortex was strongly hypothesis-driven, but it would still be good to be explicit about the strength of the findings.

      For the whole-brain analysis, we applied a threshold of p < .001 (uncorrected) with a cluster extent of 10 voxels; no further correction for multiple comparisons was applied. The peak in the right hippocampus survived this whole-brain threshold, but we lowered the threshold in Figure 3 for display purposes only, so that readers can easily see the cluster.

      We have stated the statistical thresholds explicitly in the following figure legends:

      Figure 3 (Page 27): ‘Images are thresholded at p < .001, cluster size 10, uncorrected, except (D) which is thresholded at p < .01, cluster size 10, for display purposes only (i.e., the peak voxel and adjacent 10 voxels also survived p < .001, uncorrected).’

      Figure 4 (Page 30): ‘Image is displayed at p < .05, small volume corrected, and a voxel cluster threshold of 10 adjacent voxels.’

      I was wondering whether it would be possible to use DCM to investigate the directionality of the connectivity. Given that there are only two ROIs and two alternative hypotheses (top-down versus bottom-up) this seems like an ideal DCM problem.

      We thank the reviewer for this suggestion and will consider testing the effective connectivity between both regions of interest in a future investigation. 

      Line 385: typo: 'great' should be 'greater'.

      We have altered the typo from ‘great’ to ‘greater’ in Line 397.

      Line 400: absence of evidence of an effect is not evidence of absence of an effect.

      We agree with the reviewer that this was unclear. We changed the wording in Line 412:

      ‘In addition, aphantasics and controls did not differ significantly in their time searching for a memory in AM trials, t(19) = 1.03, p = .315.’

      Typo line 623: 'overseas'.

      We have altered the mistyped word from ‘overseas’ to ‘oversees’ in Line 647.

    1. Author response:

      Reviewer #1 (Public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      Response: In the revision, we plan to rewrite the introduction to better link the background to the study questions.

      (2) For the sequencing, which kit was used on the Novaseq6000?

      Response: We prepared libraries with the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry, CG000183) and sequenced them on the NovaSeq 6000. We apologize for omitting this important information and will add it to the Methods.

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      Response: We apologize for the inadequate description of the data analysis process, which was due to the word count limit. We plan to provide more detail and, if possible, to include the analysis scripts as supplementary data in the revised manuscript.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      Response: We will add the complete list of marker genes used for cell type annotation to the revised manuscript.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      Response: We acknowledge this shortcoming. In the revision, we plan to apply statistical tests to compare cell proportions between each set of groups.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Response: We apologize for the imprecise presentation of histograms such as Fig. 2D and will add Y-axis labels. Regarding bar widths, we used the histograms as originally generated by the software package; we did not intend to widen the bars to increase their visual prominence. We apologize for this and will correct similar issues throughout the manuscript.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Response: We agree that many conclusions based on bioinformatic predictions are stated too assertively. In the revision, we will rewrite these conclusions more precisely.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Response: We thank the reviewer for this suggestion. Each epithelial cluster is distinguished from the other clusters and identified by its DEGs, but we do not think the clusters are heavily disconnected from one another. In the revision, we plan to add further validation of the existence of Epi_10_CYSTM1.

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Response: In the TCGA survival analysis for Epi_10, we observed a noticeable trend toward a difference between groups (with a relatively small P value). We therefore presented these data, hoping to strengthen the case for the clinical significance of this cluster. However, because the P value is not significant, this has indeed caused controversy. We plan to state the conclusion more precisely or to delete these data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions, which clearly show cells from the patient who has Epi_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3, which appears to be associated with late stage; however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset, as it should be if it were actually related to the Epi_10_CYSTM1 cluster.

      Response: We thank the reviewer for this question. The conclusion “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too definitively given the sample distribution, and we will correct the description in the revised manuscript. As for SLC26A3, we do not think it is “broadly” expressed; rather, it is specific to later-stage tumors. When presenting the IHC data, we showed only the strongly positive area of each slide to emphasize the differences, which has caused misunderstanding. Therefore, in the revision, we would like to show other areas of each case, or even a scan of a whole slide, as supplementary data.

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Response: We apologize for not further validating the cytotoxic T cell cluster. As shown in Fig. 4B and 4C, the four T cell clusters were identified mainly on the basis of canonical T cell markers. We then focused on the validation and further analysis of Tregs and neglected the other clusters. In Fig. 4D we intended only to show the top DEGs in each T cell cluster, in the hope of finding potential marker genes for the next step of analysis. However, we did not notice that epithelial cells might have contaminated the cytotoxic T cell cluster during clustering. We will optimize the analysis of this part in our revision.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Response: Our initial purpose was to use GO analysis to support our conclusions. However, we recognize that these results are claims rather than evidence, which reflects the same writing problem noted in point (7). Therefore, in the revised manuscript, we plan to state the conclusions drawn from the GO analysis more rigorously or to delete these data.

      Reviewer #2 (Public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      Response: We acknowledge that many of the conclusions are stated more strongly than the supporting evidence warrants, and we will revise the writing accordingly. More importantly, to strengthen the validity of our data, we will apply statistical approaches in further analyses.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      Response: We sincerely thank the reviewer for questioning the validity, appropriateness, and real potential of SLC26A3. We plan to add more explanation of the importance of SLC26A3 in the Discussion. We also apologize for some overly confident conclusions about the ADC-specific cell clusters and the marker gene SLC26A3. However, we do not think these conclusions are problematic. Because of the heterogeneity among individuals, and even among sampling sites within one individual, we think that a “small fraction” is not necessarily meaningless. Moreover, these ADC-specific clusters (including Epi_10_CYSTM1) do account for non-negligible proportions compared with the “big fraction” groups (Fig. 2D). Furthermore, given that these DEGs are specific to ADC and not to SCC, we think the ADC-specific cluster genes may play a central role in distinguishing ADC from SCC, and we used validation experiments to support this hypothesis. Lastly and most importantly, SLC26A3 was derived from sample 7, whose clinical stage is FIGO IIIC (late stage) and whose pathological type is ADC. Among the 15 cases, only 4 are late stage (3 of which are ADC). From this point of view, we think that 1 in 3 (33%) late-stage ADC cases expressing SLC26A3 (or harboring cluster Epi_10_CYSTM1) makes it worth considering as a potential marker. Samples from early-stage and SCC patients do not contain Epi_10_CYSTM1 cells, which likewise indicates the specificity of this cluster to ADC. Therefore, in the revised manuscript, we plan to add a more in-depth discussion of this question.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell types (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Response: Do you mean Figure 1B and D? In the revised manuscript, we will list the canonical marker genes used to define each cell type, to show that our cell type clustering is consistent with most published references. To further avoid contamination within clusters, we will apply additional quality control and re-analyze these data upon revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with the dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as by the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+, as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after the extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      Weaknesses:

      (1) The authors use only one mutant to confirm that their FRET signal indicates the formation of the KL. Importantly, this mutation does not involve the nucleotides that are part of the KL interaction. It would be more convincing if the authors used mutations in both strands of the KL and performed compensatory mutations that restore base pairing. Experiments like this would solidify the structural interpretation of the work, particularly in the context of the full-length riboG RNA or in the cotranscriptional mimic experiments, which appear to have more conformational heterogeneity.

      We thank the reviewer for describing our work as an “in-depth characterization” of riboG. We agree with the reviewer, and we have added two more mutants, G71C and U72C, with the mutations located in the KL (Figure 2– figure supplement 8A, 8B, 9A, 9B, Figure 3– figure supplement 6A, 6B, 7A, 7B, and Figure 4– figure supplement 6A, 6B, 7A, 7B). Furthermore, we have made the compensatory mutations C30G-G71C and A29G-U72C, which restore base pairing in the KL (Figure 2– figure supplement 8C, 8D, 9C, 9D, Figure 3– figure supplement 6C, 6D, 7C, 7D, and Figure 4– figure supplement 6C, 6D, 7C, 7D). We added the experimental results to the revised manuscript accordingly as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C mutant, which was initially expected to restore a base pair in the KL, did not bring about the anticipated peak with EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G-U72C mutations restored both a base pair in the KL and the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).
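      The logic behind the single and compensatory mutants can be made explicit with a minimal Python sketch. It assumes only what the mutant names and the text above encode: wild-type A29, C30, G71 and U72, with positions 30:71 and 29:72 pairing across the KL. The helper names are hypothetical, and the check covers base complementarity only, not folding.

```python
# Canonical Watson-Crick pairs plus the G-U wobble; a complementarity check
# only, not a folding or thermodynamic model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Wild-type bases inferred from the mutant names (A29G, C30G, G71C, U72C).
WILD_TYPE = {29: "A", 30: "C", 71: "G", 72: "U"}
KL_PAIRS = [(30, 71), (29, 72)]  # partner positions across the kissing loop


def apply_mutations(bases, mutations):
    """Return a copy of `bases` with point mutations such as 'G71C' applied."""
    mutated = dict(bases)
    for m in mutations:
        wt, pos, new = m[0], int(m[1:-1]), m[-1]
        if mutated[pos] != wt:
            raise ValueError(f"{m}: wild-type base at {pos} is {mutated[pos]}")
        mutated[pos] = new
    return mutated


def kl_pairing(mutations):
    """Map each KL position pair to (dinucleotide, can_pair) for one variant."""
    bases = apply_mutations(WILD_TYPE, mutations)
    return {
        f"{i}:{j}": (bases[i] + bases[j], (bases[i], bases[j]) in PAIRS)
        for i, j in KL_PAIRS
    }


if __name__ == "__main__":
    for variant in ([], ["G71C"], ["C30G", "G71C"], ["U72C"], ["A29G", "U72C"]):
        print("-".join(variant) or "wild type", kl_pairing(variant))
```

Note that this check reports the 30:71 pair as restored in C30G-G71C, even though the smFRET data above did not show recovery of the EFRET ~ 0.8 peak; complementarity is necessary but not sufficient for KL formation, which is why the experimental test matters.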

      (2) The existence of the pre-folded state (intermediate FRET ~0.5) is not well supported in their data and could be explained by an acquisition artifact. The dwell times are very short often only a single frame indicating that there could be a very fast transition (< 0.1s) from low to high FRET that averages to a FRET efficiency of 0.5. To firmly demonstrate that this intermediate FRET state is metastable and not an artifact, the authors need to perform measurements with a faster frame rate and demonstrate that the state is still present.

      We thank the reviewer for the great comment. We added smFRET experiments at a higher time resolution (20 ms) as well as at a lower time resolution (Figure 2– figure supplement 3). Our results show that the intermediate state (EFRET ~0.5) is present in smFRET data collected at 20 ms, 100 ms, and 200 ms.

      (3) The PLOR method employs a non-biologically relevant polymerase (T7 RNAP) to mimic transcription elongation and folding near the elongation complex. T7 RNAP has a shorter exit channel than bacterial RNAPs and therefore, folding in the exit channel may be different between different RNAPs. Additionally, the nascent RNA may interact with bacterial RNAP differently. For these reasons, it is not clear how well the dynamics observed in the T7 ECs recapitulate riboswitch folding dynamics in bacterial ECs where they would occur in nature. 

      We thank the reviewer for the comment. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (pages 13–14).

      Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into cotranscriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near-cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

      We thank the reviewer for describing our work as follows: “The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality”. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14).

      Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provided valuable information on the mechanistic aspects of the riboswitch, including the confirmation of the kissing loop present in the structure as essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that supports the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for the understanding of folding and behavior properties of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand-sensing and the binding window that promotes the structural switch. The fact that investigations were done with the aptamer domain, aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to reveal how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) is different from the bacterial RNA polymerase, not only in complexity, but also in transcriptional speed, which can directly interfere with folding and ligand-sensing. Additionally, rNTP concentrations were much lower than physiological concentrations during transcription, likely changing the polymerase's transcriptional speed. These important aspects, and how they could affect the results, should be addressed for a broad audience. Another point to consider is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

      We thank the reviewer for describing our work as “The investigations were very thorough, providing data that supports the conclusions”. We agree with the reviewer that the bacterial and T7 RNAPs may behave differently due to their differences in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14). We also agree with the reviewer that the lower NTP concentrations may affect the transcriptional speed. Regarding the fluorophores, we purposely placed them away from the KL to avoid their influence on the formation of the KL.

      Reviewer #1 (Recommendations For The Authors):

      Related to weakness 1

      - The authors cite a paper that investigated mutations in the KL duplex but do not include these mutations in their analysis. It is unclear why the authors chose the G77C mutation and not the other mutants previously tested. Can the authors explain their choice of mutation in detail in the text? I also did not see the proposed secondary structure for the G77C mutant shown in Figure 2 -supp 3A in the cited paper, is this a predicted structure? Please explain how this structure was determined. 

      We thank the reviewer for the comment. We chose the G77C mutation because a previous report showed that G77C can disturb the formation of the KL, as we stated in the manuscript: “Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7). The secondary structure of the G77C mutant was predicted by Mfold, which is cited in the manuscript and added to the reference list as “Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13), 3406-3415”.

      - It is not clear to me that the structural interpretation of their FRET states is correct and that the FRET signal reports on the base pairing of the KL in only the high FRET state. The authors should perform experiments with additional mutations in the KL duplex to confirm that their construct reports on KL duplex formation alone and not other structural dynamics. 

      We thank the reviewer for the comment. We have included additional mutations to establish a connection between the high-FRET state and the formation of the KL. The results have been added to the manuscript as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C mutant, which was initially expected to restore a base pair in the KL, did not bring about the anticipated peak with EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G-U72C mutations restored both a base pair in the KL and the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      - For the full-length riboG-136 (Cy3Cy5 riboG in Figure 4), the authors have clearly defined peaks at 0.6 and 0.4. However, the authors do not explain their structural interpretation of these states. Do the authors believe that the KL is forming in these states? It would be helpful to have data on mutations in the KL in the context of the full-length riboG to better understand the structural transitions of these intermediate states. 

      Based on our mutation studies, we propose that the peak with EFRET ~0.8 corresponds to the conformation with the KL, while the states with EFRET ~0.4 and 0.6 are states without a stable KL.

      Related to weakness 2:

      - For the riboG-apt and riboG-term RNAs, the proposed intermediate FRET state (EFRET = 0.5) is poorly fit by a Gaussian and the dwell times in the state are almost entirely single-frame dwells. It is likely that this state is the result of a camera blurring artifact, in which RNAs undergo a FRET transition between two frames giving an apparent FRET efficiency which is between that of the two transitioning states. This artifact arises when the average dwell times of the true states (Elow and Ehigh) are comparable to the frame duration (within a factor of ~5-10; see https://doi.org/10.1021/acs.jpcb.1c01036). To confirm the presence of the intermediate state, the authors should perform at least a few experiments with higher time resolution to support the existence of the 0.5 state with a lifetime of 0.1 s. Alternatively, the data should be refit to a two-state HMM and the authors could explain in the text that the density in the FRET histogram between the two states is likely due to transitions that are faster than the time resolution of the experiment. 

      We thank the reviewer for the great comment. Taking the suggestion into consideration, we performed smFRET experiments with a higher time resolution of 20 ms. As a result, we still detected the intermediate state, supporting that it is not an artifact. The new data has been included in the revised manuscript (Figure 2-figure supplement 3).  
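      The frame-averaging (camera blurring) concern raised in this comment can be illustrated with a minimal Python sketch. It simulates a hypothetical two-state molecule that switches symmetrically between assumed efficiencies E_LOW = 0.2 and E_HIGH = 0.8 (placeholder values), integrates each camera frame assuming gamma = 1 and equal total brightness in both states, and reports the fraction of frames whose apparent efficiency falls between the two states. The rate and frame times are illustrative assumptions, and this is not the authors' analysis pipeline.

```python
import random

# Placeholder state efficiencies (hypothetical); gamma = 1 is assumed.
E_LOW, E_HIGH = 0.2, 0.8


def apparent_efficiency(frac_low):
    """Frame-averaged FRET efficiency when a molecule spends `frac_low` of a
    frame in the low state. With equal total brightness in both states, the
    acceptor fraction averages linearly over the frame."""
    return frac_low * E_LOW + (1.0 - frac_low) * E_HIGH


def simulate_frames(k_switch, frame_s, n_frames, seed=0):
    """Integrate a two-state telegraph process (switching rate k_switch in 1/s,
    i.e. mean dwell 1/k_switch s) over camera frames of frame_s seconds.
    Returns the apparent efficiency of every frame."""
    rng = random.Random(seed)
    in_low = True
    next_switch = rng.expovariate(k_switch)
    efficiencies = []
    for i in range(n_frames):
        t, end = i * frame_s, (i + 1) * frame_s
        time_low = 0.0
        while t < end:
            seg_end = min(next_switch, end)
            if in_low:
                time_low += seg_end - t
            t = seg_end
            if next_switch <= end:  # a transition occurred inside this frame
                in_low = not in_low
                next_switch += rng.expovariate(k_switch)
        efficiencies.append(apparent_efficiency(time_low / frame_s))
    return efficiencies


def blurred_fraction(efficiencies, tol=0.05):
    """Fraction of frames more than `tol` away from both pure states, i.e.
    frames that would populate an apparent intermediate FRET level."""
    mixed = [e for e in efficiencies if abs(e - E_LOW) > tol and abs(e - E_HIGH) > tol]
    return len(mixed) / len(efficiencies)


if __name__ == "__main__":
    # Mean dwell of 0.1 s in each state, compared at three frame times.
    for frame_s in (0.2, 0.1, 0.02):
        frames = simulate_frames(k_switch=10.0, frame_s=frame_s, n_frames=5000)
        print(f"{frame_s * 1000:.0f} ms frames: {blurred_fraction(frames):.2f} blurred")
```

In this toy model the blurred fraction drops sharply when the frame time goes from 100 ms to 20 ms, so a purely artifactual ~0.5 population would be expected to shrink in the same way, whereas a metastable intermediate would be expected to persist at higher time resolution.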

      Related to weakness 3:

      - The authors depict the polymerase footprint differently in some of the figures and it is unclear if this is part of their model. Is the cartoon RNAP supposed to indicate the RNA:DNA hybrid or the footprint of T7 RNAP on the RNA? For example, in Figure 8a there are 8 nts (left) and 9 nts (right) covered by RNAP, and only 6 nts in Figure 6 - supp 2A. This is particularly misleading for the EC-87 and EC-88 in Figure 6 - supp 2, where it is likely that this stem is not formed at all and the KL strand is single-stranded. The authors should clarify and at least indicate in the figure legend if the RNAP cartoon is part of the model or only a representation.

      We thank the reviewer for bringing these issues to our attention. Because of space limitations, we represented the polymerase footprint differently in Figure 8. To avoid confusion, we have added the statement “DNA templates from EC-87 to EC-105 are not displayed in the model” to the legend of Figure 8.

      We have also corrected the erroneous 6-nt RNAP footprint shown in Figure 6-figure supplement 2.

      - With a correct 9 bp RNA:DNA hybrid, the EC-88 construct would not be able to form the top part of the P2 stem and the second half of the KL RNA would be single-stranded. In this case, an interaction between the KL nucleotides would resemble a pseudoknot and not a kissing loop interaction. Can the authors comment on whether this could explain the heterogeneity they observe in the EC-88 construct compared to the riboG-apt RNA?

      We thank the reviewer for the comment. We have added the following statement to the revised manuscript: “The T7 RNA polymerase (RNAP) sequestered about 8 nt of the nascent RNA, preventing the EC-88 construct from forming the P2 stem (Durniak et al., 2008; Huang & Sousa, 2000; Lubkowska et al., 2011; Tahirov et al., 2002; Wang et al., 2022; Yin & Steitz, 2002). Consequently, a pseudoknot structure potentially formed instead of the expected KL. This distinction may account for the observed heterogeneity between EC-88 and riboG-apt” (page 11).

      Other comments:

      (1) It appears that the FRET histograms in the PLOR experiments (Figure 6 and related figures) only show the fits presumably to highlight the overlays. However, this makes it impossible to determine the goodness of the fit. The authors should instead show the outline of the raw histogram with the fit, or at least show the raw histograms with fits in the supplement. 

      We have replaced Figure 6-figure supplements 2-4 so that both the raw and the fitted smFRET histograms are clearly shown.

      (2) The authors should consider including a concluding paragraph to put the results into a larger context. How does the kinetic window compare to other transcriptional riboswitches? Would the authors comment on how the transcription speed compares to the kinetics for the formation of the KL? 

      We thank the reviewer for the comment. We have added a comparison of riboG with other transcriptional riboswitches to the manuscript: “Nevertheless, the ligand-sensitive windows of riboswitches during transcription vary. In a study conducted by Helmling et al. using NMR spectroscopy, they proposed a broad transcriptional window for deoxyguanosine-sensing riboswitches, whereby the ligand binding capability gradually diminishes over several nucleotide lengths (Helmling et al., 2017). However, more recent research by Binas et al. and Landgraf et al. on riboswitches sensing ZMP, c-di-GMP, and c-GAMP revealed a narrow window with a sharp transition in binding capability, even with transcript lengths differing by only one or three nucleotides (Binas et al., 2020; Landgraf et al., 2022). In line with the findings for the c-GAMP-sensing riboswitch, our study on the guanidine-IV riboswitch also demonstrated a sharp transition in binding capability with just a single nucleotide extension” (page 14).

      We appreciate the reviewer’s suggestion to compare the transcription speed with the kinetics of KL formation. However, we have too little kinetic data in this study to make such a comparison with confidence.

      (3) Cy3Cy5 RiboG is a confusing name because it implies that the others are not also Cy3Cy5 labeled. The authors should consider changing the names and being consistent throughout. I suggest full-length riboG or riboG-136. 

      We have changed “Cy3Cy5 riboG” to “Cy3Cy5-full-length riboG” (pages 15 and 16).

      (4) The transcriptional readthrough experiment should be explained when first mentioned in line 109. 

      We have added the citation (Chien et al., 2023) of the transcriptional readthrough experiment to the manuscript as “we noted that the transcriptional read-through of the guanidine-IV riboswitch during the single-round PLOR reaction was sensitive to Gua+, exhibiting an apparent EC50 value of 68.7 ± 7.3 μM (Figure 1D) (Chien et al., 2023)” (page 5).

      (5) Kd values in text should have uncertainties, and the way these uncertainties are obtained should be explained.

      We have added the uncertainties of Kd values in the revised manuscript (page 6) and the legend of Figure 2-supplement 6 as “The percentages of the folded state (EFRET ~ 0.8) of Cy3Cy5-riboG-apt were plotted with the concentrations of Gua+ at 0.5 mM Mg2+, with an apparent Kd of 286.0 ± 18.1 μM in three independent experiments”.

      (6) The authors mention "strategies" on line 306, but it is unclear what they are referring to. Are the strategies referring to the constructs (EC-87, etc) or Steps 1-8 in the supplemental figure? Please clarify. 

      We have clarified this by adding “The detailed procedures of strategies 1-8 were shown in Figure 7–figure supplement 1” to the manuscript (page 12).

      (7) What are the fraction of dynamic traces versus static traces in the cases for the full-length riboG? This would help depict the structural heterogeneity in the population. 

      We have added the fractions of dynamic single-molecule traces of the full-length riboG to Figure 4-supplements 1-5. 

      (8) The labels in Figure 4 (A-E) don't match the caption (A-H). 

      We have corrected the error. 

      (9) The coloring of the RNA strands in Figure 4A should be explained in the figure legend. It could be interpreted as multiple strands annealed instead of a continuous strand. 

      We have revised the legend of Figure 4A by adding “The full-length riboG contains the aptamer domain (black), terminator (red) and the extended sequence (blue). Cy3 and Cy5 are shown by green and red sparkles, respectively”.

      (10) Reported quantities and uncertainties should have the same number of decimal places. In many places, the uncertainties likely have too many significant figures, for example, in Figure 5 and related figures. 

      We have corrected the significant figures of the uncertainties. 
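
      The reviewer's rule of matching decimal places can be applied mechanically. The sketch below follows a common convention that is assumed here rather than stated in the manuscript: round the uncertainty to two significant figures, then round the reported value to the same decimal place. The helper name `format_with_uncertainty` is hypothetical.

```python
import math


def format_with_uncertainty(value: float, sigma: float, sig_figs: int = 2) -> str:
    """Round `sigma` to `sig_figs` significant figures and `value` to the same decimal place."""
    if sigma <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal exponent of the leading digit of sigma, e.g. 18.1 -> 1, 0.034 -> -2.
    exponent = math.floor(math.log10(sigma))
    decimals = sig_figs - 1 - exponent
    places = max(decimals, 0)  # digits shown after the decimal point
    return f"{round(value, decimals):.{places}f} ± {round(sigma, decimals):.{places}f}"


print(format_with_uncertainty(286.0, 18.1))  # 286 ± 18
print(format_with_uncertainty(68.7, 7.3))    # 68.7 ± 7.3
```

      Under this convention, an apparent Kd of 286.0 ± 18.1 μM would be reported as 286 ± 18 μM; whether one or two significant figures are kept for the uncertainty is a style choice.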

      (11) In Figure 5, A and B should have the same vertical scale to facilitate comparison. 

      We have adjusted Figure 5A to match the vertical scale of Figure 5B in the revised manuscript.

      (12) In Figure 5C-D, the construct from which those trajectories come should be indicated in the legend. 

      We have added the construct to the legend of Figures 5C and D.  

      (13) In Figure 6J, the splines between data points are confusing and can be misleading. They suggest that the data has been fit to a model, but I am not sure if it represents a model. The data points should be colored instead and lines removed. 

      We thank the reviewer for the comment. We have changed Figure 6J by coloring the data points and removing the lines to avoid confusion. 

      (14) Line 330 mentions a P2 structure in Figure 8, but there is no such label in Figure. Please clarify. 

      We thank the reviewer for the comment and have added P2 to Figure 8. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1B. The authors don't seem to address the role of the blue stem-loop following Stems 1 and 2. Is this element needed at all for gene regulation? Does it impact the conformations or folding of the preceding Stems 1 and 2? It seems feasible to disrupt the stem and see whether there is an impact on riboswitch function. 

      We thank the reviewer for the comment. The sequence that forms the blue stem-loop indicates the formation of an anti-terminator conformation in riboG during transcription. Our smFRET data show that including the stem-loop sequence gives rise to additional peaks in the full-length riboG compared with riboG-term. This indicates that the stem-loop influences the folding of the kissing loop (KL) and potentially also affects stems 1 and 2.

      (2) Figure 7 supplement 1, C &D. Maybe I am missing something, but it seems to me in reaction #8 (EC-105, last two lanes), the readthrough percentage is close to 50% based on the gel but plotted in D as 20%. Further, there is a strong effect of guanidine in reaction #8 but that is not reflected in the quantitation in panel D. 

      We thank the reviewer for the comment. The discrepancy between reaction 8 in (C) and (D) arose because the gel in (C) was loaded with the crude product from the last step (step 17), whereas the crude products from steps 16 and 17 were combined to calculate the read-through percentage in (D). We have corrected the discrepancy by replacing Figure 7-Supplement figure 1C (now Figure 7C) and revised the legend to include the following clarification: “Taking into consideration that the 17 step-PLOR reaction exhibited a pause within the terminator region, resulting in a significant amount of terminated product at step 16, crude products from steps 16 and 17 were collected for (C) and (D) of the 17 step-PLOR reaction (Lanes 15 and 16 in C)”.

      (3) Figure 7C is a control that shows the quality of the elongation complexes, which probably should be in the supplement. Instead, in Figure 7 supplement 1, panels C and D are actual experiments and could be moved into the main figure.  

      We thank the reviewer for the comment. We made the adjustment.  

      (4) Figure S7D. I would suggest not labelling the RNA polymerase halt/stoppage sites due to NTP deprivation as "pausing sites" because transcriptional pausing has previously been defined as natural sites where the RNA polymerase transiently halts itself, but not due to the lack of the next NTPs. In this case, the elongating complexes were artificially halted, which is technically not "pausing", as it will not restart/resume on its own without intervention. 

      We have changed the “pausing” to “halting”.  

      (5) Figure 7 is titled "In vitro transcriptional performance of riboG." But the data is actually not about the performance of the riboswitch, or how well it functions. I would suggest the authors revise the title. This is mostly about the observed sensitivity window of the riboswitch to ligand-mediated conformational switching. 

      We have changed the title of Figure 7 to “Ligand-mediated conformational switching of riboG during transcription”.

      (6) Figure 7A, the illustration gives the visual impression that there are multiple RNA polymerases on the same DNA template, which is not the case. 

      We have revised Figure 7A by adding arrows between the RNA polymerases to illustrate the movement of a single RNAP, rather than multiple RNAPs on the same template.

      (7) It could be informative to compare the guanidine-IV riboswitch with the first three classes (I, II, III), to see how their architectures or gene regulatory mechanisms are similar or different. 

      We thank the reviewer for the comment. We have added a comparison of the guanidine-IV riboswitch with the other three guanidine riboswitches to the manuscript: “The guanidine-IV riboswitch exhibits similarities to the guanidine-I riboswitch in gene regulatory mechanism, functioning as a transcriptional riboswitch. Structurally, it resembles the guanidine-II riboswitch through the formation of loop-loop interactions upon binding to guanidine (Battaglia & Ke, 2018; L. Huang et al., 2017; Lin Huang et al., 2017; Lenkeit et al., 2020; Nelson et al., 2017; Reiss & Strobel, 2017; Salvail et al., 2020)” (page 12).

      Reviewer #3 (Recommendations For The Authors):

      In addition to the public review items, I provide the following recommendations:

      (1) As a second language speaker, I understand that writing a compelling and concise story may be hard, and we tend to write more than needed or more repetitively. That being said, I do think that the writing could be improved to make it more concise, clear, and avoid repetitions.

      We thank the reviewer for the comment. We have rewritten the abstract and revised a number of sentences throughout the manuscript.

      (2) In the abstract, instead of saying that "...This lack of understanding has impeded the application of this riboswitch", which makes the statement too strong, perhaps, stating something along the lines of "this understanding would assist the application of this riboswitch", would be a better fit. 

      We have rewritten the abstract and revised this sentence.

      (3) Methods should state which RNA polymerase was used. PLOR uses T7 RNA pol, so I assume it was the same. 

      We have added the statement “T7 RNAP was utilized in the PLOR and in vitro transcription reactions except noted” to the Methods (page 15).

      (4) The impact statement says comprehensive structure-function, where perhaps comprehensive folding-function would be more appropriate. We are still missing a lot of structural information about this particular riboswitch. 

      We agree with the reviewer and have changed “comprehensive structure-function” to “folding-function” in the Impact statement (page 2).

      (5) Higher Mg2+ concentrations implicated in a lesser extent of the switch of RiboGapt, a sentence talking about it would be useful (how Mg2+ could have promiscuous interaction and interfere with folding). 

      We have added a sentence on the effect of higher Mg2+ concentrations to the manuscript: “However, at a higher concentration of 50.0 mM Mg2+, the proportion of the pre-folded and unfolded conformations were more prevalent at 50.0 mM Mg2+ than at 20.0 mM Mg2+. This suggests that an excess of Mg2+ may promote the pre-folded and even unfolded conformations” (page 6).

      (6) In the investigations of RiboG-term and RiboG, seems like that monovalents from the buffer are sufficient to promote secondary structure. A statement commenting on this would benefit the paper and the audience. 

      We agree with the reviewer and have revised the manuscript accordingly by adding “This indicates that monovalent ions in the buffer can facilitate the formation of stable guanidine-IV riboswitch” (page 8).

      (7) Figure 3. Figure goes to panel E and legend to panel H. G and H colors do not correspond to actual figure colors. 

      We made the correction.  

      (8) Figure 4. The same as Figure 3, the panels and figures are divergent.  

      We made the correction.  

      (9) During the discussion, stating that the DNA and RNA pol play a role in folding and ligand binding may be excessive. This could be an indirect effect of the transcriptional bubble hindering part of the nascent RNA from folding, which is something intrinsic to any transcription and not specific to this system. 

      We agree with the reviewer and have deleted the statement that DNA and RNAP play a role in folding and ligand binding.

      (10) PLOR is not properly cited. When introduced in the manuscript, please cite the original PLOR paper (Liu et. al. Nature 2015) and additional related papers. 

      We now cite the original PLOR paper (Liu et al., Nature 2015) and related work (Liu et al., Nature Protocols 2018) (pages 4 and 15).

      (11) The kinetics race of folding and binding could be a little more emphasized in discussion, particularly from the perspective of its physiological importance. 

      We agree with the reviewer and have deleted the discussion of the kinetic race between folding and binding from the Discussion.

    1. that we might form great friendship, for I knew that they were a people who could be more easily freed and converted to our holy faith by love than by force, gave to some of them red caps, and glass beads to put round their necks, and many other things of little value, which gave them great pleasure, and made them so much our friends that it was a marvel to see.

      I found this sentence from the excerpt quite informative about how Columbus saw the natives of the Americas. He talked about wanting to be "great friends," but in my mind he means that they are simply people who can easily be used to further European power. The contrast between force and love also suggests different sides of his ideas: how he deals with them, and what happens to those who conform to Christianity and those who do not want to. It's easy to see that Columbus saw the natives as lesser because they didn't know the value of the items he shared with them. I think this statement was the start of the notion that those who are not from Europe are lesser and incapable of being on the same level as Europeans, simply because the natives placed higher value on other things than the Europeans did.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Weaknesses:

      The comparison of affinity predictions derived from AlphaFold2 and H3-opt models, based on molecular dynamics simulations, should have been discussed in depth. In some cases, there are huge differences between the estimations from H3-opt models and those from experimental structures. It seems that the authors obtained average differences of the real delta, instead of average differences of the absolute value of the delta. This can be misleading, because high negative differences might be compensated by high positive differences when computing the mean value. Moreover, it would have been good for the authors to disclose the trajectories from the MD simulations.

      Thank you for your careful checks. We fully understand your concerns about the large differences in the calculated affinities. To identify the source of these differences, we carefully analyzed the trajectories of the input structures during the MD simulations. We found that the antigen-antibody complex shifted as the simulation transitioned from NVT to NPT during pre-equilibration, even though restraints were applied to the protein structure. To address this issue, we followed the solution provided on the Amber mailing list (http://archive.ambermd.org/202102/0298.html) and modified the ATOMS_MOLECULE item in the topology (top) file of the simulation system so that the antigen-antibody complex is treated as a single molecule. The SOLVENT_POINTERS values were adjusted accordingly. We then performed all MD simulations and calculated the affinities of all complexes.

      We have revised “Afterwards, a 25000-step NVT simulation with a time step of 1 fs was performed to gradually heat the system from 0 K to 100 K. A 250000-step NPT simulation with a time step of 2 fs was carried out to further heat the system from 100 K to 298 K.” to “Afterwards, a 400-ps NVT simulation with a time step of 2 fs was performed to gradually heat the system from 0 K to 298 K (0–100 K: 100 ps; 100-298 K: 200 ps; hold 298 K: 100 ps), and a 100-ps NPT simulation with a time step of 2 fs was performed to equilibrate the density of the system. During heating and density equilibration, we constrained the antigen-antibody structure with a restraint value of 10 kcal·mol⁻¹·Å⁻².” and added the following sentence to the Methods section of our revised manuscript: “The first 50 ns restrains the non-hydrogen atoms of the antigen-antibody complex, and the last 50 ns restrains the non-hydrogen atoms of the antigen, with a constraint value of 10 kcal·mol⁻¹·Å⁻²”.

      In addition, we have corrected the calculation of the mean deltas to use absolute values, and the results show that the average affinities of structures predicted by H3-OPT are closer to those of the experimentally determined structures than the values obtained with AF2. These results have been updated in the revised manuscript. However, in a few cases significant differences remain between the estimates from H3-OPT models and those from experimental structures. We found that in both the AF2- and H3-OPT-predicted complexes the antibodies moved away from the antigens during the simulations, with RMSDbackbone (the RMSD of the antibody backbone) exceeding 20 Å. These deviations caused substantial structural changes in the complexes and consequently notable differences in the calculated affinities. We therefore removed three samples (PDB IDs: 4qhu, 6flc, 6plk) from the benchmark, because their predicted structures moved away from the antigen during the MD simulations, resulting in large energy differences from the native structures.
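
      The reviewer's concern about averaging signed differences can be seen in a small sketch. The affinity values below are invented for illustration only (they are not from the benchmark), and `mean_deltas` is a hypothetical helper: two large errors of opposite sign cancel in the signed mean but not in the mean absolute difference.

```python
from statistics import mean
from typing import Sequence, Tuple


def mean_deltas(predicted: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Return (mean signed delta, mean absolute delta) of predicted minus reference."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    deltas = [p - r for p, r in zip(predicted, reference)]
    return mean(deltas), mean(abs(d) for d in deltas)


# Hypothetical affinities for three complexes (model vs. experimental structure).
model = [-40.0, -10.0, -25.0]
experimental = [-20.0, -30.0, -25.0]
signed, absolute = mean_deltas(model, experimental)
print(signed, absolute)  # signed mean is 0.0; absolute mean is about 13.3
```

      A signed mean near zero can therefore coexist with large per-complex errors, which is why the absolute form is the more informative summary here.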

      Author response table 1.

      Thank you also for this suggestion. We have calculated RMSDbackbone for all production runs (SI Fig. 5).

      Author response image 1.

      Reviewer #3 (Public Review):

      Weaknesses:

      The proposed method lacks of a confidence score or a warning to help guiding the users in moderate to challenging cases.

      We apologize for this oversight. We have updated our GitHub code and added the following sentences to the Methods section to clarify how we trained the confidence score module: “Confidence score prediction module

      We apply an MSE loss for confidence prediction, label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rates of H3-OPT were set to 0.25. The learning rate and weight decay of Adam optimizer are set to 1 × 10−5 and 1 × 10−4, respectively.”
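
      The training target described in this module can be sketched as follows. This is an illustrative reading, not the H3-OPT code: it assumes the predicted and experimental loops have already been superimposed, takes the per-residue label as the Cα distance after that alignment, and scores per-residue error predictions with a mean squared error. How predicted errors are mapped onto the 0 to 100 confidence score is not described here, so that step is omitted; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_ca_deviation(model_ca: Sequence[Coord], native_ca: Sequence[Coord]) -> List[float]:
    """Cα distance per residue; assumes both loops are already superimposed."""
    if len(model_ca) != len(native_ca):
        raise ValueError("Cα lists must have the same length")
    return [math.dist(m, n) for m, n in zip(model_ca, native_ca)]


def mse_loss(predicted_error: Sequence[float], label_error: Sequence[float]) -> float:
    """Mean squared error between predicted and true per-residue deviations."""
    if len(predicted_error) != len(label_error) or not label_error:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted_error, label_error)) / len(label_error)


# Toy example with two residues (coordinates in Å).
labels = per_residue_ca_deviation([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                  [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)])
print(labels)                        # [0.0, 2.0]
print(mse_loss([0.5, 1.5], labels))  # 0.25
```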

      Reviewer #2 (Recommendations For The Authors):

      I would strongly suggest that the authors deepen their discussion on the affinity prediction based on Molecular Dynamics. In particular, why do the authors think that some structures exhibit huge differences between the predictions from the experimental structure and those predicted by H3-opt? Also, please compute the mean deltas using the absolute value and not the real value; the latter can be extremely misleading and hide very high differences in different directions that compensate each other when averaging.

      I would also advise including graphical results of the MD trajectories, at least as Supp. Material.

      We sincerely thank you for your feedback and fully understand your concerns. We identified the source of these large differences and resolved the problem by changing the MD simulation protocol. We then recalculated all affinities and corrected the calculation of the mean deltas to use absolute values. RMSDbackbone values were also measured during the production runs to support accurate affinity predictions (SI Fig. 5). In some cases, large differences remain between the estimates from H3-OPT models and those from experimental structures. We found that in both the AF2- and H3-OPT-predicted complexes the antibodies moved away from the antigens during the simulations, with RMSDbackbone exceeding 20 Å. These deviations caused substantial structural changes in the complexes and consequently notable differences in the calculated affinities. We therefore removed three samples (PDB IDs: 4qhu, 6flc, 6plk) from the benchmark.

      Thank you again for your expert advice.

      Reviewer #3 (Recommendations For The Authors):

      (1) I am pleased with most of the answers provided by the authors to the first review. In my humble opinion, the new manuscript has greatly improved. However, I think some answers to the reviewers are worth including in the main text or supporting information for the benefit of general readers. In particular, the requested statistics (e.g. p-values for Cα-RMSD values across the modeling approaches, p-values and error bars in Fig 5a and 5b, etc.) should be introduced in the manuscript.

      We sincerely appreciate your advice. We have added these statistics to Figs. 4 and 5 of the manuscript.

      Author response image 2.

      Author response image 3.

      (2) Similarly, authors state in the answers that "we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops". That sounds a great improvement to H3-OPT! However, I couldn't find any reference of that new module in the reviewed version of the manuscript, nor in the available GitHub code. That is the reason for me to hold the weakness "The proposed method lacks of a confidence score".

      We apologize for this oversight and thank you for the reminder. We have updated our GitHub code and added the following sentences to the Methods section to clarify how we trained the confidence score module:

      “Confidence score prediction module

      We apply an MSE loss for confidence prediction, label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rates of H3-OPT were set to 0.25. The learning rate and weight decay of Adam optimizer are set to 1 × 10−5 and 1 × 10−4, respectively.”

      (3) I acknowledge all the efforts made to solve new mutant/designed nanobody structures. Judging from the solved structures, mutants Y95F and Q118N seem critical to either crystallographic or dimerization contacts stabilizing the CDR-H3 loop, hence preventing the formation of crystals. Clearly, solving a molecular structure is a challenge, hence including the following comment in the manuscript is relevant for readers to correctly assess the magnitude of the validation: "The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, comparing with the best template. The CDR-H3 lengths of these nanobodies are both 17. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM."

      We appreciate your kind recommendations and have revised “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT, only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively.” to “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT (LengthCDR-H3 = 17), only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively (The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM).” In addition, we have added the following sentence to the legend of Figure 4 so that readers can appropriately evaluate the significance and reliability of our validations: “The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, comparing with the best template.”

      (4) As pointed out in the first review, I think the work https://doi.org/10.1021/acs.jctc.1c00341 is worth acknowledging in section "2.2 Molecular dynamics (MD) simulations could not provide accurate CDR-H3 loop conformations" of the supplementary material, as it constitutes a clear reference (and probably one of the few) to the MD simulations that the authors intend to perform. Similarly, the work https://doi.org/10.3390/molecules28103991 introduces an earlier benchmark of AI algorithms for predicting antibody and nanobody structures that readers may find it interesting to compare with the present work. Indeed, this latter reference is used by the authors to answer a reviewer comment.

      Thank you for your valuable comments. We have added these references at the appropriate places in the manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript uses C. elegans as a model to interrogate the effects of autism-associated variants of previously unknown function in the RNA-binding protein RBM-26/RBM27.

      Despite its potential impact, there are several concerns related to the technical rigor and specificity of the observed effects.

      Major concerns: 1. The effects on PLM are interesting, but why was this neuron selected for study? Was this a lucky guess or are other axons also affected? It is important to clarify whether the effects of RBM-26 are specific to this neuron or act pleiotropically across many or all neurons. According to CeNGEN, rbm-26 is strongly expressed in the well-characterized neurons ASE, PVD, and HSN. Are there morphological defects in these neurons, or others? As a note, there are also functional assays for these neurons (salt sensing, touch response, and egg laying, respectively).

      We have added new data to the supplemental materials showing that loss of rbm-26 function also causes the beading phenotype in the axons and dendrites of the PVD neuron (Figure S4 and lines 196-199). We have focused on the PLM neuron because our preliminary studies indicated that it had a higher penetrance of axon defects relative to the PVD neuron. Moreover, we observed expression of endogenously tagged RBM-26 in the PLM neuron (Figure 3A-C and lines 210-215).

      Similarly, the choice of the MALSU homolog seemed like a shot in the dark. It is ranked 46th (out of 63 genes) for fold-enrichment following RBM-26 pull-down, and 9th for p-value. Were any of the mRNAs with greater fold-enrichment or smaller p-values examined further? It is important to determine whether many or all of these interacting genes are overexpressed in the absence of RBM-26 and whether they are also required for the phenotypic effects of RBM-26 mutants, or if the MALSU homolog is special.

      We have clarified our reasoning for selecting the MALS-1 ortholog of MALSU1 for further study (see lines 283-284 and Table S2). Amongst binding partners with human orthologs, MALS-1 was by far the top-ranked candidate. The adjusted p-value for MALS-1 was 0.0008. The next smallest adjusted p-value was roughly 35-fold larger (0.028 for dpy-4). Moreover, the log2 fold enrichment for MALS-1 was 1.98, about the same as the largest (ACADS with 2.13). Nonetheless, we agree that some of the other interactors may also be of interest and have thus included them in supplemental Table S2. Although these other potential binding partners are outside the scope of this study, we expect that future studies by ourselves or others may focus on their roles.

      In addition to the specificity controls mentioned above, positive and negative controls are needed throughout the results. While each of these may be relatively minor by itself, as a group they raise questions about the technical rigor of the study. Briefly these include: Fig 1C. Missing loading controls and negative control (rbm-26 null allele). Additional exposures should be included to show whether RBM-26(P80L) protein or the lower band for RBM-26(L13V) are present at all, relative to the null allele.

      We have added no-stain loading controls to figure 1C. We have also switched to using ECL detection, which is much more sensitive and reveals faint bands for RBM-26(P80L) and additional faint bands for RBM-26(L13V). In addition, we have included a longer exposure for the blot (Figure S1). We are unable to test the null, as we can only produce a limited number of small maternally rescued progeny, thereby precluding western blot analysis.

      Fig 2. Controls to distinguish overextension of PLM axon from posterior mispositioning of ALM cell body are needed. Quantification of PLM axon lengths in microns (or normalized to body size) with standard deviation, not error of proportion, should be shown. Measurement of “beading phenotype” should be more rigorous, see for example the approach in Rawson et al. Curr. Biol. 2017 https://doi.org/10.1016/j.cub.2014.02.025 . The developmental stage examined, and the reason for choosing that stage, should be described for this and all figures.

      We have added new data showing PLM axon length relative to body length for each of the RBM-26 mutants (Figure S2 and lines 183-185). These results indicate that the mutant PLM axons have a larger ratio of axon length to body length, suggesting that the PLM/ALM overlap phenotype results from PLM axon overextension. For most experiments, we retain penetrance as the measure, as this has been standard practice in the field and allows for a much larger sample size (see examples listed below). We have also added examples of how the beading phenotype was measured (Figure S3). Moreover, we have now analyzed this phenotype and others at multiple developmental stages (Figures 2D-H and Table S1). In general, we have conducted experiments at the L3 stage because the rbm-26(null) mutants do not survive past this stage; however, for many of our experiments we have also included additional stages. We have added this explanation to the Methods section on phenotype analysis and at various locations throughout the text. We have also labeled all graphs to clearly indicate the developmental stages.

      https://doi.org/10.1038/s41467-019-12804-3 Article by laboratory of Brock Grill

      https://doi.org/10.1371/journal.pgen.1002513 Article by laboratory of Ian Chin-Sang

      https://doi.org/10.1073/pnas.1410263111 Article by laboratory of Chun-Liang Pan

      https://doi.org/10.1016/j.neuron.2007.07.009 Article by laboratory of Yishi Jin

      https://doi.org/10.1523/JNEUROSCI.5536-07.2008 Article by laboratory of William Wadsworth
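      For illustration only, the per-animal normalization described above (PLM axon length divided by body length, summarized as a mean with a standard deviation, as Reviewer #1 requested) could be computed along the following lines. The function names and example measurements are hypothetical and are not taken from the manuscript or its analysis.

```python
from statistics import mean, stdev


def relative_axon_length(axon_length_um: float, body_length_um: float) -> float:
    """Return PLM axon length as a fraction of body length (dimensionless)."""
    if body_length_um <= 0:
        raise ValueError("body length must be positive")
    return axon_length_um / body_length_um


def summarize_ratios(measurements: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-animal ratios.

    Each item is a hypothetical (axon_length_um, body_length_um) pair for one
    animal; at least two animals are needed to compute a standard deviation.
    """
    ratios = [relative_axon_length(axon, body) for axon, body in measurements]
    return mean(ratios), stdev(ratios)


# Hypothetical example: three animals with the same body length.
avg, sd = summarize_ratios([(400.0, 1000.0), (500.0, 1000.0), (600.0, 1000.0)])
print(f"mean ratio = {avg:.2f}, SD = {sd:.2f}")  # mean ratio = 0.50, SD = 0.10
```

      Normalizing to body length lets animals of slightly different sizes be compared directly, and the mean with SD summarizes a continuous measurement per animal, whereas penetrance summarizes a yes/no call per animal.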

      Fig 3. Controls without auxin and with neuronal TIR1 expression alone should be included. Controls demonstrating successful RBM-26 depletion, in larvae as well as in embryos at the time of PLM extension, should be included (weak embryonic depletion might explain why the overextension phenotype is only 14% instead of 40% as in the null). According to CeNGEN, rbm-26 expression in PLM is barely detected, thus depletion with a PLM-specific TIR1 should also be tested. To confirm the authors' identification of the cell marked "N" as the PLM cell body, co-expression of rbm-26 and a PLM-specific marker should be added. Rescue of the rbm-26 mutants with neuronal (and PLM-only) expression should be included to test sufficiency in PLM, and as a further control for potential artifacts of the AID system.

      We have added new data showing that an endogenously tagged RBM-26::Scarlet protein is expressed in the PLM neuron (Figure 3A-C). Moreover, we have added rescue experiments showing that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype and the PLM/ALM overlap phenotype (Figure 3F-G). We have also added controls without auxin (Figure S7) and without the rbm-26::scarlet::aid gene (Figure S8), as well as a new figure showing auxin-mediated depletion of RBM-26::Scarlet::AID in the PLM neuron (Figure S10). We examined auxin-mediated depletion at the L3 stage for consistency both with our auxin-mediated phenotypic experiments and with other experiments that included the rbm-26(null) mutants, which do not survive past this stage.

      In general, auxin-mediated knockdown tends to be hypomorphic in neurons, likely because the neuronal TIR1 driver is expressed at much lower levels than the other drivers. In addition, the lower penetrance of the PLM/ALM overlap phenotype after auxin-mediated knockdown could reflect the fact that this phenotype resolves by the L4 stage in hypomorphic mutants. For example, in P80L mutants at the L3 stage the penetrance of the PLM/ALM overlap phenotype is only about 20% (compared with about 15% after auxin-mediated knockdown).

      Fig 4. More rigorous quantification of the distribution of mitochondria along the axon should be included, not only total number, and it should be clarified what region of the axon the images are taken from. Including the AID-depletion strain with and without auxin would further add to the sense of rigor. For the mitoTimer experiments, why is RBM-26(L13V) not included and why do wild-type values differ ~5-fold between experiments (despite error bars being almost non-existent)? A more rigorous approach to standardizing imaging conditions may be needed. Positive controls using compounds that affect oxidation should be included. Measurements of individual mitochondria with standard deviations should be shown, rather than aggregate averages with error of proportion.

      We have changed our methodology for measuring mitochondria, so that we now report the density of mitochondria in the axon (number per 100 µm; Figure 4E-F). We agree that this method is much better than counting the total number of mitochondria per axon, as it corrects for differences in body length and axon length. We now also include data for the whole axon (Figure 4E), proximal axon (Figure 4G), and distal axon (Figure 4H). These data suggest that the mitochondrial density defects occur in the proximal axon but not in the distal axon. Using the null allele, we have also examined the timing of mitochondrial defects in the axon and report that the defects begin in the L1 stage and continue throughout larval development (Figure 4F). Individual data points have been added for all graphs in Figure 4.
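      The density measure described above is simple arithmetic; a minimal sketch follows, assuming each axon (or proximal or distal segment) has a mitochondrial count and a traced length in micrometres. The function name and the numbers are hypothetical, and how the proximal and distal segments were delimited is defined in the manuscript, not here.

```python
def mito_density_per_100um(n_mitochondria: int, segment_length_um: float) -> float:
    """Number of mitochondria per 100 µm of axon (or axon segment)."""
    if segment_length_um <= 0:
        raise ValueError("segment length must be positive")
    return 100.0 * n_mitochondria / segment_length_um


# Hypothetical counts: the raw totals differ (12 vs 15), but the longer axon has
# proportionally more mitochondria, so the densities are identical.
short_axon = mito_density_per_100um(12, 400.0)
long_axon = mito_density_per_100um(15, 500.0)
print(short_axon, long_axon)  # 3.0 3.0
```

      This is why a total count per axon can differ between animals that have the same underlying density, whereas the per-100 µm value does not depend on axon length.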

      For the mitoTimer experiments (Figure 5), we have added data for L13V and have added the individual datapoints to the graph. In the prior version, the values did not differ 5-fold between experiments with the same stage, rather the different graphs were from different stages (as noted in the figure legends/main text) and the L4 stage has much more oxidation than the L2 stage. To clear this up, we have added labels to the graphs to indicate the stages for each experiment. We have also added new data, so that we now show results for the L2, L3, and L4 stages for all three rbm-26 mutants (see Figure 5C-E). We didn’t test the L1 stage because the signal was not sufficient for accurate quantitation.

      Fig 5. Additional positive and negative controls should be added, including additional rbm-26 alleles, the AID-tagged strain with and without auxin, and a rescued mutant.

      The old Figure 5 has become Figure 6 in the new version. We have added the rbm-26(L13V) allele to each experiment (Figure 6B-D). We have also added the loading controls for the western blot along with quantification for 3 biological replicates of the western blot analysis (Figure 6D). We agree that these additions significantly strengthen the data because they show that two independent alleles of rbm-26 cause a very substantial increase in the expression of mals-1 at both the mRNA and protein levels. We did not do these experiments with the rescuing transgene or with the AID-tagged strain because these experiments are done on whole worm lysates, whereas the AID-tagged and rescuing transgenes are neuron-specific.

      Fig 6. Controls showing whether the Scarlet-tagged protein is functional are needed, to rule out dominant negative or toxicity-related effects.

      This is Figure 7 in the new version. For this experiment, we are showing that overexpression of MALS-1 does cause defects. The idea is that excessive amounts of MALS-1 causes deleterious effects to the mitochondria. In fact, these defects could be considered as dominant negative or toxic. We considered the possibility of crossing the Pmec-7::mals-1::scarlet transgene with rbm-26; mals-1 double mutants. However, this does not seem workable, because the single copy Pmec-7::mals-1::scarlet transgene produces the phenotypes at penetrances that are similar to what we observe in rbm-26; mals-1 double mutants. We concede that the results of the overexpression experiments in Figure 7 are limited when considered in isolation. However, we think that they are meaningful when considered in combination with the results on the mals-1;rbm-26 double mutants in Figure 8.

      Fig 8. Controls for other mitochondrial components need to be included. It is important to determine if the decrease in ribosomes is specific or reflects a general decrease in mitochondria. If there are fewer mitochondria as suggested in Fig. 4, then of course mitochondrial ribosomal protein levels are also reduced. Additional rbm-26 alleles should be included here as well. Is this effect dependent on the MALSU homolog?

      This is Figure 8D-E in the new version. We have added new data showing that the decrease in MRPL-58 expression that is caused by the rbm-26(P80L) mutation is dependent on MALS-1. We concede that these experiments cannot be used to determine anything about the mitoribosomes per se, but rather serve as an alternative way of testing the effect of rbm-26 on mitochondria. We have revised the text accordingly (lines 355-357). Given these limitations we have elected not to try additional mitochondrial markers and have also not included additional rbm-26 alleles for this experiment.

      Finally the authors should address concerns about image manipulation, which amplify the concerns about technical rigor outlined above. The image in Fig. 2A appears to have a black box placed over the lower-right portion of the field to hide some features. Black boxes also appear to have been placed over the tops of images in Fig. 4B and 4D and at the left of Fig. 6A, 6B, and 6C. While these manipulations probably do not affect the conclusions, they further undermine confidence in data integrity and experimental rigor.

      We have corrected all of these image processing errors. The box in 2A was for the purpose of squaring off a corner that was clipped during image rotation. The boxes in Figures 4 and 6 (of the prior version) were added to give space for labels (without obscuring image features). We have now used alternative methods to accomplish the same goals. For example, in Figures 4-D we have placed the labels outside of the images.

      Minor points.

      1. C. elegans nomenclature conventions should be followed:

      • C. elegans gene names have three or four letters, thus the MALSU homolog cannot be named "malsu-1". Please have new gene names approved by WormBase BEFORE submitting for publication http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi

      We have changed malsu-1 to mals-1. In addition, both mals-1 and mrpl-58 have now been approved by WormBase and will be listed on the website upon its next update.

      • If two sequential CRISPR edits are made on the same gene then they should be listed as a compound allele, such as rbm-26(cue22cue25)

      We have updated our gene names to reflect this convention.

      • Genes on the same chromosome should not be separated with a semicolon, for example rbm-26(cue40) K12H4.2(syb6330)

      We have updated our gene names to reflect this convention.

      Describing the defects as "neurodevelopmental" is misleading in the case of axon beading or degeneration. Similarly, there is no evidence for an "axon targeting" defect as stated in the abstract.

      We have revised the text so that, instead of referring to degeneration phenotypes as neurodevelopmental, we now refer to axon degeneration phenotypes that occur during development. For example, in the abstract we now say, “These observations reveal a mechanism that regulates expression of a mitoribosomal assembly factor to protect against axon degeneration during neurodevelopment.”

      Regarding targeting defects, this was meant to refer to the misplacement of the PLM axon tip (which contains electrical synapses). However, our subsequent analysis has revealed that these defects are transient in P80L and L13V mutants, as they resolve by the L4 stage. The rbm-26 null axon development defects do not resolve, though these mutants die prior to the L4 stage. Given these findings, we have decided not to use the term targeting defects. Instead, we now refer to this as an axon tiling defect or PLM/ALM overlap phenotype.

      In Fig. 5A, the symbol that appears to correspond to F59C6.15 (lowest p-value) is a different size than the others and is colored as ncRNA, whereas WormBase annotates this gene as snoRNA.

      This error has been corrected.

      In the Introduction, the last sentences of the first two paragraphs should be varied ("However, little is known about the [...] mechanisms that protect [...] during neurodevelopment.")

      This has been done.

      Why is RBM-26 protein running as a doublet at both sizes?

      We have improved our western blotting methodology by using 12% gels, allowing for better resolution. We have also switched from colorimetric detection to ECL detection, allowing for greater sensitivity. In our new blots, we identify six different RBM-26 protein bands. We do not know the reason for these bands, but speculate that they are the result of post-translational processing (lines 148-150).

      When showing the RBM-26 expression pattern (Fig. 3) please include a lower-magnification image of the entire animal.

      This has been done (Figure S6).

      It is confusing to refer to the RNA IP experiments as an "unbiased screen", which in C. elegans typically refers to a genetic screen.

      We now refer to this as a “biochemical screen”.

      The relationship between axon overextension, beading, and mitochondrial localization is not clear. What causal connection between these is being proposed? The causal connections between these phenotypes, if any, should be clarified experimentally. For example, if the axon extension defects develop before mitochondrial localization defects, then it is unlikely that mitochondrial defects cause axon overextension.

      We have added new data showing that the reduction in mitochondrial density within the axon begins during the L1 stage and increases throughout larval development (Figure 4F). We have also added additional data showing that the increase in mitochondrial oxidation is weak in the L2 stage and surges in the L3 stage (Figure 5C-E), coincident with the beginning of the axon degeneration phenotypes. We propose (lines 383-391) that a low level of mitochondrial defects is present in L1 larvae, giving rise to the axon tiling defects. In the L3 stage there is a surge in excessive mitochondrial oxidation, giving rise to the axon degeneration phenotypes. We have added a new section to the discussion that addresses the relationship between defects in axon development and axon degeneration (lines 375-405).

      Please explain how to interpret the difference in axon beading in the two deletion alleles of the MALSU homolog (axon beading defects in tm12122 but not in syb6330). Is syb6330 not a null allele? Or are the defects in tm12122 due to other mutations in this strain background?

      One likely reason for this difference is that tm12122 is predicted to cause a partial deletion of the mals-1 coding sequence, whereas the syb6330 is a full deletion. Thus, the tm12122 could be acting as a dominant negative. In fact, prior work on the MALSU1 ortholog has indicated that this protein is subject to interference by a dominant negative construct (see Rorbach et al, Nucleic Acids Res 2012). Nonetheless, we cannot rule out the possibility of a linked second mutation in tm12122. However, since we have found similar phenotypes and genetic interactions with both alleles, we can conclude that these phenotypes and interactions are due to loss of MALS-1, rather than a second mutation.

      Are mitochondria reduced in number or mislocalized? If they are reduced in number, is this due to altered balance of fission/fusion?

      We have adjusted our methods for quantifying mitochondria and have also analyzed the proximal vs distal axon (Figure 4). We find that the density of mitochondria is decreased in the proximal axon, but not in the distal axon. We speculate that this might reflect a higher demand on mitochondria in the proximal axon, due to a higher amount of trafficking activity in the proximal axon (lines 255-257). We propose that the loss of RBM-26 causes dysfunction in mitochondria. Since fission and fusion are mechanisms that can help to repair damaged mitochondria, it is likely that they would be involved in the phenotypes that we observe.

      In Fig. 3A-D, please keep the labels in the same position in all panels and do not alter brightness settings between single-color and merged panels.

      These images have been moved to the supplemental data section (Figure S5). We have adjusted the labels as suggested. We have not changed the brightness settings, as they were already the same in all panels. However, the blue signal in the merged panel does obscure some of the red signal, giving an appearance of an alteration in color balance.

      The claim that rbm-26 acts cell-autonomously requires PLM-specific depletion and rescue experiments.

      We have added new data indicating that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype (Figure 3F-G).

      **Referees cross-commenting** I appreciate the use of the consultation session to resolve differences between reviewers, but in this case I fully agree with the content and tone of all the comments from the other reviewer -- I think our remarks are very well aligned!

      Reviewer #1 (Significance (Required)):

      The study engineers autism-associated variants in conserved residues of RBM27 into the C. elegans homolog RBM-26 and identifies neuronal phenotypes potentially relevant to autism and a potential molecular mechanism involving regulation of mitochondrial ribosome assembly.

      The key claims of the study are (1) that autism-associated variants in RBM-26 decrease its protein expression; (2) that impaired RBM-26 function leads to a variety of defects in development and maintenance of a single neuron called PLM, including altered axonal localization of mitochondria; (3) that RBM-26 normally binds the mRNA for the C. elegans homolog of MALSU, a mitochondrial ribosomal assembly factor; (4) that loss of RBM-26 leads to overexpression of the MALSU homolog; and (5) that MALSU is required for some of the deleterious effects on the PLM neuron seen in RBM-26 mutants.

      This study will be of interest to the autism research community because it bolsters the idea that variants in RBM27 are likely to disrupt gene function and to affect neuronal health. It will also be of interest to the broader cell biology community because it suggests an interesting potential nucleus-to-mitochondria signaling mechanism, in which a nuclear RNA-binding protein might regulate assembly of mitochondrial ribosomes.

      My field of expertise is developmental biology in C. elegans.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, the authors studied the role of an ASD-associated gene, rbm-26, in neuronal morphology using the touch receptor neuron PLM in C. elegans, and found that loss of rbm-26 function causes overextension and the formation of bulb-like structures in the axon. Using UV-crosslinking RNA immunoprecipitation and RNA-Seq, they identify malsu-1 as a target of rbm-26. Genetic analyses suggest malsu-1 likely functions downstream of rbm-26 in controlling PLM morphology.

      Major comments:

      • The authors describe RBM27 as associated with ASD and ID, while they only cite a SFARI paper that describes a weak association of RBM27 with ASD. Appropriate references showing a link between RBM27 and ID should be provided.

      The link with ID was an error. We had meant to say “ASD or other neurodevelopmental disorders.” This has been corrected.

      • The SFARI database lists only three mutations (P79L, R190Q, G348D) as ASD-associated. Where do the other mutations, L13V and R455H (particularly L13V, which the authors used to generate the C. elegans mutant), come from? Are they associated with intellectual disabilities?

      The others came from denovo-db. We have added a reference for this database and have also added the primary source references for each of the five de novo variants (see line 121).

      • The authors should be very careful when describing 'gene X causes Y disease'. Many (if not all) of the examples described in this manuscript are disease-associated genes that have not been validated as causal genes.

      We have revised accordingly. For example, on lines 433-435 we now say, “For example, mutations in the EXOSC3, EXOSC8 and EXOSC9 are thought to cause syndromes that include defects in brain development such as hypoplasia of the cerebellum and the corpus callosum”. We have decided to use the phrase “thought to cause” because three of the five referenced articles on these genes use titles that indicate causation.

      • The authors refer to the PLM axon beading and overextension phenotypes as 'axon degeneration and targeting defects'. The authors must provide additional evidence of axon degeneration (see below). Also, the term 'targeting defects' is misleading, as the authors did not examine whether overextension of the PLM axon causes targeting defects. At least they should examine some synaptic markers.

      To provide more evidence of degeneration, we have analyzed several additional phenotypes at multiple developmental stages (Figure 2 and Table S1). Regarding targeting defects, this was meant to refer to the misplacement of the PLM axon tip (which contains electrical synapses). However, our subsequent analysis has revealed that these defects are transient in P80L and L13V mutants, as they resolve by the L4 stage. The rbm-26 null axon development defects do not resolve, though these mutants die prior to the L4 stage. Given these findings, we have decided not to use the term targeting defects. Instead, we now refer to this as an axon tiling defect or PLM/ALM overlap phenotype.

      • Neuronal phenotypes (axon overextension and beading) should be examined at different developmental timepoints (larval, young adult, and aged animals) to test if these phenotypes are indeed degenerative instead of developmental defects.

      We have included new data to observe all of these phenotypes at multiple developmental time points (Figure 2 and Table S1).

      • The authors use the blebbing (beading) phenotype in the axon as the sole evidence of neurodegenerative properties of the PLM neuron. A more thorough analysis of this phenotype as done by others (Pan PNAS 2006) must be provided to support the authors' claim that this phenotype represents neurodegeneration.

      We have included new data on multiple degenerative phenotypes in axons, including blebbing, beading, waviness and breaks (Table S1).

      • The number of beads per axon should be quantified to better represent the severity of the rbm-26 mutant. Individual samples should be plotted in the quantification instead of showing the percentage of animals.

      We have added data on the density of beads in rbm-26(null), rbm-26(P80L), and rbm-26(L13V) mutants (Figure S3). For most experiments we have decided to use penetrance to measure axon degeneration because this is a standard in the field and allows for a larger sample size. For examples please see:

      https://doi.org/10.1523/JNEUROSCI.1494-11.2012 (Toth et al, 2012)

      https://doi.org/10.1016/j.cub.2014.02.025 (Rawson et al, 2014)

      https://doi.org/10.1073/pnas.1011711108 (Pan et al, 2012)

      https://doi.org/10.7554/eLife.80856 (Czech et al, 2023)

      https://doi.org/10.1016/j.celrep.2016.01.050 (Nichols et al, 2016)

      • Based on the single gel image in Fig. 1C with no loading control, the P80L mutant appears to have no protein expression. How is the P80L mutant viable while the null mutant is lethal? The authors should quantify the protein expression levels from multiple blots with proper loading controls. If the P80L mutation is introduced into the RBM-26::mScarlet strain, can it cause depletion of the signal in vivo?

      We have added new data showing that the RBM-26::Scarlet signal is diminished by the P80L mutation in vivo (Figure 1E-F). We have also added quantification from 3 biological replicate blots (Figure 1D). Finally, we have improved the sensitivity of our blots by using ECL detection and also show various exposures to highlight the fainter bands (Figures 1C and S1). Therefore, we are now able to detect low-level expression of RBM-26(P80L) mutant protein. It is likely that the low level of RBM-26(P80L) and RBM-26(L13V) seen on western blots is sufficient to prevent the lethal phenotype.

      • 'Moreover, loss of either the SPTBN1 or ADD1 genes causes a neurodevelopmental syndrome that includes autism and ADHD.' References are missing, and, as described above, the authors should be extra careful when indicating causality. Very few genes are known to cause ASD and ADHD.

      We have added the citations for this work (line 81). We also note that the titles for both of the cited articles indicate causation. To be on the safe side, we have revised this line to say, “Moreover, loss of either the SPTBN1 or ADD1 genes are thought to cause a neurodevelopmental syndrome that includes autism and ADHD”.

      • Fig. 3E, F: the authors should use strains that express TIR1 specifically in the touch receptor neurons to argue for a cell-autonomous function of RBM-26. Alternatively, the authors may conduct PLM neuron-specific rescue experiments to test sufficiency.

      We have added new data indicating that a Pmec-7::rbm-26::scarlet transgene can rescue the beading phenotype and the PLM/ALM overlap phenotype (see Figure 3F-G).

      • 'Loss of RBM-26 causes mitochondria dysfunction in axons.' The authors did not examine mitochondria function in axons. They only examined the number of mitochondria, and ROS production in the soma. The authors should provide additional evidence to support the idea that elevated ROS production in the soma is due to mitochondrial dysfunction in axons. Also, the authors should use both P80L and L13V for this experiment, and indicate individual datapoints as dots. Here, they quantified at the L4 stage, which the authors should justify.

      We have added the L13V data to this experiment and now show the individual data points. In addition, we have now conducted this analysis at the L2, L3 and L4 stages (Figure 5C-E). We have also revised the text to indicate that loss of rbm-26 function causes mitochondrial dysfunction in the cell body, which could potentially cause a reduction of mitochondria in the axon (see lines 100-101 and 268-270). We speculate that mitochondria in the axon are also dysfunctional. However, the mitoTimer signal is not bright enough in axons to allow for quantification.

      • Figure 5B and C: the authors should also use L13V to quantify malsu-1 mRNA and protein level, and include quantifications in panel C (from multiple blots).

      This is Figure 6 in the new version. We have added new data for expression of mals-1 mRNA and protein in rbm-26(L13V) mutants (Figure 6B-D). We have also included quantifications from 3 biological replicates (Figure 6D).

      • In the rbm-26 mutant, the number of mitochondria is reduced, while the amount of MALSU-1 protein is increased. If MALSU-1 is specifically localized at mitochondria in wild type, where does the excessive MALSU-1 go in the rbm-26 mutants? Quantification of MALSU-1 signal intensity should be provided.

      Our Pmec-7::mals-1::scarlet transgene uses the tbb-2 3’UTR and causes an overexpression phenotype. To address the question posed by the reviewer, we would need to express MALS-1 at endogenous levels. Given that endogenous levels of MALS-1 are very low, it is unlikely that we would be able to visualize its expression. Nonetheless, as a way to address this question, we have attempted to create a single-copy Pmec-7::mals-1::scarlet transgene that utilizes the mals-1 endogenous 3’UTR. We have tried multiple approaches for generating this construct, but all have failed, likely due to sequence complexities within the mals-1 3’UTR. While we cannot say where the extra MALS-1 protein goes, we think that it is likely overloaded into the remaining mitochondria and could also be present in the cytosol.

      • Figure 7C: malsu-1 knockout mutants exhibit a PLM overextension phenotype, which is not consistent with their model. The authors should discuss this in detail.

      We have added a paragraph to the discussion explaining that mitochondrial function could be disrupted by either MALS-1 overexpression or MALS-1 loss of function (lines 471-480).

      • 'To validate these findings, we also repeated these experiments with an independent allele of malsu-1, malsu-1(tm12122) and found similar results (Fig. 7A-C).' The malsu-1(tm12122) allele exhibits a beading phenotype and a more severe overextension phenotype, which the authors must describe and discuss more carefully.

      One likely reason for this difference is that tm12122 is predicted to cause a partial deletion of the mals-1 coding sequence, whereas syb6330 is a full deletion. Thus, tm12122 could be acting as a dominant negative. In fact, prior work on the MALSU1 ortholog has indicated that this protein is subject to interference by a dominant negative construct (see Rorbach et al, Nucleic Acids Res 2012). Nonetheless, we cannot rule out the possibility of a linked second mutation in tm12122. However, since we have found similar phenotypes and genetic interactions with both alleles (albeit at slightly different penetrances), we can conclude that these phenotypes and interactions are due to loss of MALS-1, rather than a second mutation. We have added these considerations to the results section (lines 342-344).

      • Figure 8: The authors should include data from L13V, malsu-1 and rbm-26; malsu-1 mutants. Quantification from multiple blots should be provided.

      This is Figure 8D in the new version. We have added the malsu-1 single mutants and rbm-26;malsu-1 double mutants to this experiment. We have also added quantification from multiple biological replicate blots. As pointed out by the other reviewer, we think that this experiment does not give specific information about mitoribosomes, but is an alternative approach to looking at the reduction in mitochondria. Given this limitation, and considering that we have added L13V data to the mitochondria experiment in Figure 8B, we have elected not to add additional data on L13V to the western blot experiment in Figure 8D.

      Minor comments:

      • 'Consistent with a role for mitochondria in neurodevelopmental disorders, some of these disorders include a neurodegenerative phenotype.' Why would neurodegenerative phenotypes be expected if mitochondria are associated with neurodevelopmental disorders? A better explanation would help.

      We have changed this sentence to, “Some neurodevelopmental syndromes feature neurodegenerative phenotypes that occur during neuronal development.”

      • L13V is generally more severe in the axon overextension phenotype than P80L, while its protein level is higher. The authors should discuss this.

      We have also added a time course for the PLM/ALM overlap phenotype in these mutants (Figure 2D). These new data show that the PLM/ALM overlap is quite similar overall between the P80L and L13V mutants. Both of these mutations cause an increase in PLM/ALM overlap in early larval development that is resolved by the L4 stage. The P80L phenotype resolves slightly sooner for reasons that are unknown. This could reflect differences in expression within the PLM that are not reflected in the whole worm lysate. It could also be due to a slight difference in the genetic background or to other stochastic factors. The key point is that these two independent alleles cause a similar phenotype overall, indicating that this phenotype is the result of loss of RBM-26 function.

      • Fig. 2E, F: 'Beading refers to focal enlargement or bubble-like lesions which were at least twice the diameter of the axon in size.' How are the diameters of axons measured? A more detailed quantification method, and examples of measurement, should be provided.

      We have added example measurements to the supplemental section (Figure S3). Additional detail on the measurements is in the Methods section (lines 517-518).

      • Figure 3: The authors should also include low-magnification images to show where RBM-26 is expressed. The current image does not allow identifying cells. The transgene that labels the nuclei of the hypodermis should be indicated in the manuscript. Specifically, the expression of RBM-26 in the PLM should be shown.

      We have added a low-magnification image (Figure S6) and have also added images of endogenously tagged RBM-26::Scarlet in the PLM (Figure 3A-C). The transgenic label for the hypodermis has been added to the legend of Figure S5.

      • Figure 3: 'Tissue specific degradation of RBM-26::SCARLET::AID was achieved due to cell-type specific TIR-1 driver lines (see methods for details).' This information is not provided in the Methods section.

      This information has been added to the Methods section, “Auxin protein degradation”.

      • Fig. 4E: Values from individual samples should be indicated as dots. Representative images of P80L and L13V should be included. Conduct quantifications at the adult stage, as the authors do in other quantifications, or justify the use of the specific developmental stage (L3) used.

      Figure 4 has become Figures 4 and 5 in the revised version. We have updated the graphs to include dots for individual data points. We have added quantifications of the mitoTimer experiments for the L2, L3 and L4 stages (Figure 5C-E). We note that our other experiments were done at the L1, L2, L3, L4 and adult stages. The mitoTimer signal is not sufficient for quantification at the L1 stage, and at the adult stage the red signal becomes saturated. We have added representative images of mitoTimer in P80L and L13V mutants (Figure S9).

      • The genes malsu-1 and mrpl-58 are not listed on WormBase. If the authors would like to designate names for these genes, they should clearly indicate that along with the sequence name.

      We have changed malsu-1 to mals-1. In addition, both mals-1 and mrpl-58 have now been approved by WormBase and will be listed on the website upon its next update.

      • The authors found that the MRPL-58 amount is reduced in rbm-26 mutants (which requires additional verification). This can be explained by the fact that the number of axonal mitochondria is reduced in rbm-26 mutants. How did the authors confirm that the reduction in MRPL-58 level is due to the disruption of mitoribosome assembly?

      This is Figure 8D-E in the new version. We have added new data showing that the decrease in MRPL-58 expression caused by the rbm-26(P80L) mutation is dependent on MALS-1. We concede that these experiments cannot be used to determine anything about the mitoribosomes per se, but rather serve as an alternative way of testing the effect of rbm-26 on mitochondria. We have revised the text accordingly (lines 355-357).

      • 'MALSU-1 is a mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module [37-39].' I don't think these are known for C. elegans MALSU-1.

      We have revised this to: “MALS-1 is an ortholog of the MALSU1 mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module.”

      • 'Moreover, our results also suggest that disruption of this process can give rise to neurodevelopmental disorders.' I feel this is quite a stretch.

      This has been replaced with, “Therefore, we speculate that human RBM26/27 could function with the RNA exosome complex to protect against neurodevelopmental defects and axon degeneration in infants.” (lines 371-373)

      Referees cross-commenting: Yes, many of our comments overlap, and I fully agree with all comments from the other reviewer too.

      Reviewer #2 (Significance (Required)):

      I found the manuscript interesting, particularly the use of innovative techniques to identify the target of RBM-26. The genetic analyses of rbm-26 and malsu-1 generally support the authors' main conclusion that rbm-26 inhibits malsu-1, and the work will be of potential interest to basic neuroscientists and cell biologists. However, the current manuscript seemed premature, which made my reading experience less pleasant. The phenotypic analyses are superficial compared with similar work and are insufficient to support the authors' claim of 'axon degeneration and targeting defects'. A number of the issues listed above should be addressed before this manuscript is published.

      The reviewer's expertise: neurodevelopment in model organisms.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this manuscript, the authors studied the role of an ASD-associated gene, rbm-26, in neuronal morphology using the touch receptor neuron PLM in C. elegans, and found that loss of rbm-26 function causes overextension and the formation of bulb-like structures in the axon. Using UV-crosslinking RNA immunoprecipitation and RNA-Seq, they identify malsu-1 as a target of rbm-26. Genetic analyses suggest that malsu-1 likely functions downstream of rbm-26 in controlling PLM morphology.

      Major comments:

      • The authors state that RBM27 is associated with ASD and ID, but they cite only a SFARI reference, which describes a weak association of RBM27 with ASD. Appropriate references showing a link between RBM27 and ID should be provided.
      • The SFARI database lists only three mutations (P79L, R190Q, G348D) as ASD-associated. Where do the other mutations, L13V and R455H, come from, particularly L13V, which the authors used to generate the C. elegans mutant? Are they associated with intellectual disability?
      • The authors should be very careful when describing 'gene X causes Y diseases'. Many (if not all) of the examples described in this manuscript are disease-associated genes that have not been validated as causal genes.
      • The authors refer to the PLM axon beading and overextension phenotypes as 'axon degeneration and targeting defects'. The authors must provide additional evidence of axon degeneration (see below). Also, the term 'targeting defects' is misleading, as the authors did not examine whether overextension of the PLM axon causes targeting defects. At a minimum, they should examine some synaptic markers.
      • Neuronal phenotypes (axon overextension and beading) should be examined at different developmental timepoints (larval, young adult, and aged animals) to test if these phenotypes are indeed degenerative instead of developmental defects.
      • The authors use the blebbing (beading) phenotype in the axon as the sole evidence of neurodegenerative properties of the PLM neuron. A more thorough analysis of this phenotype as done by others (Pan PNAS 2006) must be provided to support the authors' claim that this phenotype represents neurodegeneration.
      • The number of beads per axon should be quantified to better represent the severity of rbm-26 mutant. Individual samples should be plotted in the quantification instead of showing the percentage of animals.
      • Based on the single gel image in Fig. 1C with no loading control, the P80L mutant appears to have no protein expression. How is the P80L mutant viable while the null mutant is lethal? The authors should quantify protein expression levels from multiple blots with proper loading controls. If the P80L mutation is introduced into the RBM-26::mScarlet strain, does it cause depletion of the signal in vivo?
      • 'Moreover, loss of either the SPTBN1 or ADD1 genes causes a neurodevelopmental syndrome that includes autism and ADHD' References are missing, and as described above, be extra careful when indicating causality. Very few genes are known to cause ASD and ADHD.
      • Fig. 3E, F: the authors should use strains that express TIR1 specifically in the touch receptor neurons to argue for a cell-autonomous function of RBM-26. Alternatively, the authors may conduct PLM neuron-specific rescue experiments to test sufficiency.
      • 'Loss of RBM-26 causes mitochondria dysfunction in axons.' The authors did not examine mitochondrial function in axons; they only examined the number of mitochondria and ROS production in the soma. The authors should provide additional evidence to support the idea that elevated ROS production in the soma is due to mitochondrial dysfunction in axons. Also, the authors should use both P80L and L13V for this experiment, and indicate individual data points as dots. Here, they quantified at the L4 stage, which the authors should justify.
      • Figure 5B and C: the authors should also use L13V to quantify malsu-1 mRNA and protein level, and include quantifications in panel C (from multiple blots).
      • In the rbm-26 mutant, the number of mitochondria is reduced, while the amount of MALSU-1 protein is increased. If MALSU-1 is specifically localized at mitochondria in wild type, where does the excessive MALSU-1 go in the rbm-26 mutants? Quantification of MALSU-1 signal intensity should be provided.
      • Figure 7C: malsu-1 knockout mutants exhibit PLM overextension phenotype, which is not consistent with their model. The authors should discuss this in detail.
      • 'To validate these findings, we also repeated these experiments with an independent allele of malsu-1, malsu-1(tm12122) and found similar results (Fig. 7A-C).' The malsu-1(tm12122) mutant exhibits a beading phenotype and a more severe overextension phenotype, which the authors must describe and discuss more carefully.
      • Figure 8: The authors should include data from L13V, malsu-1 and rbm-26; malsu-1 mutants. Quantification from multiple blots should be provided.

      Minor comments:

      • 'Consistent with a role for mitochondria in neurodevelopmental disorders, some of these disorders include a neurodegenerative phenotype.' Why is it consistent to have neurodegenerative phenotypes if mitochondria are associated with neurodevelopmental disorders? A better explanation would help.
      • L13V is generally more severe in the axon overextension phenotype than P80L, while its protein level is more abundant. The authors should discuss this.
      • Fig. 2E, F: 'Beading refers to focal enlargement or bubble-like lesions which were at least twice the diameter of the axon in size.' How are the diameters of axons measured? A more detailed quantification method, and examples of measurement, should be provided.
      • Figure 3: The authors should also include low-magnification images to show where RBM-26 is expressed. The current image does not allow identifying cells. The transgene that labels the nuclei of the hypodermis should be indicated in the manuscript. Specifically, the expression of RBM-26 in the PLM should be shown.
      • Figure 3: 'Tissue specific degradation of RBM-26::SCARLET::AID was achieved due to cell-type specific TIR-1 driver lines (see methods for details).' This information is not provided in the Methods section.
      • Fig. 4E: Values from individual samples should be indicated as dots. Representative images of P80L and L13V should be included. Conduct quantifications at the adult stage, as the authors do in other quantifications, or justify the use of the specific developmental stage (L3) used.
      • The genes malsu-1 and mrpl-58 are not listed on WormBase. If the authors would like to designate names for these genes, they should clearly indicate that along with the sequence name.
      • The authors found that the MRPL-58 amount is reduced in rbm-26 mutants (which requires additional verification). This can be explained by the fact that the number of axonal mitochondria is reduced in rbm-26 mutants. How did the authors confirm that the reduction in MRPL-58 level is due to the disruption of mitoribosome assembly?
      • 'MALSU-1 is a mitoribosomal assembly factor that functions as part of the MALSU1:LOR8F8:mtACP anti-association module [37-39].' I don't think these are known for C. elegans MALSU-1.
      • 'Moreover, our results also suggest that disruption of this process can give rise to neurodevelopmental disorders.' I feel this is quite a stretch.

      Referees cross-commenting: Yes, many of our comments overlap, and I fully agree with all comments from the other reviewer too.

      Significance

      I found the manuscript interesting, particularly the use of innovative techniques to identify the target of RBM-26. The genetic analyses of rbm-26 and malsu-1 generally support the authors' main conclusion that rbm-26 inhibits malsu-1, and the work will be of potential interest to basic neuroscientists and cell biologists. However, the current manuscript seemed premature, which made my reading experience less pleasant. The phenotypic analyses are superficial compared with similar work and are insufficient to support the authors' claim of 'axon degeneration and targeting defects'. A number of the issues listed above should be addressed before this manuscript is published.

      The reviewer's expertise: neurodevelopment in model organisms.

    1. Author response:

      Public Reviews:

      We thank the reviewers for their overall positive assessments and constructive feedback.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists.

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, Dawson and Taylor's classic 1973 study of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and, of course, hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues.

      We will modify the title to reflect this comment.  

      The study clearly shows that the ankle and, to a lesser extent, the MTP joint are where the action is. It shows in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid.

      The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid.

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront.

      Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor, and subsequent applications of (and challenges to) their approach, support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the problem of metabolic energetics across speed and placed a bow atop the package.

      Fortunately, I am confident that the impressive collective brain power of this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem further using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points, thank you for this feedback. We will add a paragraph discussing the limitations of our study and ensure the revised manuscript makes it clear which mysteries remain. We intend to address muscle forces, contact time, and energetics in future work when we have implemented all hindlimb muscles within the musculoskeletal model.  

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We will integrate this into the discussion.

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics.

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an Opensim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals.

      Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      We aimed to provide a joint-level explanation, but we will address the limitations of not modelling the energy consumers themselves (the skeletal muscles) in the revised manuscript. We plan to expand upon muscle level energetics in the future with a more detailed MSK model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured)…

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, owing to the fact that both feet contacted the force plate together, but the anteroposterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA.

      It is necessary to assume fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable.

      and did not detectably associate with hopping speed (see results).

      Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in COP at midstance, demonstrates that R would get longer at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we think it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency.

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design.

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals show a wide range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 of the 100 trials.

      It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference such as chasing is dangerous to kangaroos as they are prone to strong adverse reactions to stress.

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.

      We will ensure that this is clearer in the revised manuscript.

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and return via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      • It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects?

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means that we did not chase or startle them into high speeds, in order to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds relative to what kangaroos are capable of (up to 12 m/s), but within the range we did measure (~2-4.5 m/s), there is variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We will ensure this is clear in the revised manuscript.

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see supplementary Fig. 1b), and there were multiple measures for several individuals (see the methods of Kram & Dawson 1998).

      • Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?

      They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported).

      We will add supporting literature on the relationship between metabolic cost and tendon stress (or strain), to elaborate on why the correlation between EMA and stress is important.

      Tendon strain could be increasing with ground reaction force, independent of EMA.

      Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that, all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose. We have illustrated this in Fig 6, however we will make this clearer in the revised discussion.
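      The lever relationship behind this point can be written out as a minimal sketch. It assumes the standard quasi-static definition EMA = r/R, where r is the moment arm of the ankle extensor tendon (the in-lever) and R is the moment arm of the ground reaction force about the ankle (the out-lever); inertial terms and load sharing among muscles are ignored, and the symbols F_t, F_GRF and A_t are introduced here for illustration rather than taken from the manuscript's Methods.

```latex
% Quasi-static moment balance at the ankle (illustrative assumption)
\mathrm{EMA} = \frac{r}{R}, \qquad
F_t\, r = F_{\mathrm{GRF}}\, R
\;\Rightarrow\;
F_t = \frac{F_{\mathrm{GRF}}}{\mathrm{EMA}}, \qquad
\sigma_t = \frac{F_t}{A_t} = \frac{F_{\mathrm{GRF}}}{\mathrm{EMA}\, A_t}
```

      Under these assumptions, tendon stress rises when the ground reaction force increases, when EMA decreases, or both; for a fixed F_GRF and tendon cross-sectional area A_t, the inverse dependence on EMA follows directly from the moment balance.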

      • The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested.

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We will move the range of speeds from the supplementary material to the results or figure captions. We will add information on the number of trials per kangaroo to the methods.

      We did not group the data e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups (this was for display purposes in our figures, which we will make clearer in the methods).

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn't exempt the authors from providing the details of their approach.

      • Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the analysis. In the revised manuscript, we will ensure this is clear.

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of metabolic energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e., that larger kangaroos hopped faster).

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      We agree, and in the revised manuscript will incorporate some of the methodological details within the results.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews. 

      eLife assessment
      This important manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. Compelling evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      eLife assessment, Significance of findings

      This valuable manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. 

      According to the eLife criteria for assessing significance, the “valuable” assessment indicates “findings that have theoretical or practical implications for a subfield.” We have revised the manuscript to emphasize the “theoretical and practical implications beyond a single subfield” which “substantially advance our understanding of major research questions”, with “profound implications” and the potential for “widespread influence,” the eLife criteria for a designation of “landmark” significance.   

      The most immediate implications of our results are for the two major neuroscience subfields of cerebellar research and autism research. However, as recognized by Reviewer 2, the implications are much broader than that: “the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.” We have substantially revised the Discussion section of the manuscript to more explicitly lay out how the central idea of our manuscript (that the capacity for learning at any given moment is powerfully influenced by dynamic, activity- and plasticity-dependent changes in the threshold for synaptic plasticity over short timescales of tens of minutes to hours) has implications for scientific thinking and experiments on plasticity and learning throughout the brain, as well as for clinical practice for a wide array of brain disorders associated with altered plasticity and learning impairment.

      To emphasize the broad conceptual implications of our research, we have reframed our conclusions in terms of metaplasticity rather than saturation of plasticity throughout the revised manuscript. In our previous submission, we had used the “saturation” terminology for continuity with our previous NguyenVu et al 2017 eLife paper, and mentioned the related idea of threshold metaplasticity in a single sentence: “Similarly, the aberrant recruitment of LTD before training may lead, not to its saturation per se, but to some other kind of reduced availability, such as an increased threshold for its induction (Bienenstock, Cooper, and Munro, 1982; Leet, Bear, and Gaier, 2022).” However, we now appreciate that metaplasticity is a more general conceptual framework for our findings, and therefore emphasize this concept in the revised manuscript, while still making the conceptual link with the “saturation” idea presented in NguyenVu et al 2017 (lines 236-238).

      The concept of a sliding threshold for synaptic plasticity (threshold metaplasticity) was proposed four decades ago by Bienenstock, Cooper and Munro (1982) as a mechanism for countering an instability inherent in Hebbian plasticity: correlated pre- and postsynaptic activity strengthens a synapse, which increases correlated activity, which in turn leads to further strengthening. To counter this, BCM proposed a sliding threshold whereby increases in neural activity increase the threshold for LTP and decreases in activity decrease the threshold for LTP, thereby providing a mechanism for stabilizing firing rates and synaptic weights. This BCM sliding threshold model has been highly influential in theoretical and computational neuroscience, but experimental evidence for whether and how such a mechanism functions in vivo has been quite limited.  
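      For readers less familiar with the BCM model, one commonly used later form of the rule can be written as below. This is a standard textbook simplification with a quadratic sliding threshold, not the exact formulation of the 1982 paper, and it has not been fitted to the data reported here. Here w_i is the weight of input i, x_i its presynaptic activity, y the postsynaptic firing rate, eta a learning rate, and the angle brackets denote an average over the recent history of activity (time window tau):

```latex
\frac{dw_i}{dt} = \eta \, x_i \, y \, (y - \theta_M),
\qquad
\theta_M = \langle y^2 \rangle_\tau
```

      When y exceeds theta_M the synapse potentiates, and when 0 < y < theta_M it depresses. Because theta_M tracks the recent average of y^2, a period of elevated activity raises the threshold and makes LTP harder to induce, whereas a period of reduced activity lowers it.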

      Our work extends the previous, limited experimental evidence for a BCM-like sliding threshold in vivo in several significant ways, which we now discuss in the revised manuscript:

      First, we analyze threshold metaplasticity at synapses where the plasticity is not Hebbian and lacks the inherent instability that inspired the BCM model. The synapses onto cerebellar Purkinje cells have been described as “anti-Hebbian” because the associative form of plasticity is synaptic LTD of excitatory inputs. This anti-Hebbian associative plasticity lacks the instability inherent in Hebbian plasticity. Moreover, a BCM-like sliding threshold that increases the threshold for associative LTD with increased firing rates and decreases the threshold for LTD with decreased firing rates would tend to oppose rather than support the stability of firing rates; nevertheless, we find evidence for this in our experimental results. Thus, for cerebellar LTD, the central function of the sliding threshold may not be the stabilization of firing rates, but rather to limit plasticity in order to suppress the overwrite of new memories or to allocate different memories to the synapses of different Purkinje cells. 

      Second, we analyze the influence of a BCM-like sliding threshold for plasticity on behavioral learning. Most previous evidence for the BCM model in vivo has derived from studies of the effects of sensory deprivation (e.g., monocular occlusion) on the functional connectivity of sensory circuits (Kirkwood et al., 1996; Desai et al., 2002; Fong et al., 2021) rather than on learning per se.  

      Third, our results provide evidence for major changes in the threshold for plasticity over short time scales and with more subtle manipulations of neural activity than used in previous studies, with practical implications for clinical application. Previously, metaplasticity has been demonstrated with sensory deprivation over multiple days (Kirkwood et al., 1996; Desai et al., 2002) or with drastic changes in neural activity, such as with TTX in the retina (Fong et al., 2021), TMS (Hamada et al., 2008), or high-frequency electrical stimulation in vitro (Holland and Wagner, 1998; Montgomery and Madison, 2002) or in vivo (Abraham et al., 2001). In contrast, we provide evidence for metaplasticity induced by 30 min of behavioral manipulation (pre-training) and by the relatively subtle pharmacological manipulation of activity with systemic administration of diazepam, a drug approved for humans. Thus, our work contributes not only conceptually to understanding the function of threshold metaplasticity in vivo, but also offers practical observations that could pave the way for novel therapeutic interventions.  

      Fourth, whereas efforts to enhance plasticity and learning have largely focused on increasing the excitability of neurons during learning to help cross the threshold for plasticity (e.g., Albergaria et al., 2018; Yamaguchi et al., 2020; Le Friec et al., 2017), we take the opposite, somewhat counterintuitive approach of inhibiting the excitability of neurons during a period before learning to reset the threshold for plasticity to a state compatible with new learning. To our knowledge, the only other application of such an approach in an animal model of a brain disorder has been inhibiting peripheral (retinal) activity with TTX for treatment of amblyopia (Fong et al., 2021). Our findings from CNS inhibition with a single systemic dose of diazepam greatly expand the potential applications, which could readily be tested in other mouse models of human disorders and other learning deficits. Even in cases where the specific synaptic impairments and circuitry are less fully understood, the impact of suppressing neural activity during a period before training to reduce the threshold for plasticity could be empirically tested.  

      Fifth, our work extends the consideration of a BCM-like sliding threshold for plasticity to the cerebellum, whereas previous work has focused on models and experimental studies of forebrain circuits. There is currently a surge of interest in the contribution of the cerebellum to functions and brain disorders previously ascribed to the forebrain; hence, we anticipate broad interest in this work. 

      Sixth, our results suggest that the history of plasticity rather than the history of firing rates may be the homeostat controlling the threshold for plasticity, at least at the synapses under consideration. Diazepam pre-treatment enhanced learning only in the L7-Fmr1 KO mice, which have a low “baseline” threshold for plasticity as measured in vitro, and not in WT mice. This suggests that it is not neural activity per se that drives the change in threshold for plasticity, but the interaction of activity with the plasticity mechanism.

      In the revised Discussion, we make all of the above points to make the implications clearer to readers.  

      The broad interest in this topic is illustrated by two concrete examples. First, an abstract of this work was honored with selection for oral presentation at the November 2023 Symposium of the Molecular and Cellular Cognition Society, a conceptually wide-ranging organization with thousands of members worldwide. Second, the most closely related published work on activity-dependent metaplasticity in vivo, the Fong et al 2021 eLife paper demonstrating reversal of amblyopia by suppression of activity in the retina by TTX, attracted such broad interest, not just of professional scientists, but also the general public, as to be reported on National Public Radio’s All Things Considered, with an audience of 11.9 million people worldwide.  

      In considering the potential of this work for widespread influence, it is important to note that activity-driven changes in the threshold for plasticity could very well be a general property of most if not all synapses, yet very little is known about their function in vivo, especially during learning. Therefore, the seminal conceptual and practical advances described above have the potential for profound implications throughout neuroscience, psychiatry, neurology and computer science/AI, the eLife criterion for designation as “landmark” in significance. We respectfully request that the reviewers and editor reassess the significance of our findings in light of our much-improved discussion of the broad significance of the work.

      eLife assessment, Strength of support

      Convincing evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      The designation of “Convincing” indicates “methodology in line with current state-of-the-art.” In the revised Discussion, we more clearly highlight that our evidence is “more rigorous than current state-of-the-art” in several respects, thereby meeting the eLife criterion for “Compelling”:

      (1) Comparison of learning deficits and effects of behavioral and pharmacological pretreatment across five closely related oculomotor learning tasks, which all depend on the same region of the cerebellum (the flocculus), but which previous work has found to vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. 

      The “state-of-the-art” behavioral standard in the field of learning is assessment of a single learning task that depends on a given brain area, with the implicit or explicit assumption that the task chosen is representative of “cerebellum-dependent learning” or hippocampus-, amygdala-, basal ganglia-, cortex-dependent learning, etc. Sometimes there is a no-learning behavioral control. 

      Our study exceeds this standard by comparing across many different closely related learning tasks, which all depend on the cerebellar flocculus and other shared vestibular, visual, and oculomotor circuitry, but vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. In the original submission, we reported results for high-frequency VOR-increase learning that were dramatically different from those for three other VOR learning tasks for which there is less evidence for a role of LTD. Reviewer 2 noted, “the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.” In the revised manuscript, we provide new data for a second oculomotor learning task in which LTD has been implicated, OKR adaptation, with very similar results as for high-frequency VOR-increase learning. The remarkable specificity of both the learning deficits and the effects of pre-training manipulations, in two different lines of mice, for the two specific learning tasks in which LTD has been most strongly implicated, and not the other three oculomotor learning tasks, substantially strengthens the evidence for the conclusion that the learning deficits and effects of pre-training are related specifically to the lower threshold for LTD, rather than the result of some other effect of the gene KO or pre-treatment on the cerebellar or oculomotor circuitry (discussed on lines 270-290 of revised manuscript). 

      (2) Replication of findings in more than one line of mice, targeting distinct signaling pathways, with a common impact of enhancing LTD at the cerebellar PF-Purkinje cell synapses.  

      State-of-the-art is to report the effects of one specific molecular signaling pathway on behavior. 

      In the first part of this Research Advance, we replicate the findings of Nguyen-Vu et al 2017 for a completely different line of mice with enhanced LTD at the parallel fiber-to-Purkinje cell synapses. Like the comparison across LTD-dependent and LTD-independent oculomotor learning tasks, the comparison across completely different lines of mice with enhanced LTD strengthens the evidence that the shared behavioral phenotypes are a reflection of the state of LTD rather than other “off-target” effects of each mutation (discussed on lines 291-309 of revised manuscript).

      (3) Reversal of learning impairments with more than one type of treatment. 

      State-of-the-art is to be able to reverse a learning deficit or other functional impairment in an animal model of a brain disorder with a single treatment; indeed, success in this respect is viewed as wildly exciting, as evidenced by the reception by the scientific and lay communities of the Fong et al, 2021 eLife report of reversal of amblyopia by TTX treatment of the retina. 

      In the current work, we demonstrate reversal of learning deficits with two different types of treatment during the period before training, one behavioral and one pharmacological. The current diazepam pretreatment results provide a fundamentally new type of evidence for the hypothesis that the threshold for LTD and LTD-dependent learning varies with the recent history of activity in the circuit, complementing the evidence from behavioral and optogenetic pre-training approaches used previously in Nguyen-Vu et al, 2017 (discussed on lines 151-158 and 246-255 of revised manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Shakhawat et al., investigated how enhancement of plasticity and impairment could result in the same behavioral phenotype. The authors tested the hypothesis that learning impairments result from saturation of plasticity mechanisms and had previously tested this hypothesis using mice lacking two class I major histocompatibility molecules. The current study extends this work by testing the saturation hypothesis in Purkinje-cell (L7) specific Fmr1 knockout mice, which have enhanced parallel fiber-Purkinje cell LTD. The authors found that L7-Fmr1 knockout mice are impaired on an oculomotor learning task and both pre-training, to reverse LTD, and diazepam, to suppress neural activity, eliminated the deficit when compared to controls.

      Strengths:

      This study tests the "saturation hypothesis" to understand plasticity in learning using a well-known behavior task, VOR, and an additional genetic mouse line with a cerebellar cell-specific target, L7-Fmr1 KO. This hypothesis is of interest to the community as it evokes a novel inquisition into LTD that has not been examined previously.

      Utilizing a cell-specific mouse line that has been previously used as a genetic model to study Fragile X syndrome is a unique way to study the role of Purkinje cells and the Fmr1 gene. This increases the understanding in the field in regards to Fragile X syndrome and LTD.

      The VOR task is a classic behavior task that is well understood, therefore using this metric is very reliable for testing new animal models and treatment strategies. The effects of pretraining are clearly robust and this analysis technique could be applied across different behavior data sets.

      The rescue shown using diazepam is very interesting as this is a therapeutic that could be used in clinical populations as it is already approved.

      There was a proper use of controls and all animal information was described. The statistical analysis and figures are clear and well describe the results.

      We thank the reviewer for summarizing the main strengths of our original submission. We have further strengthened the revised submission by 

      (1) more fully discussing the broad conceptual implications, as outlined above; 

      (2) adding additional new data (Fig. 5) showing that another LTD-dependent oculomotor learning task, optokinetic reflex (OKR) adaptation, is impaired in the L7-Fmr1 KO mice and rescued by pre-treatment with diazepam, as we had already shown for high-frequency VOR increase learning; and

      (3) responding to the specific points raised by the reviewers, as detailed below.

      Weaknesses:

      While the proposed hypothesis is tested using genetic animal models and the VOR task, LTD itself is not measured. This study would have benefited from a direct analysis of LTD in the cerebellar cortex in the proposed circuits.

      Our current experiments were motivated by the direct analysis of cerebellar LTD in Fmr1 knock out mice that was already published (Koekkoek et al., 2005). In that previous work, LTD was analyzed in both Purkinje cell selective L7-Fmr1 KO mice (Koekkoek et al., 2005; Fig. 4D), as used in our study, and global Fmr1 knock out mice (Koekkoek et al., 2005; Fig. 4B). Both lines were found to have enhanced LTD, as cited in the Introduction of our manuscript (lines 48-51, 63-64). The goal of our current study was to build on this previous work by analyzing the behavioral correlates of the findings from this previous, direct analysis of LTD. 

      Diazepam was shown to rescue learning in L7-Fmr1 KO mice, but this drug is a benzodiazepine and can cause a physical dependence. While the concentrations used in this study were quite low and animals were dosed acutely, potential side-effects of the drug were not examined, including any possible withdrawal. 

      In humans, diazepam (Valium) is one of the most frequently prescribed drugs in the world, and the side effects and withdrawal symptoms have been extensively studied and documented.¹ Withdrawal symptoms are generally not observed with treatments of less than 2 weeks (Brett and Murnion, 2015). After long-term treatments, tapering of the dosage is recommended to mitigate withdrawal (Brett and Murnion, 2015 and https://americanaddictioncenters.org/valium-treatment/withdrawal-duration). The extensive data on the safety of diazepam in humans lowers the barrier to potential clinical translation of our basic science findings, although we emphasize that our own expertise is scientific, and translation to Fragile X patients or other patient groups will require additional development of the research by clinicians.

      Given the extensive history of research on this drug, we focused on looking for side effects that would reflect an adverse effect of diazepam on the function of the same oculomotor neural circuitry whose ability to support certain oculomotor learning tasks was improved after diazepam. In other words, we assessed whether the pharmacological manipulation was enhancing certain functions of a given circuit at the expense of others. As we note (line 164), “The acute effect of diazepam administration [measured 2 hours after administration] was to impair learning” in both WT and L7-Fmr1 KO mice. One could consider this a side effect. More importantly, we also tested extensively for oculomotor side-effects during the therapeutic period when learning impairments were eliminated in the L7-Fmr1 KOs, 18-24 hours post-administration, and have a full section of the Results describing our findings about this, titled “Specificity of pre-training effects on learning.” As described in the Results and Discussion (lines 184-195, 312-318; Figure 3; Figure 3-figure supplement 1; Figure 4B; Figure 5-figure supplement 1), we found no such adverse side-effects, which is again encouraging with respect to the translational potential of our findings. 

      This drug is not specific to Purkinje cells or cerebellar circuits, so the action of the drug on cerebellar circuitry is not well understood for the study presented.

      The effects of diazepam are indeed not specific to Purkinje cells, but rather are known to be widespread. Diazepam is a positive allosteric modulator of GABAA receptors, which are found throughout the brain, including the cerebellum. When delivered systemically, as we did in our experiments, diazepam will suppress neural activity throughout the brain by facilitating inhibition, as documented by decades of previous research with this and related benzodiazepines, including dozens of studies of the effects of diazepam in the cerebellum. 

      To our knowledge, there is currently no drug that can specifically inhibit Purkinje cells, especially one that can be given systemically to cross the blood-brain barrier. Moreover, if such a drug did exist, we would not predict it to have the same effect as diazepam in reversing the learning deficits of the L7-Fmr1 KO mice, because the latter presumably depends on suppression of activity in the cerebellar granule cells and neurons of the inferior olive, whose axons form the parallel fibers and climbing fibers, and whose correlated activity controls LTD at the parallel fiber-Purkinje cell synapses.  

      We have revised the text to clarify the key point that despite its widespread action on the brain, the effects of diazepam on cerebellum-dependent learning were remarkably specific (lines 184-195, 210-228, 312-318). During the period 18-24 hours after a single dose of diazepam, the learning deficits of L7-Fmr1 KO mice on two LTD-dependent oculomotor learning tasks were completely reversed, with no effects on the same tasks in WT mice, and no effects (“side-effects”) in L7-Fmr1 KO mice or WT mice on other, LTD-independent oculomotor learning tasks that depend on the same region of the cerebellum, and no effects on baseline performance of visually or vestibularly driven eye movements. 

      As described in the revised Discussion (lines 318-323), the non-specific mild suppression of neural activity throughout the brain by diazepam makes it a potentially generalizable approach for inducing BCM-like shifts in the threshold for associative plasticity to facilitate subsequent learning. More specifically, diazepam-mediated reduction of activity throughout the brain has the potential to lower any aberrantly high thresholds for associative plasticity at synapses throughout the brain, and thereby reverse any learning deficits associated with such aberrantly high plasticity thresholds. This approach might even be useful in cases where the neural circuitry supporting a given behavior is not well characterized and the specific synapses responsible for the learning deficit are unknown. On lines 323-327 we compare this generalizable approach with the challenges of designing task- and circuit-specific approaches to reset the threshold for plasticity, particularly in circuits that are less well characterized than the oculomotor circuit.

      It was not mentioned if L7-Fmr1 KO mice have behavior impairments that worsen with age or if Purkinje cells and the cerebellar microcircuit are intact throughout the lifespan. 

      At the adult ages used in our study (8-22 weeks), the oculomotor circuitry, including the Fmr1-deficient Purkinje cells, appears to be functionally intact because all of the oculomotor performance and learning tasks we tested were either normal, or could be restored to normal with brief behavioral and/or pharmacological pre-treatment.  

      Any degeneration of the Fmr1-deficient Purkinje cells or cerebellar microcircuit or additional behavioral impairments at older ages, if they should exist, would not alter our interpretation of the results from 8-22 week old adults regarding history- and activity-dependent changes in the capacity for LTD-dependent learning. Therefore, we leave the question of changes throughout the lifespan to investigators with an interest and expertise in development and/or aging. 

      Only a small handful of the scores of previous studies of the Fmr1 KO mouse model have investigated age-dependent effects; the reviewer may be interested in papers such as Tang et al., 2015 (doi: 10.1073/pnas.1502258112) or Martin et al., 2016 (doi: 10.1093/cercor/bhv031). 

      Connections between Purkinje cells and interneurons could also influence the behavior results found.

      This comment is repeated below in a more general form (Reviewer 1, second to last comment)—please see our response there and lines 270-309 of the revised manuscript for a discussion of how concerns about “off-target” effects are mitigated by the high degree of specificity of the learning deficits and effects of pre-training for the specific learning tasks in which LTD has been previously implicated, and the very similar findings in two different lines of mice with enhanced LTD.

      While males and females were both used for the current study, only 7 of each sex were analyzed, which could be underpowered. While it might be justified to combine sexes for this particular study, it would be worth understanding this model in more detail.

      We performed additional analyses to address the question of whether there might be sex differences that were not detected because of the sample size.

      (1) In a new figure, Fig. 1-figure supplement 1, we break out the results for male and female mice in separate plots, and show that all of the effects of both the KO of Fmr1 from the Purkinje cells and of pretreatment with diazepam that are observed in the full cohort are also statistically significant in just the subset of male mice, and just the subset of female mice (see Fig. 1-figure supplement 1 legend for statistics). In other words, qualitatively, there are no sex differences, and all of the conclusions of our manuscript are statistically valid in both male and female mice. This strengthens the justification for combining sexes for the specific scientific purposes of our study.  

      (2) We performed a power analysis to determine how many mice would be needed to determine whether the very small quantitative differences between male and female mice are significant. The analysis indicates that this would require upwards of 70 mice of each sex for WT mice (Cohen’s d, 0.6162; power 0.95) and upwards of 2500 mice of each sex for L7-Fmr1 KO mice (Cohen’s d, 0.0989; power 0.95). Since the very small quantitative sex differences observed in our cohorts would not alter our scientific conclusions or the possibility for clinical application to patients of both sexes, even if they turned out to be significant, the very large number of animals needed did not seem warranted for the current scientific purposes. Researchers focused on sex differences may find a motivation to pursue this issue further.   
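      The sample sizes quoted above can be roughly reproduced with a standard normal-approximation power calculation. The sketch below assumes a two-sided, two-sample comparison of means at alpha = 0.05; the test and software actually used are not stated in this response, so the function n_per_group is a hypothetical illustration, not necessarily the exact analysis performed. Because the required n scales with 1/d², an effect size roughly six-fold smaller in the KO cohort translates into a sample roughly 40-fold larger.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, power: float = 0.95, alpha: float = 0.05) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation (hypothetical helper, assumed test):
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    This slightly underestimates the exact two-sample t-test sample size.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 1.645 for power = 0.95
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Cohen's d values reported above for the male-vs-female comparison.
    for genotype, d in [("WT", 0.6162), ("L7-Fmr1 KO", 0.0989)]:
        print(f"{genotype}: about {n_per_group(d)} mice per sex")
```

      Under these assumptions the approximation gives about 69 WT mice and about 2,660 L7-Fmr1 KO mice per sex; an exact t-test calculation gives slightly larger values, in line with the "upwards of 70" and "upwards of 2500" figures above.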

      Training was only shown up to 30 minutes and learning did not seem to plateau in most cases. What would happen if training continued beyond the 30 minutes? Would L7-Fmr1 KO mice catch up to WT littermates? Nguyen-Vu

      (1) For VOR learning, we used a 30 min training time because in our past (e.g., Boyden et al., 2003; Kimpo and Raymond, 2007; Nguyen-Vu et al., 2013; Nguyen-Vu et al., 2017) and current results, we find that VOR learning does plateau quite rapidly, with little or no additional adaptive change in the VOR observed between the tests of learning after 30 min vs 20 min of VOR-increase training, in WT or L7-Fmr1 KO mice (Fig. 1A; WT, p=0.917; L7-Fmr1 KO, p=0.861; 20 vs. 30 min; Tukey). In the L7-Fmr1 KO mice, there is no significant high-frequency VOR-increase learning after 30 min training, and the mean VOR gain is even slightly lower on average (not significant) than before training (Fig. 1A, red). Therefore, we have no reason to expect that the L7-Fmr1 KO mice would catch up to WT after additional VOR-increase training.  

      (2) We have added new data on OKR adaptation, induced with 60 min of training (Fig. 5). The L7-Fmr1 KO mice exhibited impaired OKR adaptation, even with 60 min of training (p = 1.27 × 10⁻⁴, Tukey). In our experience, restraint for longer than 60 min produces a behavioral state that is not conducive to learning, as also reported by Katoh and Yamagiwa (2018); therefore, longer training times were not attempted. 

      The pathway discussed as the main focus for VOR in this learning paradigm was connections between parallel fibers (PF) and Purkinje cells, but the possibility of other local or downstream circuitry being involved was not discussed. PF-Purkinje cell circuits were not directly analyzed, which makes this claim difficult to assess.

      In the revised manuscript (lines 299-309), we have expanded our discussion of the possibility that loss of expression of Fmr1 from Purkinje cells in the Purkinje cell-specific L7-Fmr1 KO mice might influence other synapses or intrinsic properties of the Purkinje cells (including synapses from interneurons, as raised in this reviewer’s comment above), in addition to enhancing associative LTD at the parallel fiber-Purkinje cell synapses. 

      It is a very general limitation of all perturbation studies, even cell-type specific perturbation studies as in the current case, that it is never possible to completely rule out “off-target” effects of the manipulation. Because of this, causality cannot be definitively concluded from correlations (e.g., between the effects of a perturbation observed at the cellular and behavioral level), and therefore we make no such claim in our manuscript. Rather, we conclude that our results “provide evidence for,” “support,” “predict,” or “are consistent with” the hypothesis of a history- and activity-dependent change in the threshold for associative LTD at the parallel fiber-Purkinje cells.

      That said, perturbation is still one of the major tools in the experimental toolbox, and there are approaches for mitigating concern about off-target effects. We highlight three aspects of our experimental design that accomplish this (lines 184-228, 256-309). First, we show nearly identical learning impairments and effects of behavioral pretreatment in lines of mice with two completely different molecular manipulations that have the common effect of enhancing PF-Purkinje cell LTD, but are likely to have different off-target cellular effects on the Purkinje cells and their synapses. Second, we show that the learning impairments were highly specific to oculomotor learning tasks in which PF-Purkinje cell LTD was previously implicated, with no such effects on three other oculomotor learning tasks that depend on the same region of the cerebellum and oculomotor circuitry. In the original submission, we provided data for one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript we provide new data for a second LTD-dependent oculomotor learning task, optokinetic reflex adaptation, with nearly identical results (Fig. 5). Third, we show that the effects of diazepam pre-treatment were highly specific to the same two LTD-dependent oculomotor learning tasks and also highly specific to the L7-Fmr1 KO mice with enhanced LTD and not WT mice. These three features of the experimental design are not common in studies of learning, especially in combination. On lines 256-309, we provide an expanded discussion of how together, these three features of the design strengthen the evidence that the learning impairments and effects of diazepam pre-treatment on learning are related to LTD at the PF-Pk synapses, while acknowledging the possibility of other effects on the circuit. 

      The authors mostly achieved their aim and the results support their conclusion and proposed hypothesis. This work will be impactful on the field as it uses a new Purkinje-cell specific mouse model to study a classic cerebellar task. The use of diazepam could be further analyzed in other genetic models of neurodevelopmental disorders to understand if effects on LTD can rescue other pathways and behavior outcomes.

      We agree that the present findings are potentially relevant for a very wide array of behavioral tasks, disease models, and brain areas beyond the specific ones in our study, and we make this point on lines 310-338 of the revised manuscript. 

      Reviewer #2 (Public Review):

      This manuscript explores the seemingly paradoxical observation that enhanced synaptic plasticity impairs (rather than enhances) certain forms of learning and memory. The central hypothesis is that such impairments arise due to saturation of synaptic plasticity, such that the synaptic plasticity required for learning can no longer be induced. A prior study provided evidence for this hypothesis using transgenic mice that lack major histocompatibility class 1 molecules and show enhanced long-term depression (LTD) at synapses between granule cells and Purkinje cells of the cerebellum. The study found that a form of LTD-dependent motor learning, increasing the gain of the vestibulo-ocular reflex (VOR), is impaired in these mice and can be rescued by manipulations designed to "unsaturate" LTD. The present study extends this line of investigation to another transgenic mouse line with enhanced LTD, namely, mice with the Fragile X gene knocked out. The main findings are that VOR gain-increase learning is selectively impaired in these mice but can be rescued by specific manipulations of visuomotor experience known to reverse cerebellar LTD. Additionally, the authors show that a transient global enhancement of neuronal inhibition also selectively rescues gain-increase learning. This latter finding has potential clinical relevance since the drug used to boost inhibition, diazepam, is FDA-approved and commonly used in the clinic. The evidence provided for the saturation is somewhat indirect because directly measuring synaptic strength in vivo is technically difficult. Nevertheless, the experimental results are solid. In particular, the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable. 

      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this exceptionally clear and concise assessment of the findings and strengths of the manuscript.

      We agree that one of the most “remarkable” aspects of our findings is the specificity of the effects for oculomotor learning tasks for which there is the strongest previous evidence for a role of PF-Purkinje cell LTD. In the original manuscript, we tested just one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript, we strengthen the case for LTD-dependent task specificity by adding new data (Fig. 5) showing the same effects for OKR adaptation, an additional LTD-dependent oculomotor learning task.

      The reviewer’s suggestion to include discussion of “untested assumptions”, “including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation” prompted us to consider the broader implications of our results more deeply and to extensively revise the Discussion accordingly. We clarify that we consider history-dependent changes in the threshold for LTD to be a prediction of the behavioral and pharmacological findings (lines 339-347, 356) rather than an assumption. In addition, we highlight the broader implications of the results by putting them in the context of work in other brain areas on history-dependent changes in the threshold for plasticity, i.e., metaplasticity, going back to the seminal Bienenstock-Cooper-Munro (BCM; 1982) theory (lines 348-378).  

      Reviewer #1 (Recommendations for The Authors):

      The text and figures are very clear to read, but there are a couple of questions that remain:

      The concentrations chosen for diazepam are not well described and it is unclear why the concentrations jump from 2.5 mg/kg to 0.5 mg/kg. Please add an explanation for these concentrations and if any additional behavior outcomes were observed.

      Our choice of diazepam concentrations was guided by the concentrations reported in the literature to be effective in mice, which suggest that a higher dose (2 mg/kg) can have additional effects not observed with a lower effective dose (0.5 mg/kg) (Pádua-Reis et al, 2021). Since we did not know how much enhancement of inhibition/suppression of activity might be necessary to substantially reduce the induction of PF-Purkinje cell LTD, we did pilot experiments to test concentrations at the low and high ends of the doses typically used in mice. These pilot experiments revealed that a lower dose of 0.4 or 0.5 mg/kg was comparable to the higher dose of 2.5 mg/kg in suppressing VOR-increase learning 2 hours after administration (Fig. 3 – figure supplement 2). Anecdotally, we observed higher levels of locomotor activity and other abnormal cage behavior during the period immediately after administration of the higher compared to the lower dose. To limit these side effects and any possibility of dependence, we used only the lower dose in all subsequent experiments. We clarify this rationale for using a lower dose in the legend of Fig. 3 – figure supplement 2.   

      Figure 4 describes low-frequency VOR, but the paragraph discussing these results (line 191) mentions high-frequency VOR-increase learning. It is unclear where the results are for the high-frequency data. Please include or rephrase for clearer understanding.

      In the revised manuscript, we clarify that the 1 Hz vestibular and visual stimuli used in Figs. 1-3 constitute the “high” frequency, which yields different results than the “low” frequency of 0.5 Hz (Fig. 4), as also observed in Boyden et al., 2006, and Nguyen-Vu et al., 2017. 

      Reviewer #2 (Recommendations For The Authors):

      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this comment, which, along with your public comments, inspired us to thoroughly reconsider and revise our Discussion. We think this has greatly improved the manuscript, and will substantially increase its appeal to a broad segment of the neuroscience research community, including computational neuroscientists as well as those interested in synaptic physiology, learning and memory, or plasticity-related brain disorders including autism. 

      Note that we consider the idea that “LTD depends not only on pre- and postsynaptic activity but also on the prior history of synaptic activation” to be the central prediction of the threshold metaplasticity hypothesis rather than an assumption, and in the revised manuscript we explicitly refer to this as a prediction (lines 339 and 356). We also added a discussion of multiple known cellular phenomena in Purkinje cells and their synapses that can regulate LTD and thus represent candidate mechanisms for LTD threshold metaplasticity (lines 339-347). Again, sincere thanks for prompting us to write a vastly improved Discussion section.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported in the main text for all key questions and not only when the p-value is less than 0.05.

      We have added exact p-values throughout the manuscript.  

      References

      Albergaria C, Silva NT, Pritchett DL, Carey MR. (2018). Locomotor activity modulates associative learning in mouse cerebellum. Nat Neurosci.21:725-735. doi: 10.1038/s41593-018-0129-x.

      Abraham WC, Mason-Parker SE, Bear MF, Tate WT. (2001). Heterosynaptic metaplasticity in the hippocampus in vivo: A BCM-like modifiable threshold for LTP. Proc Natl Acad Sci USA. 98:10924-10929.

      Bienenstock E, Cooper L, Munro P. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci. 2:32-48. https://doi.org/10.1523/JNEUROSCI.02-01-00032.1982

      Brett J, Murnion B. (2015). Management of benzodiazepine misuse and dependence. Aust Prescr. 38:152-155. doi: 10.18773/austprescr.055.

      Boyden ES, Raymond JL. (2003). Active Reversal of Motor Memories Reveals Rules Governing Memory Encoding. Neuron.39:1031-1042. https://doi.org/10.1016/S0896-6273(03)00562-2

      Boyden ES, Katoh A, Pyle JL, Chatila TA, Tsien RW, Raymond JL. (2006). Selective engagement of plasticity mechanisms for motor memory storage. Neuron. 51:823-834. https://doi.org/10.1016/j.neuron.2006.08.026

      Desai NS, Cudmore RH, Nelson SB, Turrigiano GG. (2002). Critical periods for experience-dependent synaptic scaling in visual cortex. Nat Neurosci. 5:783-789. doi: 10.1038/nn878.

      Fong M, Duffy KR, Leet MP, Candler CT, Bear MF. (2021). Correction of amblyopia in cats and mice after the critical period. ELife.10:e70023. https://doi.org/10.7554/eLife.70023

      Hamada M, Terao Y, Hanajima R, Shirota Y, Nakatani-Enomoto S, Furubayashi T, Matsumoto H, Ugawa Y. (2008). Bidirectional long-term motor cortical plasticity and metaplasticity induced by quadripulse transcranial magnetic stimulation. J Physiol. 586:3927-3947. doi: 10.1113/jphysiol.2008.152793.

      Katoh A, Yamagiwa A. (2018). Inhibition of PVN neurons influences stress-induced changes of motor learning in the VOR. Society for Neuroscience. Online Program No. 067.14.

      Kimpo RR, Raymond JL. (2007). Impaired motor learning in the vestibulo-ocular reflex in mice with multiple climbing fiber input to cerebellar Purkinje cells. J Neurosci. 27:5672-5682. doi: 10.1523/JNEUROSCI.0801-07.2007.

      Kirkwood A, Rioult MG, Bear MF. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature. 381:526–528. https://doi.org/10.1038/381526a0

      Koekkoek SK, Yamaguchi K, Milojkovic BA, Dortland BR, Ruigrok TJ, Maex R, De Graaf W, Smit AE, VanderWerf F, Bakker CE, Willemsen R, Ikeda T, Kakizawa S, Onodera K, Nelson DL, Mientjes E, Joosten M, De Schutter E, Oostra BA, Ito M, De Zeeuw CI. (2005). Deletion of FMR1 in Purkinje Cells Enhances Parallel Fiber LTD, Enlarges Spines, and Attenuates Cerebellar Eyelid Conditioning in Fragile X Syndrome. Neuron. 47:339–352. https://doi.org/10.1016/j.neuron.2005.07.005

      Le Friec A, Salabert AS, Davoust C, Demain B, Vieu C, Vaysse L, Payoux P, Loubinoux I. (2017). Enhancing Plasticity of the Central Nervous System: Drugs, Stem Cell Therapy, and Neuro-Implants. Neural Plast. 2017:2545736. doi: 10.1155/2017/2545736.

      Leet MP, Bear MF, Gaier ED. (2022). Metaplasticity: a key to visual recovery from amblyopia in adulthood? Curr Opin Ophthalmol. 33:512–518. https://doi.org/10.1097/ICU.0000000000000901

      Martin HGS, Lassalle O, Brown JT, Manzoni OJ. (2016). Age-Dependent Long-Term Potentiation Deficits in the Prefrontal Cortex of the Fmr1 Knockout Mouse Model of Fragile X Syndrome. Cereb Cortex. 26:2084–2092. doi: 10.1093/cercor/bhv031.

      Montgomery JM, Madison DV. (2002). State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 33:765-777. doi: 10.1016/s0896-6273(02)00606-2.

      Nguyen-Vu TDB, Kimpo RR, Rinaldi JM, Kohli A, Zeng H, Deisseroth K, Raymond JL. (2013). Cerebellar Purkinje cell activity drives motor learning. Nat Neurosci. 16:1734-1736. doi: 10.1038/nn.3576.

      Nguyen-Vu TB, Zhao GQ, Lahiri S, Kimpo RR, Lee H, Ganguli S, Shatz CJ, Raymond JL. (2017). A saturation hypothesis to explain both enhanced and impaired learning with enhanced plasticity. ELife. 6:e20147. https://doi.org/10.7554/eLife.20147

      Pádua-Reis M, Nôga DA, Tort ABL, Blunder M. (2021). Diazepam causes sedative rather than anxiolytic effects in C57BL/6J mice. Sci Rep. 11:9335.

      Singh A, Nagpal R, Mittal SK, Bahuguna C, Kumar P. (2017). Pharmacological therapy for amblyopia. Taiwan J Ophthalmol. 7:62-69. doi: 10.4103/tjo.tjo_8_17.

      Tang B, Wang T, Wan H, Han L, Qin X, Zhang Y, Wang J, Yu C, Berton F, Francesconi W, Yates JR 3rd, Vanderklish PW, Liao L. (2015). Fmr1 deficiency promotes age-dependent alterations in the cortical synaptic proteome. Proc Natl Acad Sci USA. 112:E4697-E4706. doi: 10.1073/pnas.1502258112.

      Yamaguchi T, Moriya K, Tanabe S, Kondo K, Otaka Y, Tanaka S. (2020). Transcranial direct-current stimulation combined with attention increases cortical excitability and improves motor learning in healthy volunteers. J Neuroeng Rehabil. 17:23. doi: 10.1186/s12984-020-00665-7.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable work performed fMRI experiments in a rodent model of absence seizures. The results provide new information regarding the brain's responsiveness to environmental stimuli during absence seizures. The authors suggest reduced responsiveness occurs during this type of seizure, and the evidence leading to the conclusion is solid, although reviewers had divergent opinions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs are often called seizures, and because there is non-responsiveness during them, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8: "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state" (see also Figure 4). That statement is at odds with their conclusion. I do not see that this issue was addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time. This issue was not addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data. This is still an issue. No conclusions appear to be possible to make.

      See comments below starting with “We acknowledge the reviewer…”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The authors did not add any validation of their model.

      See comments below starting with “We acknowledge the reviewer…”.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results were improved, but some are still unclear.

      We acknowledge the reviewer's concern that we did not address the comments above. However, we emphasize that most of the comments were addressed in the previously submitted “Response to Review Comments” and in the updated manuscript. Here we repeat those responses and also provide additional clarification of some of the comments.

      We thank the reviewer for noting the discrepancy in the statement “less activated in interictal state”. The statement should have read the other way around. We also note that the direction of activation change between groups can be misinterpreted from the statistical maps alone (Figure 3), which show only where significant changes occur and not the polarity of the response (the polarity can be seen in Figure 4). Therefore, we have made the following changes in Section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases when compared to the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      We agree with the reviewer that there are no data showing slowing of the pathways in response to the stimulus. However, we are unsure which part of the Conclusion section this comment refers to. We did not intend to claim anywhere in the manuscript that stimulation slows the activated pathways.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the Conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”. The observed HRF decreases (rather than increases) in the cortex when stimulation was applied during SWDs are discussed in Section 4.4, where we speculate that neuronal suppression caused by SWDs (possibly apparent as negative HRFs) can prevent responsiveness. The Conclusion now states the following: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response.”

      We point out that the main validation of the model and its details were provided in our previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived from a detailed statistical description of the activity of a spiking network of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the mean-field model is validated by directly comparing its dynamics with those of the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript: the transition studied in the paper between interictal (asynchronous irregular) and ictal (SWD) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics, and the transition between them, is the main component of the model on which the responsiveness analysis in the paper is built, and this component has been validated.

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. But the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with potential to yield important insights.

      Use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and tactile stimuli was 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore, the HRF model looking at fMRI responses to visual or tactile stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or tactile) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or tactile stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. But the attempts to localize these differences in space or time will be contaminated by the seizure-related signals.

      In their response to this comment the authors state that some seizures had longer than average duration, and that they attempted to model the effects of both seizures and sensory stimulation. However these factors do not mitigate the concern because the mean duration of seizures and sensory stimulation remain nearly identical, and the models used therefore will not be able to effectively separate signals related to seizures and related to sensory stimulation.

      Regressors for seizures were formed by including periods of seizures without any stimulation present. In theory, if seizures were perfectly modeled by this regressor, the residual variance would be completely orthogonal to the main effect of the stimulus. Furthermore, only the cases where the seizures are longer than the stimulus are used to calculate the responsiveness to the stimulus (while the cases where the seizures are shorter than the stimulus are used as nuisance regressors to account for error variance). However, we agree with the reviewer that in practice the effects of the seizure cannot be completely removed from the effect of the stimulus. We have addressed this concern in the “physiologic and methodology consideration” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain a mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination in the linear model used, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should be separated as much as possible from the effects caused by the seizure itself.”
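      To make the separation argument concrete, here is a minimal, hypothetical sketch of a general linear model with separate regressors for stimuli delivered interictally, stimuli delivered during a seizure, and all seizure epochs (nuisance). It is not the authors' SPM pipeline: the event timings are invented, the regressors are plain boxcars with no HRF convolution, and the data are noise-free so that the fitted coefficients can be checked exactly. The seizure-only epochs (scans 50, 110 and 170) are what make the ictal-stimulus and seizure regressors separable at all; the printed correlation shows how much the two still overlap, which is the reviewer's point.

```python
import numpy as np


def boxcar(n_scans, onsets, duration):
    """0/1 regressor equal to 1 for `duration` scans starting at each onset."""
    x = np.zeros(n_scans)
    for t in onsets:
        x[t:t + duration] = 1.0
    return x


def fit_glm(y, regressors):
    """Ordinary least-squares fit of y on the regressors plus an intercept column."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X


n_scans = 200
# Hypothetical event timings (in scans), not taken from the study.
stim_interictal = boxcar(n_scans, [35, 95, 155], 6)          # stimulus outside seizures
stim_ictal = boxcar(n_scans, [20, 80, 140], 6)               # stimulus inside a seizure
seizure = boxcar(n_scans, [18, 78, 138, 50, 110, 170], 10)   # all seizure epochs (nuisance)

# Noise-free synthetic signal with known effects, so the fit can be checked exactly.
true_beta = np.array([1.0, 0.3, -0.5, 2.0])  # interictal stim, ictal stim, seizure, intercept
X_true = np.column_stack([stim_interictal, stim_ictal, seizure, np.ones(n_scans)])
y = X_true @ true_beta

beta, X = fit_glm(y, [stim_interictal, stim_ictal, seizure])
print("estimated betas:", np.round(beta, 3))

# Overlap between the ictal-stimulus and seizure regressors (collinearity).
r = np.corrcoef(stim_ictal, seizure)[0, 1]
print(f"correlation(stim_ictal, seizure) = {r:.2f}")
```

      With real, noisy data the coefficients are still estimable as long as the design matrix has full rank, but the more the ictal-stimulus and seizure regressors overlap, the larger the variance of their separate estimates.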

      The claims that differences were observed, for example, between visual cortex and superior colliculus signals with visual stimulation during seizures vs. interictal periods remain unconvincing due to the above.

      Maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      In their response the authors enlarged the cross sections. However there are still discrepancies between the images and the way they are described in the text. For example, in the Results text the authors say that comparing the interictal and ictal states revealed less activation in the somatosensory cortex during the ictal than during the interictal state, yet Figure 3 bottom row left shows greater activation in somatosensory cortex in this contrast.

      We note that the direction of activation change between groups can be misinterpreted from the statistical maps alone (Figure 3), which show only where significant changes occur and not the polarity of the response (the polarity can be seen in Figure 4). Therefore, we have made the following changes to Section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases when compared to the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have revised this paper in a lot of detail. The paper can be accepted for publication in this version.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #1

      (1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically, as the authors claimed at the beginning of the paper.

      We agree with the reviewer that the manuscript does not specifically answer the question of the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states. The sentence in the abstract that could lead to misinterpretation, "The mechanism underlying the reduced responsiveness to external stimulus remains unknown.", was therefore changed to "The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remain unknown".

      This change did not address the issue. The problem is that there is no experimentation to address the underlying mechanisms of the results. I also think the changed language in the abstract is less clear than the original.

      We fully agree that this manuscript does not answer, or claim to answer, the question of the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states, by means of hemodynamics and mean-field simulation.

      We have changed the language of the abstract to the following:

      “In patients suffering from absence epilepsy, recurring seizures can significantly decrease their quality of life and lead to yet untreatable comorbidities. Absence seizures are characterized by spike-and-wave discharges on the electroencephalogram associated with a transient alteration of consciousness. However, it is still unknown how the brain responds to external stimuli during and outside of seizures.

      This study aimed to investigate responsiveness to visual and somatosensory stimulation in GAERS, a well-established rat model for absence epilepsy. Animals were maintained in a non-curarized awake state allowing for naturally occurring seizures to be produced inside the magnet. They were imaged continuously using a quiet zero-echo-time functional magnetic resonance imaging (fMRI) sequence. Sensory stimulations were applied during interictal and ictal periods. Whole brain responsiveness and hemodynamic responses were compared between these two states. Additionally, a mean-field simulation model was used to mechanistically explain the changes of neural responsiveness to visual stimulation between interictal and ictal states.

      Results showed that, during a seizure, whole-brain responses to both sensory stimulations were suppressed and spatially hindered. In several cortical regions, hemodynamic responses were negatively polarized during seizures, despite the application of a stimulus. The simulation experiments also showed restricted propagation of spontaneous activity due to stimulation and so agreed well with fMRI findings. These results suggest that sensory processing observed during an interictal state is hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness during this absence epileptic process.”

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      The response of the authors did not clarify this issue. Instead, they explained why they examined HRF and that they can only speculate what the data means.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the Conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      This is not entirely satisfactory because there is still no validation of the model.

      We point out that the main validation of the model and its details were provided in our previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived from a detailed statistical description of the activity of a spiking network of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the mean-field model is validated by directly comparing its dynamics with those of the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript: the transition studied in the paper between interictal (asynchronous irregular) and ictal (SWD) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics, and the transition between them, is the main component of the model on which the responsiveness analysis in the paper is built, and this component has been validated.

      How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside it, based on the activation maps. We clarified the definition of ROIs as follows: "Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps."

      This is helpful, but the unstained brain does not show the borders of the areas. Therefore, just saying an atlas was used is not enough. How can the areas be accurately outlined in an unstained brain?

      Brain areas were differentiated by co-registering the functional MRI images with a T1-weighted anatomical reference brain that was created on site from the same data set used for the manuscript. This avoids potential co-registration inaccuracies that could arise from using a reference brain acquired at a different site, with a different sequence, or in a different rat strain. T1-weighted images provide sufficient contrast to differentiate the main brain areas, but for more accurate border definition (e.g., to differentiate thalamic nuclei), the atlas coordinate system and the corresponding coordinates in the anatomical reference brain were used to pinpoint the exact borders of the brain areas.

      Reviewer #2

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)."

      What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Please check that these issues were corrected.

      The issues were addressed as follows:

      “Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in the majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al., 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison corrected) results in both the activation maps and the hemodynamic responses. There were not enough animals to detect statistically significant sex differences (p > 0.05).

      This is not the question. The question is whether there was sufficient power.

      A simple power calculation was performed as follows: for a paired t-test (seizure/control pairs) with alpha = 0.05 and power = 0.8, and taking into account the repeated measurements (4 sessions per animal × 11 seizure/control pairs per session), our 4 animals allow us to detect an effect size of 0.37. This is now mentioned in the manuscript.
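      For readers who want to see the kind of calculation involved, the sketch below computes, under a normal approximation, the minimum detectable effect size (standardized mean of the paired differences, d_z) of a two-sided paired t-test at alpha = 0.05 and power = 0.8, and the number of independent pairs needed for a given effect. It is an illustration, not the authors' calculation: which n to enter (animals, sessions, or individual seizure/control pairs) depends on how the repeated measurements are modelled, which is not spelled out here, so it is not meant to reproduce the 0.37 figure. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def min_detectable_effect(n_pairs, alpha=0.05, power=0.8):
    """Approximate smallest d_z that a two-sided paired t-test with n_pairs
    independent pairs detects with the given power.

    Normal approximation: d = (z_{1-alpha/2} + z_{power}) / sqrt(n),
    which is slightly optimistic for small n.
    """
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) / sqrt(n_pairs)


def pairs_needed(effect_size, alpha=0.05, power=0.8):
    """Independent pairs needed to detect d_z = effect_size (same approximation)."""
    return ceil(((_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) / effect_size) ** 2)


# Treating every seizure/control pair as independent: 4 animals x 4 sessions x 11 pairs
print(min_detectable_effect(4 * 4 * 11))
print(pairs_needed(0.37))
```

      Under this approximation, treating all 176 pairs as independent gives a detectable effect of about 0.21, whereas an effect of 0.37 corresponds to about 58 independent pairs, so the choice of effective n matters considerably.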

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. We have no specific interest in comparing stimulation types (i.e., the seizure state in which stimulation occurred), as such a comparison would not provide any meaningful inference for the study.

      Table 1 could be improved by statistics. More could be said and there would be justification to include it.

      We thank the reviewer for the suggestion, but as it remains unclear which statistical comparison would be appropriate, we have opted to leave it out.

      Statistical activation maps - it is not clear how this was done.

      The creation of statistical maps is explained in section 2.5.3.

      This section is not clear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Fig 3 "F-contrast maps." Please explain.

      The creation of statistical maps is explained in section 2.5.3.

      This section is unclear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.
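      As a rough illustration of what an F-contrast map contains, the sketch below fits an ordinary least-squares general linear model to every voxel time series and computes an F-statistic for a contrast matrix C, the quantity that is thresholded to produce such maps. It is a simplified sketch under stated assumptions, not the pipeline of section 2.5.3: it omits hemodynamic response convolution, modelling of temporal autocorrelation, and multiple-comparison correction, and the block design and data are synthetic. The function name is hypothetical.

```python
import numpy as np


def glm_f_map(Y, X, C):
    """Fit an OLS GLM at every voxel and return F-statistics for contrast C.

    Y : (T, V) array, one column per voxel time series
    X : (T, p) design matrix (task regressors plus a constant)
    C : (q, p) contrast matrix; each row is one linear combination of betas
    """
    T = X.shape[0]
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ Y                    # (p, V) parameter estimates
    resid = Y - X @ beta
    dof = T - np.linalg.matrix_rank(X)          # residual degrees of freedom
    sigma2 = (resid ** 2).sum(axis=0) / dof     # (V,) residual variance per voxel
    cb = C @ beta                               # (q, V) contrasted effects
    M = np.linalg.inv(C @ XtX_inv @ C.T)        # (q, q)
    q = C.shape[0]
    # Quadratic form (C b)' M (C b) / (q * sigma^2), evaluated for every voxel at once
    return np.einsum("iv,ij,jv->v", cb, M, cb) / (q * sigma2)


# Synthetic example: a block design with one responsive voxel and one noise voxel
rng = np.random.default_rng(0)
T = 200
block = ((np.arange(T) // 20) % 2).astype(float)   # alternating 20-sample on/off blocks
X = np.column_stack([block, np.ones(T)])
Y = rng.normal(size=(T, 2))
Y[:, 0] += 2.0 * block                              # only voxel 0 responds
C = np.array([[1.0, 0.0]])                          # test the block regressor
F = glm_f_map(Y, X, C)
```

      With a single-row contrast, as here, the F-value equals the square of the corresponding t-value; multi-row contrasts test several effects jointly.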

      Reviewer #3 (Recommendations For The Authors):

      Aside from the concerns listed as weaknesses above which were not addressed, most of the more minor comments were addressed by the authors in the resubmission. However, the comment below was not addressed because it is impossible to see any firing rate changes elicited by sensory stimuli (if they are present) due to the scale during seizures. The seizure signals should be removed or accounted for by the model so that any possible sensory stimulus-related signals could be seen, and displayed on the same scale as firing rates without seizures. Prior comment (unaddressed) is repeated below:

      Figure 6-figure supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest, and perhaps should be subtracted out.

      These two comments were addressed in our replies to the previous round of reviews. Regarding the different scales of the plots in Figure 6-figure supplement 1, we point out that all the plots are already presented on the same scale in Figure 6 of the main text. Regarding the activity related to SWD and sensory stimulation, we note that the effect of the stimulation should be (and was) evaluated with respect to the ongoing activity. All the results concerning neuronal responsiveness presented in the paper evaluate the statistical significance of the changes in activity produced by the stimulation relative to the ongoing activity (during ictal and interictal states, respectively). For this reason, all the plots containing time series of neuronal activity in the simulations include the ongoing activity (with SWD dynamics when present), to allow proper comparison and analysis.
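      The principle of evaluating stimulation effects against ongoing activity can be illustrated with a per-trial comparison of post- versus pre-stimulus activity taken from the same state (ictal or interictal). The sketch below uses a sign-flip permutation test on those differences; the response does not name the test used in the paper, so the test, the window definition, the synthetic data and the function name are assumptions for illustration only.

```python
import numpy as np


def stim_response_pvalue(pre, post, n_perm=5000, seed=0):
    """Sign-flip permutation test of post- minus pre-stimulus activity.

    pre, post : 1-D arrays with one value per stimulation trial, e.g. the mean
    firing rate in a window just before and just after each stimulus, taken
    from the same state (ictal or interictal).
    Returns (mean difference, two-sided p-value).
    """
    diff = np.asarray(post, float) - np.asarray(pre, float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    # Under the null, each trial's difference is equally likely to have either sign
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p


# Synthetic example: large, trial-varying ongoing activity plus a 4 Hz stimulus effect
rng = np.random.default_rng(1)
ongoing = 20 + 5 * rng.standard_normal(30)
pre = ongoing + rng.standard_normal(30)
post = ongoing + rng.standard_normal(30) + 4.0
effect, p = stim_response_pvalue(pre, post)
```

      Because each trial's pre-stimulus window shares the same ongoing activity as its post-stimulus window, large fluctuations in ongoing activity (such as SWD dynamics) cancel in the differences rather than masking the stimulus effect.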

      Additional changes:

      In section 3.2, the sentence “In addition, responses were observed in the somatosensory cortex during a seizure state.” was removed for clarity, as deactivation rather than activation was observed in this brain area during a seizure state.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:

      I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.

      Points of improvement

      (1) Noise

      The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds) but variations might be more fine-grained (for example, noise in 5mos might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups, but correlated to other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be for example to check if noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper. 

      We thank the reviewer for this comment. We certainly have evidence that even the most state-of-the-art cleaning procedures (such as machine-learning-trained ICA decompositions, as we applied here) are unable to remove eye movement artifact entirely from EEG data (Haresign et al., 2021; Phillips et al., 2023). (This applies to our data but also to others’ data, where the confounding effects of eye movements are generally not considered.) Importantly, however, our analyses were designed very carefully with this explicit challenge in mind. All of our analyses compare changes in the relationship between brain activity and attention as a function of age, and there is no evidence to suggest that different sources of noise (e.g., crying vs. movement) would associate differently with attention durations or change their interactions with attention over developmental time. Figures 5 and 7, for example, both examine the relationship of EEG data at one moment in time to a child’s attention patterns hundreds or thousands of milliseconds before and after that moment, for which there is no possibility that head or eye movement artifact can have systematically influenced the results.

      Moving onto the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise among the two coders? Or because coding this fine-grained behaviour from video data is simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident they are maximising measurement accuracy.

      We appreciate the concern. To calculate IRR we used this function (Cardillo G. (2007) Cohen's kappa: compute the Cohen's kappa ratio on a square matrix. http://www.mathworks.com/matlabcentral/fileexchange/15365). Our observed agreement was 0.7 (SD = 0.15). However, we decided to report Cohen's kappa coefficient, which is generally considered a more robust measure because it takes into account agreement occurring by chance. We conducted the coder training meticulously (refer to our response to Q6, R3), and we are confident that our coders performed to the best of their abilities.
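      The gap between raw agreement and kappa comes from the chance correction. The sketch below reimplements the standard Cohen's kappa formula for a square confusion matrix in Python (it is not the cited MATLAB function), using made-up counts for two coding categories, to show how an observed agreement of 0.7 can correspond to a considerably lower kappa. The function name and counts are hypothetical.

```python
def agreement_and_kappa(confusion):
    """Observed agreement and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of coded frames that rater A labelled i and
    rater B labelled j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two raters labelled independently
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


# Hypothetical two-category example (e.g. "looking at a toy" vs "looking away")
po, kappa = agreement_and_kappa([[40, 10], [20, 30]])
```

      With these hypothetical counts, observed agreement is 0.7 while kappa is 0.4, because half of the agreement would be expected by chance given each rater's marginal totals.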

      (2) Cross-correlation analyses

      I would like to raise two issues here. The first is the potential problem of using auto-correlated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it is, could it explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that it could be a consequence of periods around zero being autocorrelated. Relatedly: how does the fact that the significant lag includes zero, and a bit before, affect the interpretation of this effect? 

      To clarify this analysis: we did include a plot showing the autocorrelation of theta activity in the original submission (Figs 7A and 7B in the revised paper). These indicate that theta shows little to no autocorrelation, and we can see no way in which this might have influenced our results. From their comments, the reviewer seems rather to be thinking of phasic changes in the autocorrelation, and of the possibility that greater stability in theta during the period around looks might have caused the cross-correlation result shown in 7E. Again, though, we can see no way in which this might be true, as the cross-correlation indicates that greater theta power is associated with a greater likelihood of looking, and this would not have been affected by changes in the autocorrelation.
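      A common way to judge whether a series is autocorrelated is to compare its sample autocorrelation at each lag against approximate white-noise bounds of ±1.96/√N. The sketch below shows that check on synthetic data (a strongly autocorrelated AR(1) series and white noise); the significance criterion actually used for the autocorrelation plots is not stated here, so the bounds and the function name are assumptions.

```python
import numpy as np


def acf_with_bounds(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag with approximate 95% bounds.

    Under the null hypothesis of no autocorrelation (white noise), sample
    autocorrelations fall within +/- 1.96 / sqrt(N) about 95% of the time.
    """
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    denom = np.sum(x ** 2)
    acf = np.array([np.sum(x[:-lag] * x[lag:]) / denom for lag in range(1, max_lag + 1)])
    bound = 1.96 / np.sqrt(n)
    return acf, bound, np.abs(acf) > bound   # last item flags lags outside the bounds


rng = np.random.default_rng(0)
noise = rng.standard_normal(600)
ar1 = np.zeros(600)
for t in range(1, 600):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]     # strongly autocorrelated series
white = rng.standard_normal(600)             # no autocorrelation by construction

acf_ar, bound, sig_ar = acf_with_bounds(ar1, max_lag=5)
acf_w, _, sig_w = acf_with_bounds(white, max_lag=5)
```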

      A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked for a full second at the same object, they would get a maximum score (e.g., 1) while if they looked for 500ms at the object and 500ms away from the object, they would receive a score of e.g., 0.5. However, if they looked at one object for 500ms and another object for 500ms, they would receive a maximum score (e.g., 1). The reason seems unclear to me because these are different attention episodes, but they would be treated as one. In addition, the authors also show that within an attentional episode theta power changes (for 10mos). What is the reason behind this scoring system? Wouldn't it be better to adjust by the number of attention switches, e.g., with the formula: looking-time/(1+N_switches), so that if infants looked for a full second, but made 1 switch from one object to the other, the score would be .5, thus reflecting that attention was terminated within that episode? 

      We appreciate this suggestion, which we had not considered, and we thank the reviewer for raising it. In response, we have rerun the analyses using the new measure (looking-time/(1+N_switches)), and we are reassured to find that the results remain highly consistent. Author response image 1 below shows the original results in orange and the new measure in blue, at 5 and 10 months.

      Author response image 1.
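      The reviewer's switch-adjusted score can be sketched as follows, assuming frame-by-frame gaze codes at 20-ms resolution (50 frames per 1-s window), with 0 for looking away and positive integers for the individual toys. What counts as a switch is an assumption: by default every change of gaze target within the window is counted, including shifts to and from looking away, and the hypothetical `count_away_shifts` flag restricts counting to toy-to-toy shifts. Changes that fall exactly on a window boundary are not counted.

```python
def window_scores(codes, frames_per_window=50, count_away_shifts=True):
    """Per-window looking scores from frame-by-frame gaze codes.

    codes: sequence of ints, one per video frame; 0 = looking away,
    k > 0 = looking at toy k.
    Returns (raw, adjusted): raw = proportion of frames on any toy,
    adjusted = raw / (1 + number of switches within the window).
    """
    raw, adjusted = [], []
    for start in range(0, len(codes) - frames_per_window + 1, frames_per_window):
        w = codes[start:start + frames_per_window]
        looking = sum(1 for c in w if c != 0) / frames_per_window
        switches = 0
        for prev, cur in zip(w, w[1:]):
            # A switch is any change of target; optionally ignore shifts to/from "away"
            if prev != cur and (count_away_shifts or (prev != 0 and cur != 0)):
                switches += 1
        raw.append(looking)
        adjusted.append(looking / (1 + switches))
    return raw, adjusted


one_toy = [1] * 50                 # full second on toy 1
two_toys = [1] * 25 + [2] * 25     # half a second on toy 1, then toy 2
half_away = [1] * 25 + [0] * 25    # half a second on toy 1, then away
raw, adj = window_scores(one_toy + two_toys + half_away)
```

      The three example windows reproduce the reviewer's cases: a full second on one toy scores 1, a full second split between two toys scores 1 unadjusted but 0.5 after adjustment, and half a second on a toy followed by half a second away scores 0.5 unadjusted and 0.25 when the shift away is counted as a switch.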

      (3) Clearer definitions of variables, constructs, and visualisations

      The second issue is the overall clarity and systematicity of the paper. The concept of attention appears with many different names. Only in the abstract, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others are overlapping. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it's harder to understand what specific aspect (duration of looking, disengagement, etc.) it is predictive of. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would want to see more demarcation between different concepts and between concepts and measurements.

      We appreciate the comment and we have clarified the concepts and their operationalisation throughout the revised manuscript.

      General Remarks

      In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

      We thank the reviewer for the close attention that they have paid to our manuscript, and for their insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that the 5-month-old group exhibited a decline in HR that forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:

      The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.

      Weaknesses:

      Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to eliminate possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during the free play.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #3 (Public Review):

      Summary:

      Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated--a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.

      Strengths:

      The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.

      Weaknesses:

      The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific ways in which clarity can be improved:

      A. Regarding the distinction between constructs, or measures and constructs:

      i. In the results section, I would prefer to mention looking duration and heart rate as metrics that have been measured, while in the introduction and discussion, a clear 1-to-1 link between construct/cognitive process and behavioural or (neuro)psychophysical measure can be made (e.g., sustained attention is measured via looking durations; autonomic arousal is measured via heart-rate). 

      How attention and arousal were operationalised is now clarified throughout the text, especially in the Results.

      ii. Relatedly, the "attention" variable is not really measuring attention directly. It is rather measuring looking time (proportion of looking time to the toys?), which is the operationalisation, which is hypothesised to be related to attention (the construct/cognitive process). I would make the distinction between the two stronger.

      The distinction between looking and paying attention is now clearer in the revised manuscript, as per R1’s and R3’s suggestions. We have also added a paragraph to the Introduction to clarify it and to point out its limitations (see pg. 5).

      B. Each analysis should be set out to address a specific hypothesis. I would rather see hypotheses in the introduction (without direct reference to the details of the models that were used), and how a specific relation between variables should follow from such hypotheses. This would also solve the issue that some analyses did not seem directly necessary to the main goal of the paper. For example:

      i. Are ACF and survival probability analyses aimed at proving different points, or are they different analyses to prove the same point? Consider either making clearer how they differ or moving one to supplementary materials.

      We have clarified this on pg. 4 of the revised manuscript.

      ii. The autocorrelation results are not mentioned in the introduction. Are they aiming to show that the variables can be used for cross-correlation? Please clarify their role or remove them.

      We have clarified this on pg. 4 of the revised manuscript.

      C. Clarity of cross-correlation figures. To ensure clarity when presenting a cross-correlation plot, it's important to provide information on the lead-lag relationships and which variable is considered X and which is Y. This could be done by labelling the axes more clearly (e.g., the left-hand side of the x-axis specifies x leads y, right hand specifies y leads x) or adding a legend (e.g., dashed line indicates x leading y, solid line indicates y leading x). Finally, the limits of the x-axis are consistent across plots, but the limits of the y-axis differ, which makes it harder to visually compare the different plots. More broadly, the plots could have clearer labels, and their resolution could also be improved. 

      Information on which variable precedes or follows was already given in the figure captions. However, we have edited the figures as per the reviewer’s suggestion and added this information to the figures themselves. We have also uploaded all the figures in higher resolution.
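      The lead-lag convention the reviewer asks for can be made explicit in code. In the sketch below, a positive lag pairs x at time t with y at time t + lag, so positive lags mean x leads y and negative lags mean y leads x. This is one possible convention (the reviewer's example places "x leads y" on the left-hand side, which corresponds to flipping the sign of the lag); the function name and synthetic example are hypothetical and not the authors' data.

```python
import numpy as np


def lagged_xcorr(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for lag in [-max_lag, max_lag].

    Convention: positive lag = x leads y; negative lag = y leads x.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = x.size
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for lag in lags:
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]      # x now, y `lag` steps later
        else:
            a, b = x[-lag:], y[: n + lag]     # y now, x `-lag` steps later
        r.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(r)


# Synthetic example: y copies x two steps later, so x leads y by 2
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 2) + 0.3 * rng.standard_normal(500)
lags, r = lagged_xcorr(x, y, max_lag=5)
peak_lag = lags[np.argmax(r)]
```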

      D. Figure 7 was extremely helpful for understanding the paper, and I would rather have it as Figure 1 in the introduction. 

      We have moved figure 7 to figure 1 as per this request.

      E. Statistics should always be reported, and effects should always be described. For example, results of autocorrelation are not reported, and from the plot, it is also not clear if the effects are significant (the caption states that red dots indicate significance, but there are no red dots. Does this mean there is no autocorrelation?).

      We apologise – this was hard to read in the original. We have clarified that there is no autocorrelation present in Fig 7A and 7D.

      And if so, given that theta is a wave, how is it possible that there is no autocorrelation (connected to point 1)? 

      We thank the reviewer for raising this point. Theta power indexes oscillatory activity in the EEG within the 3-6 Hz window (i.e., 3 to 6 oscillations per second), whereas we analysed the autocorrelation of the EEG data by looking at changes in theta power between consecutive 1-second windows. To say that there is no autocorrelation in the data means that, if there is more 3-6 Hz activity within one particular 1-second window, there tends not to be significantly more 3-6 Hz activity within the 1-second windows immediately before and after it.
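      The distinction drawn here, between an oscillation that is autocorrelated from sample to sample and band power that need not be autocorrelated from window to window, can be demonstrated on synthetic data. The sketch below builds a signal whose 5 Hz amplitude is redrawn independently in each 1-s window, estimates 3-6 Hz power per window with a plain FFT, and compares the lag-1 autocorrelation at both levels. The sampling rate, amplitude distribution, FFT-based power estimate and function names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

FS = 250  # sampling rate in Hz (assumed for illustration)


def window_band_power(signal, fs, band=(3.0, 6.0), win_s=1.0):
    """Mean FFT power within `band` for consecutive non-overlapping windows."""
    n_win = int(fs * win_s)
    n = signal.size // n_win
    windows = signal[: n * n_win].reshape(n, n_win)
    spectrum = np.abs(np.fft.rfft(windows, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[:, in_band].mean(axis=1)


def lag1_autocorr(x):
    """Correlation between each value and the next one."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(0)
n_windows = 300
t = np.arange(FS) / FS
# Each 1-s window holds a 5 Hz oscillation whose amplitude is drawn independently
amps = rng.uniform(0.5, 2.0, size=n_windows)
eeg = np.concatenate([a * np.sin(2 * np.pi * 5 * t) for a in amps])
eeg += 0.2 * rng.standard_normal(eeg.size)

sample_level = lag1_autocorr(eeg)          # neighbouring samples: high
theta_power = window_band_power(eeg, FS)   # one 3-6 Hz power value per window
window_level = lag1_autocorr(theta_power)  # neighbouring windows: near zero
```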

      F. Alpha power is introduced later on, and in the discussion, it is mentioned that the effects that were found go against the authors' expectations. However, alpha power and the authors' expectations about it are not mentioned in the introduction. 

      We thank the reviewer for this comment. We have added a paragraph on alpha to the Introduction (pg. 4).

      Minor points:

      1. At the end of 1st page of introduction, the authors state that: 

      “How children allocate their attention in experimenter-controlled, screen-based lab tasks differs, however, from actual real-world attention in several ways (32-34). For example, the real-world is interactive and manipulable, and so how we interact with the world determines what information we, in turn, receive from it: experiences generate behaviours (35).”

      I think there's more to this though - Lab-based studies can be made interactive too (e.g., Meyer et al., 2023, Stahl & Feigenson, 2015). What remains unexplored is how infants actively and freely initiate and self-structure their attention, rather than how they respond to experimental manipulations.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Stahl, A. E., & Feigenson, L. (2015). Observing the unexpected enhances infants' learning and exploration. Science, 348(6230), 91-94.

      We thank the reviewer for this suggestion and have added their point on pg. 4.

      (2) Regarding analysis 4:

      a. In analysis 1 you showed that the duration of attentional episodes changes with age. Is it fair to keep the same start, middle, and termination ranges across age groups? Is 3-4 seconds "middle" for 5-month-olds? 

      We appreciate the comment. There are many ways we could have run these analyses and, in fact, in other papers we have done it differently, for example by splitting each look in 3, irrespective of its duration (Phillips et al., 2023).

      However, one aspect we took into account was the observation that 5-month-old infants exhibited more short looks than older infants. We recognized that dividing each look into three parts, regardless of its duration, might have affected the results. Presumably, the activity during the middle and termination phases of a 1.5-second look differs from that of a look lasting over 7 seconds.

      Two additional factors gave us confidence in our approach: 1) while the definition of “middle” was somewhat arbitrary, it allowed us to maintain consistency in our analyses across the two ages; and 2) we obtained a comparable number of observations at the two time points (e.g., for the “middle” phase we had 172 events at 5 months and 194 events at 10 months).

      b. It is recommended not to interpret lower-level interactions if more complex interactions are not significant. How are the interaction effects in a simpler model in which the 3-way interaction is removed? 

      We appreciate the comment. We aimed to follow the same steps as Xie et al. (2018). Nevertheless, we have re-analysed the data without the 3-way interaction, and the significance of the results stayed the same. Please see Author response image 2 below (first: new analyses without the 3-way interaction; second: original analyses including the 3-way interaction).

      Author response image 2.

      (3) Figure S1: there seems to be an outlier in the bottom-right panel. Do results hold excluding it? 

      We re-ran these analyses as per this suggestion, and the results stayed the same (refer to SM pg. 2).

      (4) Figure S2 should refer to 10 months instead of 12.

      We thank the reviewer for noticing this typo; we have corrected it in the revised manuscript (see SM pg. 3). 

      (5) In the 2nd paragraph of the discussion, I found this sentence unclear: "From Analysis 1 we found that infants at both ages showed a preferred modal reorientation rate". 

      We have clarified this on pg. 10 of the revised manuscript.

      (6) Discussion: many (infant) studies have used theta in anticipation of receiving information (Begus et al., 2016) surprising events (Meyer et al., 2023), and especially exploration (Begus et al., 2015). Can you make a broader point on how these findings inform our interpretation of theta in the infant population (go more from description to underlying mechanisms)? 

      We have expanded on this point about interpreting frequency bands on pg. 13 of the revised manuscript, and we thank the reviewer for bringing it up.

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397-12402.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Begus, K., Southgate, V., & Gliga, T. (2015). Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biology letters, 11(5), 20150041.

      (7) 2nd page of discussion, last paragraph: "preferred modal reorientation timer" is not a neural/cognitive mechanism, just a resulting behaviour. 

      We agree with this comment and thank the reviewer for bringing it to our attention. We have clarified this on pp. 12 and 13 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments and questions that I think the authors should consider addressing in a revised version. Please see below:

      (1) During preprocessing (steps 5 and 6), it seems like the "noisy channels" were rejected using the pop_rejchan.m function and then interpolated. This procedure is common in infant EEG analysis, but a concern arises: was there no upper limit for channel interpolation? Did the authors still perform bad channel interpolation even when more than 30% or 40% of the channels were identified as "bad" at the beginning with the continuous data? 

      We did state in the original manuscript that “participants with fewer than 30% channels interpolated at 5 months and 25% at 10 months made it to the final step (ICA) and final analyses”. In the revised version we have re-written this section in order to make this more clear (pg. 17).
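      The age-specific inclusion rule stated above can be written as a simple check. In the sketch below, the thresholds come from the text (fewer than 30% of channels interpolated at 5 months and fewer than 25% at 10 months); the channel counts in the example and the function name are hypothetical.

```python
# Maximum proportion of interpolated channels per age group (strict upper bound)
MAX_INTERPOLATED = {"5m": 0.30, "10m": 0.25}


def keep_participant(n_interpolated, n_channels, age_group):
    """True if fewer than the allowed proportion of channels were interpolated."""
    return n_interpolated / n_channels < MAX_INTERPOLATED[age_group]


# Hypothetical 32-channel recordings
print(keep_participant(9, 32, "5m"))   # 28% interpolated at 5 months
print(keep_participant(9, 32, "10m"))  # 28% interpolated at 10 months
```

      Because the criterion is "fewer than", a recording with exactly 25% of channels interpolated at 10 months would be excluded.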

      (2) I am also perplexed about the sequencing of the ICA pruning step. If the intention of ICA pruning is to eliminate artificial components, would it be more logical to perform this procedure before the conventional artifacts' rejection (i.e., step 7), rather than after? In addition, what was the methodology employed by the authors to identify the artificial ICA components? Was it done through manual visual inspection or utilizing specific toolboxes? 

      We agree that ICA is often run before this step. However, continuous data were rejected prior to ICA in order to remove the very worst sections of data (where almost all channels were affected), which can arise when infants fuss or pull at the caps. This step was therefore applied at this point in the pipeline so that these very bad sections were not input into the ICA. This is fairly widespread practice in cleaning infant data.

      Concerning the reviewer’s second question, of how artifactual ICA components were identified and removed: this is described in considerable detail in the paper that we cite in that section of the manuscript. It was done using a classifier trained specifically to clean naturalistic infant EEG data (Haresign et al., 2021), which has since been employed in similar studies (e.g., Georgieva et al., 2020; Phillips et al., 2023).
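      The step of discarding the very worst continuous sections before ICA can be sketched as flagging fixed-length segments in which almost all channels exceed an amplitude criterion. The 1-s segment length, the 150 µV peak-to-peak threshold, the 90% channel criterion and the function name below are hypothetical values chosen for illustration; the criteria used in the study are not stated in this response.

```python
import numpy as np


def worst_segments(data, fs, seg_s=1.0, amp_uv=150.0, frac_channels=0.9):
    """Flag segments in which at least `frac_channels` of the channels exceed
    `amp_uv` peak-to-peak.

    data: (n_channels, n_samples) array in microvolts.
    Returns a boolean array with one entry per segment (True = reject).
    """
    n_seg_samp = int(fs * seg_s)
    n_seg = data.shape[1] // n_seg_samp
    segs = data[:, : n_seg * n_seg_samp].reshape(data.shape[0], n_seg, n_seg_samp)
    ptp = segs.max(axis=2) - segs.min(axis=2)    # (n_channels, n_seg) peak-to-peak
    bad_frac = (ptp > amp_uv).mean(axis=0)       # proportion of channels over threshold
    return bad_frac >= frac_channels


# Synthetic 10-s recording with a large artifact on every channel in second 4
rng = np.random.default_rng(0)
fs = 250
data = 10 * rng.standard_normal((32, fs * 10))
data[:, 3 * fs:4 * fs] += np.linspace(0, 400, fs)
flags = worst_segments(data, fs)
```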

      (3) Please clarify how the relative power was calculated for the theta (3-6Hz) and alpha (6-9Hz) bands. Were they calculated by dividing the ratio of theta or alpha power to the power between 3 and 9Hz, or the total power between 1 (or 3) and 20 Hz? In other words, what does the term "all frequency bands" refer to in section 4.3.7? 

      We thank the reviewer for this comment; we have now clarified this on pg. 22.
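      Because the denominator is the ambiguous part of relative power, the sketch below makes it an explicit parameter. It assumes a power spectrum already estimated on a frequency grid, uses half-open intervals so that 6 Hz is not counted in both the theta (3-6 Hz) and alpha (6-9 Hz) bands, and defaults to a 1-20 Hz denominator purely as an assumption; the definition actually used is the one given on pg. 22 of the revised manuscript. The function name is hypothetical.

```python
import numpy as np


def relative_band_power(freqs, psd, band, total=(1.0, 20.0)):
    """Power in `band` divided by power in `total`.

    Both ranges are half-open [low, high) in Hz, so adjacent bands such as
    3-6 Hz and 6-9 Hz do not share a frequency bin.
    """
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs >= total[0]) & (freqs < total[1])
    return psd[in_band].sum() / psd[in_total].sum()


# Flat spectrum on a 1 Hz grid from 0 to 40 Hz
freqs = np.arange(0, 41, 1.0)
psd = np.ones_like(freqs)
theta_rel = relative_band_power(freqs, psd, (3.0, 6.0))                        # vs 1-20 Hz
alpha_rel = relative_band_power(freqs, psd, (6.0, 9.0))                        # vs 1-20 Hz
theta_rel_narrow = relative_band_power(freqs, psd, (3.0, 6.0), total=(3.0, 9.0))
```

      With a flat spectrum, theta is 3/19 of 1-20 Hz power but half of 3-9 Hz power, which shows how strongly the choice of denominator affects the value.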

      (4) One of the key discoveries presented in this paper is the observation that attention shifts are accompanied by a subsequent enhancement in theta band power shortly after the shifts occur. Is it possible that this effect or alteration might be linked to infants' saccades, which are used as indicators of attention shifts? Would it be feasible to analyze the disparities in amplitude between the left and right frontal electrodes (e.g., Fp1 and Fp2, which could be viewed as virtual horizontal EOG channels) in relation to theta band power, in order to eliminate the possibility that the augmentation of theta power was attributable to the intensity of the saccades? 

      We appreciate the concern. Average saccade duration in infants is about 40ms (Garbutt et al., 2007). Our finding that the positive cross-correlation between theta and look duration is present not only when we examine zero-lag data but also when we examine how theta forwards-predicts attention 1-2 seconds afterwards seems therefore unlikely to be directly attributable to saccade-related artifact. Concerning the reviewer’s suggestion – this is something that we have tried in the past. Unfortunately, however, our experience is that identifying saccades based on the disparity between Fp1 and Fp2 is much too unreliable to be of any use in analysing data. Even if specially positioned HEOG electrodes are used, we still find the saccade detection to be insufficiently reliable. In ongoing work we are tracking eye movements separately, in order to be able to address this point more satisfactorily.

      (5) The following question is related to my previous comment. Why is the duration of the relationship between theta power and moment-to-moment changes in attention so short? If theta is indeed associated with attention and information processing, shouldn't the relationship between the two variables strengthen as the attention episode progresses? Given that the authors themselves suggest that "One possible interpretation of this is that neural activity associates with the maintenance more than the initiation of attentional behaviors," it is unclear why the relationship does not last longer but instead declines drastically (Figure 6), which seems to contradict this interpretation. 

      We thank the reviewer for raising this excellent point. Indeed, we argue that this, together with the low autocorrelation values for theta documented in Fig 7A and 7D, challenges many conventional ways of interpreting theta. We are continuing to investigate this question in ongoing work.

      (6) Have the authors conducted a comparison of alpha relative power and HR deceleration durations between 5 and 10-month-old infants? This analysis could provide insights into whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during free play.

      We thank the reviewer for this suggestion. This is an aspect we investigated, but given that our primary emphasis was on the theta frequency, and considering the length of the manuscript, we ultimately decided not to incorporate it. However, we include Author response image 3 below, showing that there was no significant interaction between HR and alpha band.

      Author response image 3.

      Reviewer #3 (Recommendations For The Authors):

      (1) In reading the manuscript, the language used seems to imply longitudinal data or at the very least the ability to detect change or maturation. Given the cross-sectional nature of the data, the language should be tempered throughout. The data are illustrative but not definitive. 

      We thank the reviewer for this comment. We have now clarified that “Data was analysed in a cross-sectional manner” on pg. 15.

      (2) The sample size is quite modest, particularly in the specific age groups. This is likely tempered by the sheer number of data points available. This latter argument is implied in the text, but not as explicitly noted. (However, I may have missed this as the text is quite dense). I think more notice is needed on the reliability and stability of the findings given the sample. 

      We have clarified this on pg. 16.

      (3) On a related note, how was the sample size determined? Was there a power analysis to help guide decision-making for both recruitment and choosing which analyses to proceed with? Again, the analytic approach is quite sophisticated and the questions are of central interest to researchers, but I was left feeling maybe these two aspects of the study were out-sprinting the available data. The general impression is that the sample is small, but it is not until looking at table s7, that it is in full relief. I think this should be more prominent in the main body of the study.

      We have clarified this on pg. 16.

      (4) The manuscript devotes a few sentences to the relation between looking and attention. However, this distinction is central to the design of the study and to any philosophical differences regarding what take-away points can be generated. In my reading, I think this point needs to be more heavily interrogated. 

      This distinction between looking and paying attention is now clearer in the revised manuscript, as per R1’s and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and to point out its limitations (see pg. 5).

      (5) I would temper the real-world attention language. This study is certainly a great step forward, relative to static faces on a computer screen. However, there are still a great number of artificial constraints that have been added. That is not to say that the constraints are bad--they are necessary to carry out the work. However, it should be acknowledged that it constrains the external validity. 

      We have added a paragraph acknowledging the limitations of the setup on pg. 14.

      (6) The kappa on the coding is not strong. The authors chose to proceed nonetheless. Given that, I think more information is needed on how coders were trained, how they were standardized, and what parameters were used to decide they were ready to code independently. Again, with the sample size and the kappa presented, I think more discussion is needed regarding the robustness of the findings. 

      We appreciate the concern. As per our answer to R1, we chose to report the most stringent measure of inter-rater reliability, but other calculation methods (i.e., percent agreement) return higher scores (see response to R1).
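      To illustrate why the two indices diverge, here is a minimal sketch that computes percent agreement and kappa (assumed here to be Cohen's kappa for two coders) on hypothetical frame-by-frame codes. The labels and the helper functions `percent_agreement` and `cohens_kappa` are illustrative assumptions, not the coding data or the pipeline used in the study. Kappa subtracts the agreement expected by chance from each coder's label frequencies, so when one category dominates the codes, kappa falls well below raw agreement.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Fraction of items on which two coders assigned the same label."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("codes must be non-empty and of equal length")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: probability that both coders independently pick label k,
    # based on each coder's own label frequencies, summed over all labels.
    labels = set(freq_a) | set(freq_b)
    p_chance = sum((freq_a[k] / n) * (freq_b[k] / n) for k in labels)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical per-frame codes: "T" = looking at a toy, "P" = looking at the partner.
coder_1 = list("TTTTTTTTPP")
coder_2 = list("TTTTTTTPTP")

print(percent_agreement(coder_1, coder_2))  # 0.8
print(cohens_kappa(coder_1, coder_2))
```

      With these hypothetical codes, raw agreement is 0.8 but kappa is 0.375, because the coders' label frequencies alone would yield 0.68 agreement by chance.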

      As for the training, we wrote an extensively detailed coding scheme, describing exactly how to code each look, which was handed to our coders. Throughout the initial months of training, we met with the coders on a weekly basis to discuss questions and individual frames that looked ambiguous. After each session, we revised the coding scheme to incorporate additional details, aiming to make the coding process progressively less subjective. During this period, every coder analysed the same interactions, and inter-rater reliability (IRR) was assessed weekly by comparing their evaluations with mine (Marta). With time, the coders had fewer questions and IRR increased. At that point, we deemed them sufficiently trained and began assigning them different interactions from each other. Periodically, though, we all assessed the same interaction and met to review and discuss our coding outputs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments on our manuscript and their appreciation of the results. We provide point-by-point responses below. For your convenience, we highlight here the main changes to the manuscript.

      · More descriptive terminology for the contextual cues (Ctx.A / Ctx.noA is now referred to as LIGHT / DARK).

      · Schematic of experiment timeline highlighting the exclusion of non-discriminators following the initial acquisition period. This explains the absence of baseline sex differences post acquisition and clears up some misconceptions about lack of replicability.

      · New data (time in port preCS) showing that a prior reward does not cause continued presence in port.

      · Several text edits to address all the points raised by the reviewers.

      We hope that the editors and reviewers will be satisfied with this revised version and find the strength of the evidence more convincing.

      Reviewer #1 (Recommendations For The Authors):

      In relation to weaknesses points 1-4 in the public review:

      (1) With regard to the claim (page 4 of pdf), I think I can see what the authors are getting at when they claim "Only Ctx-dep. O1 engages context-gated reward predictions", because the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, it has a discriminative purpose. In Ctx-dep. O1/O2, however, although the context doesn't serve a discriminative purpose in the sense that one cue will always earn a unique outcome, regardless of context, the fact that these cues are differentially rewarded in the different contexts means that animals may well form context-gated cue-outcome associations (e.g. CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, such that I don't think contextual information will fade to the background of the association and attention be lost to it in the way that, say, Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      I suggest that the authors refine the statement to be more accurate.

      We agree with the reviewer: the context is absolutely relevant for rats trained in the Ctx-dep. O1/O2 task. We have edited the text in several places to make this clear. The question is how (by what mechanism) the context participates in the control of behavior in this group. The reviewer correctly points out that, just like rats trained in the Ctx-dep. O1 task, rats trained in the Ctx-dep. O1/O2 task might have formed context-gated cue-outcome associations. We now clearly acknowledge that in the text.

      However, because in this group the two outcomes are always encountered in different contexts, we argue that these rats could also have formed a direct association between the two contexts and the two outcomes. In other words, each context might directly evoke the expectation of a distinct reward outcome (prepare to drink, or prepare to eat). On a given trial, if the cue and context both tend to activate the same outcome representation, the converging cue+context excitation can add up. This would produce a context-sensitive response, but not via a hierarchical modulation process (unlike Ctx-dep. O1). Arguably, this latter associative mechanism is much simpler and might explain why almost all rats in the Ctx-dep. O1/O2 group learned the discrimination, and at a much faster rate.

      Therefore, while rats trained in Ctx-dep O1/O2 might engage a combination of associative processes to achieve context-sensitive behavior (including hierarchical associations), only rats in the Ctx-dep O1 critically and unambiguously rely on hierarchical associations to achieve context-sensitive behavior.

      (2) I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      I am just curious as to whether the authors have thoughts on this.

      We have not tested this and are not aware of a paper that examined this question specifically.

      However, we would like to point out that in the suggested design (CS1→[CS2→O1]; CS3→[CS4→O1]) the discriminative cues (CS1 and CS3) would almost certainly also acquire substantial reward-predictive value, either because of their direct association with the reward, or via second-order conditioning. This would complicate the interpretation of the results in terms of hierarchical associations. Incorporating non-rewarded presentation of CS1 and CS3 alone (i.e. extinguishing those cues, as is sometimes done in occasion setting experiments) would be one way to reduce the reward expectation evoked by those cues, but this approach has some limitations. Indeed, as mentioned by Rescorla (2006) “During extinction, the net associative strength of a stimulus declines to the level of [a response] threshold, but further decrement stops at that point”. So while extinguished CS1 and CS3 might no longer evoke overt behavioral responses, these cues could retain nonnegligible subthreshold excitatory connection with the US.  Individually, these cues might fail to evoke responding but could nonetheless increase responding during the CS1→CS2 trials (or CS3→CS4 trials), via simple summation. (Rescorla, 2006: “the compound of two [extinguished] stimuli has a strength that exceeds the threshold and so evokes responding”).
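      The summation argument can be made concrete with a toy threshold model, assuming (as in the Rescorla, 2006 quotation) that responding occurs only when the summed associative strength of the stimuli present exceeds a fixed threshold. The threshold and strength values, and the helper `evokes_responding`, are hypothetical and chosen purely for illustration; treating a CS1→CS2 trial as a simple sum of the two strengths is also a simplifying assumption.

```python
# Toy threshold model of responding (hypothetical values, for illustration only).
RESPONSE_THRESHOLD = 0.3


def evokes_responding(strengths):
    """Return True if the summed associative strength exceeds the response threshold."""
    return sum(strengths) > RESPONSE_THRESHOLD


# After extinction, CS1's net strength is assumed to have declined only to the
# threshold level, not below it, so it retains subthreshold excitation.
v_cs1 = 0.3
# The target CS2 is assumed to carry some weak, also subthreshold, excitation.
v_cs2 = 0.25

print(evokes_responding([v_cs1]))         # False: CS1 alone
print(evokes_responding([v_cs2]))         # False: CS2 alone
print(evokes_responding([v_cs1, v_cs2]))  # True: CS1 and CS2 summed on one trial
```

      Neither stimulus supports responding on its own, yet the combined trial does; this is the summation confound that a design in which the discriminative stimuli exert opposite effects on the target cues avoids.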

      This type of consideration is precisely why we opted for the behavioral task used in the study. In Ctx-dep. O1, the discriminative stimuli exert opposite effects on the two target cues, which rules out summation effects as a mechanism for context-sensitive behavior.

      (3) On pages 8-9 of the pdf, where the biological basis of the delayed acquisition of contextual control in females is considered, I find this to be written from a place of assuming that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps the androgens also somehow alter synaptic plasticity/physiology, leading to their faster speed, reduced performance stability, and increased susceptibility to stress.

      I would like the argument that female behaviour might be the default, and male behaviour the deviation to be considered in the discussion in addition to those already stated.

      We regret if we gave the impression that male behavior was the default. The paper is intended to report sex differences but we don’t view either sex as the default. To correct this impression, we have added a few sentences in the discussion to highlight male-hormonal factors as well as non-gonadal genetic factors that might have contributed to the observed sex differences.

      (4) In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

      I suggest OFC be discussed with regard to biological mechanisms of differences.

      We added a few sentences in the discussion to i) highlight the parallel between our study and human fMRI studies showing superior OFC activation in females during the regulation of emotional responses, ii) suggest a potential relationship between the reported sex differences (speed of acquisition, robustness of performance, and OFC activation in context-gated reward prediction), and iii) acknowledge our ignorance of the root causes of these sex differences.

      We wish we could offer a better answer. We have attempted to offer possible proximal explanations for the observed sex differences, but ultimately our work did not address the root causes of these behavioral and neural sex differences. Therefore we feel that further attempts to explain these differences would be too speculative.

      (5) I did wonder if the authors were aware that in the Rescorla-Wagner model, contextual stimuli are thought to summate with discrete cues to enter into the association with the outcome (i.e., the error term is between lambda and sigmaV, with sigmaV the 'summation' of all stimuli present on a trial, including contextual stimuli). Typically, this is not considered much, because the cue itself is so salient and more consistently paired with reward (whereas the ever-present context is often paired with no reward), but nevertheless, it is a part of the association. I'm not sure it's wrong to say that the background circumstances under which events occur are thought to play little role (as in the second sentence of the introduction), but I was wondering if the authors were aware of this fact when they wrote that.

      This sentence in the introduction was meant to introduce the distinction between eliciting stimuli and modulating contexts. Admittedly, this paints a naive picture, which we now acknowledge (we hope that the rest of the paper provides more nuance). As pointed out by this reviewer, the context is also a stimulus, and, just like any other stimulus, it is eligible for direct association with an outcome. The possibility of a direct context→outcome association is precisely the rationale for the Ctx-dep. O1/O2 group.
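      The reviewer's point can be illustrated with a minimal Rescorla-Wagner sketch, in which every stimulus present on a trial, the context included, contributes to the summed prediction ΣV in the error term, ΔV_i = α_i β (λ − ΣV). The saliences, learning rate, number of blocks, and trial list below are hypothetical; the context is given low salience and is also experienced alone without reward (standing in for the intertrial interval).

```python
def rescorla_wagner(trials, alpha, beta=0.1, n_epochs=50):
    """Simulate Rescorla-Wagner learning over repeated blocks of trials.

    trials: list of (stimuli_present, lam) pairs, where lam is the asymptote
            supported by the outcome (1.0 if rewarded, 0.0 if not).
    alpha:  dict mapping each stimulus name to its salience.
    Returns a dict of final associative strengths V.
    """
    V = {stimulus: 0.0 for stimulus in alpha}
    for _ in range(n_epochs):
        for present, lam in trials:
            # The prediction error uses the SUMMED strength of every stimulus
            # present, context included: dV_i = alpha_i * beta * (lam - sum V).
            error = lam - sum(V[s] for s in present)
            for s in present:
                V[s] += alpha[s] * beta * error
    return V


# Hypothetical parameters: a salient discrete cue X and a low-salience context.
alpha = {"X": 0.5, "context": 0.1}
# Each block: one rewarded X trial (in the context) and two context-alone,
# nonrewarded periods.
trials = [({"X", "context"}, 1.0), ({"context"}, 0.0), ({"context"}, 0.0)]
V = rescorla_wagner(trials, alpha)
print(V)
```

      Under these assumptions the context does enter the association, but it stays weak relative to X because it is also experienced without reward. The same summation rule is what would allow a context to acquire direct strength when it reliably accompanies an outcome.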

      (6) Context-noA - Seems a little confusing for a name, why not just call it context B? NoA appears to imply that nothing happens in A or no outcome is available, whereas this is not always the case.

      We debated which terminology to use. We felt that “Context A vs. Context B” should perhaps be reserved for situations where the global context changes (e.g. two different conditioning boxes with different odors, floor texture etc., with proper counterbalancing procedures). We felt that “Context A vs noA” might be more appropriate here, as we are manipulating the local context by introducing (or removing) one single stimulus (the houselight). In this revised version we followed this reviewer’s advice and adopted a more descriptive, and hopefully less confusing, terminology: “Light vs Dark”.

      (7) Why is it that in the text the Ctx-dep O1/O2 is explained before simple and no discrimination, but in the Figure Ctx-dep O1/O2 is shown last? These should be consistent.

      Thanks for pointing that out. We have switched the order of task description to be consistent with the figures.

      (8) Page 6 (of pdf) - could the authors elaborate a little on why or how (or both) the delivery of reward can interfere with the expression of context-dependent discrimination? Do they just mean the performance of discrimination (e.g., animals will sit at the food port longer if there is food there because they are sitting there and eating it, which does not necessarily reflect the expectation of food based on cue presentations?), in which case it is not the discrimination itself that is being interfered with, just the measure of it. Perhaps the authors could elaborate by just inserting a sentence.

      We have added a few sentences to discuss this effect.

      The first clarification that we can make is that the reduced discrimination performance following reward is not simply due to animals’ continued presence in the reward port. We have added the time pre-cue to Fig. 3 B-F. This measure is not affected by previous reward history, showing that rats are leaving the port between trials.

      So what is driving this effect? At this stage, we are agnostic about the mechanism(s) for this effect. Kuchibhotla et al. (2019), who first reported a similar effect, proposed a model in which recent rewards modify the threshold for behavioral responses (i.e. performance). In this model, a cue might evoke a weak reward prediction but a strong behavioral response if presented after a reward. Additionally, we believe that learning factors might also contribute to the effect reported here. Indeed, the behavioral response on a given trial likely reflects the balance of hierarchical (context-dependent) associations vs. direct associations (Bradfield and Balleine, 2013). Naturally, this balance is dynamic and influenced by trial history. For instance, a Light:X+ trial might increase the value of cue X and promote responding during the following Dark:X- trial. The same logic could be applied to the influence of the context (e.g., a Light:X+ trial might promote responding to a subsequent Light:Y- trial). We are currently working on a computational model that captures the dynamic interplay between hierarchical and direct associations. We hope that this model will provide some insight into the learning/performance mechanisms for the effects reported here. However, this computational work is still in its early stages and beyond the scope of the present study.

      (9) The lack of effect in the Ctx-dep O1/O2 groups in Figure 4 could be due to a lack of power - the group sizes are a lot smaller for this group than for Ctx-dep O1 where an interaction was detected. I think this should be at least addressed in the discussion (i.e., that this lack of effect is possibly due to less power here, as the effects are in the same direction).

      Good point. We now acknowledge this limitation in the text.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please comment on the failure to replicate the sex differences across experiments. Perhaps this is due to some change in the training procedure that is briefly mentioned in the methods (a reduction in the number of rewarded trials) but it is unclear.

      The reviewer correctly observed that Fig. 3-5 do not show sex differences in baseline condition. This is not because of a replication failure, but because non-discriminating subjects were excluded from the experiment at the end of the acquisition period (after 72 training sessions). We now clarify this in the Method and Results section. We also added a schematic of the experiment timeline that highlights the exclusion of non-discriminators at the end of the acquisition period (Fig 1).

      On the topic of replicability, the data for Ctx-dep O1 was collected over 3 cohorts (over the course of 2 years) and the sex difference pattern was consistent.  For instance, the proportion of discriminators vs. non-discriminators for males and females trained in Ctx-dep O1, showed similar patterns across cohorts (see below).

      Author response table 1.

      (2) The design of this experiment makes it possible to analyse whether there is a differential outcome effect (DOE). The DOE would indeed predict better discrimination in group cxt-dep O1/O2 versus cxt-dep O1, which seems to be exactly what the authors observe although between-group statistics are not reported. Inspection of Figure 1 suggests that there may be a DOE in females but not in males. I wonder if the authors might consider reanalysing the data to check this.

      Indeed, there is clearly a differential outcome effect. We now point out this DOE in relation to the latency to achieve the discrimination criterion (Fig. 2 C-D). Rats in the Ctx-dep. O1/O2 group acquired discrimination (reached criterion) much faster than rats in the Ctx-dep. O1 group.

      Following the reviewer’s suggestion, we provide here the results of targeted ANOVAs (focusing exclusively on Ctx-dep. O1 and Ctx-dep. O1/O2) to investigate a potential sex-dependent effect of DOE (i.e. Sex x Task interactions); see figure below. A three-way ANOVA (Sex x Task x Session) conducted on the discrimination index revealed a main effect of Task (F(1, 86) = 173.560, P < 0.001), Session (F(2.678, 230.329) = 140.479, P < 0.001) and a marginal effect of Sex (F(1, 86) = 3.929, P = 0.051), but critically no Task x Sex or Task x Sex x Session interaction (P ≥ 0.504). A two-way ANOVA (Sex x Task) conducted on the sessions to criterion revealed a main effect of both factors (Sex: F(1, 63) = 9.52, P = 0.003; Task: F(1, 62) = 184.143, P < 0.001) but, critically, no Sex x Task interaction (P = 0.233). These results indicate that the use of two different outcomes clearly facilitated the acquisition of context-dependent discrimination (the DOE), but that this effect benefited both sexes equally. We thank the reviewer for recommending this analysis.

      Author response image 1.

      Differential outcome effect (DOE) affects males and females equally. A. Discrimination ratio over the acquisition period. B. Trials to criterion. Compared to animals trained with a single outcome (Ctx-dep. O1), introducing dissociable outcomes for the two types of rewarded trials (Ctx-dep. O1/O2) profoundly facilitated the acquisition of discriminated behavior. This effect benefited both sexes equally.

      (3) Some minor points for clarification that the authors may also wish to address:

      - Figure 3: is data presented from sessions 71-80 only or for all sessions? I didn't fully follow the explanation offered in the results section.

      That’s right. The data presented in Fig. 3 consider only sessions 71-80 in discriminator rats, when performance is globally stable. We have edited the text to make this clearer. These 10 sessions represent a total of 800 trials (10 sessions × 80 trials). The first trial of a session was not included in the analysis since it was not preceded by any trial. For the remaining 790 trials (10 sessions × 79 trials), we examined how the outcome of the previous trial (rewarded or nonrewarded) influenced responding on the next trial. This large sample size (790 trials per rat) was required to ensure that enough data were collected for each possible trial-history scenario.
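      The trial-history split described above can be sketched as follows, assuming each session is stored as an ordered list of (rewarded, response) pairs. The data layout, the synthetic data, and the function `split_by_previous_outcome` are hypothetical, and the sketch conditions only on the previous trial's outcome. With 10 sessions of 80 trials, skipping the first trial of each session leaves 10 × 79 = 790 analysed trials.

```python
from collections import defaultdict


def split_by_previous_outcome(sessions):
    """Group each trial's response by the outcome of the preceding trial.

    sessions: list of sessions, each an ordered list of (rewarded, response)
    tuples. The first trial of a session has no predecessor and is skipped.
    """
    groups = defaultdict(list)
    for session in sessions:
        # Pair every trial with the one before it, within the same session only.
        for (prev_rewarded, _), (_, response) in zip(session, session[1:]):
            key = "after_reward" if prev_rewarded else "after_nonreward"
            groups[key].append(response)
    return groups


# Hypothetical data: 10 sessions x 80 trials, alternating rewarded/nonrewarded
# trials, with a placeholder response value for each trial.
sessions = [[(t % 2 == 0, float(t % 5)) for t in range(80)] for _ in range(10)]
groups = split_by_previous_outcome(sessions)
n_analysed = sum(len(responses) for responses in groups.values())
print(n_analysed)  # 790
```

      Pairing trials only within a session keeps the last trial of one session from being treated as the predecessor of the first trial of the next.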

      - The authors argue that females are protected from the disrupting effect of stress. It might be useful if the authors offer further explanation as to what they mean by "protected".

      By “protected”, we simply mean “less sensitive”. We have reworded this sentence in that way. We do not claim to have an understanding of the precise mechanism for this sex dependent effect (although our data point to a possible role of the OFC).

      - The authors state that "delivery of reward, while critical for learning, can also interfere with the expression of context-dependent discrimination". This statement should be explained in further detail. For instance, why should reward delivery specifically impair context-dependent discrimination but not other forms of discrimination?

      We have reworded this sentence to be more inclusive. Indeed, delivery of reward also interferes with other forms of discrimination, particularly when discrimination performance is not yet optimal. We have also added a paragraph to discuss the possible mechanisms by which reward might interfere with discrimination performance in our task.   

      Reviewer #3 (Recommendations For The Authors):

      I do not suggest additional experiments, but I do hope you continue the behavioral work to characterize what is being learned in the task. I think the approach is promising. I would suggest reporting the % time in port and port entries for the entire CS. There is no justification for only analyzing the response in the last 5s.

      We thank the reviewer for the encouragement.

      We opted to focus on the time in port for two main reasons:

      (1) This measure is relatively consistent across the two different reward outcomes (unlike the rate of port entries). Indeed, consistent with prior studies (Delamater et al., 2017), we observed that the type of reward (solid or liquid) influences the topography of the anticipatory magazine-directed behavior. Specifically, cues paired with pellets elicited significantly more port entries than cues paired with chocolate milk. The opposite pattern was observed for time in port --cues paired with chocolate milk elicited more sustained time in port compared to cues paired with pellets (see figure below). While these measures (port entries and time in port) show opposite bias for the two possible outcomes, the size of this bias is much smaller for the time in port (Cohen’s d effect size: port entries: 1.41; time in port: 0.62). As a result, the discrimination ratio calculated from Time in port is consistent across the two outcomes (P = 0.078; effect size: 0.07), which is not the case for the discrimination ratio calculated from port entries (P = 0.007; effect size 0.32 see figure below).

      (2) Unlike the rate of port entries, the time in port shows a monotonic increase during training in these tasks. Indeed, we observed here and in past work (Keiflin et al., 2019), that the rate of port entries initially increases with training, but then slightly decreases, particularly for cues paired with liquid reward. In contrast, the time in port continues to increase, or remains high, with extended training. This is easy to understand if we consider the extreme case of a hypothetical rat that might enter the port once upon cue presentation and maintain continued presence in port for the whole cue duration. This rat would have a relatively low rate of port entry (a single port entry per trial) but a high time in port.

      This is not to say that the rate of port entries is not a valid measure overall (we have used, and continue to use, this metric in other preparations). However, for the reasons explained above, we believe that the time in port is a better metric for reward anticipation in this specific study.

      Moreover, we chose to focus our analysis on the last 5s of the cue because that is when anticipatory food cup behavior is most reliably observed (in our preparation, >2/3 of the total time in port occurs during the last 5s of the cue) and least contaminated by orienting behaviors (Holland, 1977, 1980, 2000). For these reasons, analysis of the last portion of the cue is relatively common in Pavlovian anticipatory approach preparations (El-Amamy and Holland, 2007; Olshavsky et al., 2013; Esber et al., 2015; Holland, 2016a, 2016b; Schiffino and Holland, 2016; Gardner et al., 2017; Sharpe et al., 2021; Maes et al., 2020; Sharpe et al., 2020; Siemian et al., 2021; Kang et al., 2021). Reporting time in port during the same cue epoch facilitates comparisons between these studies.

      We have edited the text in the Method section to provide a brief justification for focusing our analyses on this cue epoch.

      Author response image 2.

      Outcome identity influences the topography of the conditioned response. A-B: Conditioned responding expressed as the number of port entries per trial (A) or time in port per trial (B) for rats trained in the simple discrimination task with a chocolate milk reward (n = 19) or a sucrose pellet reward (n = 16). Data show the average of the last 3 sessions. Compared to chocolate milk, pellets tend to produce more port entries. Conversely, chocolate milk tends to produce more time in port. However, the magnitude of this bias is smaller for the time in port. C-D: Discrimination ratio calculated from the number of port entries (C) or the time in port (D); the latter is not affected by the outcome identity. *P<0.05; **P<0.01; ***P<0.001, t-tests.

      The inconsistent use of terms is distracting throughout the paper. Is it discriminated or context-gated? Please provide a definition of your terms and then use them consistently. Is it a discriminative stimulus, a context, or an occasion setter? These all imply slightly different things and it would help the reader if you just used one term throughout the paper.

      Thanks for pointing that out. We have added a definition for “context-gated” and edited the text to keep the terminology consistent when appropriate. The words “discrimination”/”discriminated” still appear in the manuscript but without implying a mechanism (all tasks are variations of Pavlovian discrimination; the rats discriminating between rewarded and non-rewarded trials).

      As mentioned by this reviewer, the terms “context” and “occasion setter” are not synonymous. Therefore these terms still appear in the manuscript to refer to different concepts (e.g. in our task the visual stimulus is a context for all rats; this context acts as an occasion setter only for some rats).

      Minor:

      Intro, 2nd PP: "autism". This is abbreviated in the abstract but spelled out here. I suggest not abbreviating in the abstract and introducing abbreviations here, as you do with PTSD.

      Fixed as suggested

      Have deficits in contextual modulation been distinguished from potential deficits in binary associative learning in autism, PTSD, and substance use disorders? This is implied, but there are no citations provided.

      We provide a list of references showing deficits in contextual modulation in these disorders.

      This does not mean that these disorders are reducible to deficits in contextual modulation, and it does not exclude other forms of deficits in those disorders, including alterations in certain aspects of binary associative learning.

      "In positive occasion-setting, animals learn that a target cue (X) results in a reward outcome (+) only when that cue is accompanied by a contextual feature (A); the same cue presented in absence of this contextual feature remains without consequence (A:X+ / X-)." - there are words missing in this sentence.

      We apologize, but we fail to identify the missing word(s). Perhaps the reviewer could be more specific, and we will be happy to edit the sentence as needed.

      What is a contextual feature, is this redundant or can you provide a specific definition?

      We use the terminology “feature” and “target” because these are the standard terms in the description of occasion-setting preparations (one stimulus, “the feature”, sets the occasion for responding, or not responding, to the “target” cue). By contextual feature, we meant that in this specific example the context was the feature. We have clarified this in the text. We believe that these terms are not redundant. Indeed, the context is not always a feature, and a feature is not necessarily a context (phasic cues can serve as “features”).

      Can you provide some background on studies of sex differences in simple associative learning? You imply these have been much more thoroughly studied than conditional discriminations.

      We added a few references as suggested.

      What is the rationale for studying stress?

      Stressful life events exacerbate several mental illnesses, potentially by impacting cognitive functions.

      Although the (sex-dependent) effects of stress on some cognitive functions are well established (e.g., working memory, selective attention, spatial navigation), the effect of stress on contextual modulation (a core dysfunction in certain mental illnesses), and possible sex differences in this effect, had not been formally tested. We added a few sentences at the beginning of the stress section of the Results to remind the reader why we tested the effect of stress in this task.

      Method/Results:

      Cues are not counterbalanced; the feature is visual and targets are auditory - this should be noted as a limitation in the discussion section.

      We now acknowledge this limitation in the Discussion. Moreover, we believe that the new terminology for the context (Light vs. Dark, instead of A vs. noA in the original version) makes it abundantly clear that the “context” in this study was always visual.

      Summation is invoked to describe the discrimination with different outcomes, how is summation happening? This is not described. Perhaps incorporate the literature on conditional discriminations with differential outcomes (the "differential outcomes effect").

      We have edited the Results and Discussion sections to clarify how summation might contribute to discrimination with different outcomes. We have also added references for the differential outcomes effect (DOE) in this task.

      The stress effect is confounded with test order; comparing stress vs. baseline.

      Sorry, we do not understand this point. The “baseline” refers to the animal’s performance on the last training session before the acute stress manipulation (we have edited the text to make this clear). Animals are first trained in the task, and then we examine how stress alters their performance in this learned task. We do not see how this could induce a test-order confound.

      Throughout the results section, it would be helpful to have the number of animals reported for each analysis.

      The number of animals for each part of the experiment is now reported in the text, as well as in the figures.

      Discussion:

      "For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that hierarchically modulates the associative strength between a target cue and its outcome." This is inaccurate. Occasion setters do not change or modulate the associative strength of a target cue. They modulate whether excitation or inhibition is expressed.

      We reworded the sentence as suggested: “For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that modulates the response to a target cue”.

      "Together, these results indicate that the sex differences observed here are not attributable to simple associative, motivational, working-memory, or attentional processes, but are specific to the neurocomputational operations required for the hierarchical, contextual control of behavior." It should be noted here that the difference is one of degree, a quantitative difference, but not a difference in the qualitative features of the process.

      "Regardless of the precise mechanism, our results indicate that, compared to male rats, females ultimately achieved more stable contextual control over cued reward-seeking; their behavior remained context-regulated under stress or after recent rewards." Again this is a matter of degree.

      We absolutely agree. All the sex differences reported here are a matter of degree. In the framework of McCarthy et al. (2012), the reported effects are type 2 or type 3 sex differences, not type 1 sexual dimorphism. We made a few edits in the Discussion to clarify this point.

      Procedure:

      Please clarify the percentage of trials that were reinforced in the No Discrimination group.

      During sessions 1-32 (acquisition period), 50% of the trials were reinforced. Following this acquisition period, only 25% of the trials were reinforced, to match all the other groups. We have edited the Methods section to clarify this point.

      Please provide the dimensions of the restraint tubes and the model number if available.

      This information is now included.

      References

      Bradfield LA, Balleine BW (2013) Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J Exp Psychol Anim Behav Process 39:2–13.

      Delamater AR, Garr E, Lawrence S, Whitlow JW (2017) Elemental, configural, and occasion setting mechanisms in biconditional and patterning discriminations. Behav Processes 137:40–52.

      El-Amamy H, Holland PC (2007) Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557–1567.

      Esber GR, Torres-Tristani K, Holland PC (2015) Amygdalo-striatal interaction in the enhancement of stimulus salience in associative learning. Behav Neurosci 129:87–95.

      Gardner MPH, Conroy JS, Shaham MH, Styer CV, Schoenbaum G (2017) Lateral Orbitofrontal Inactivation Dissociates Devaluation-Sensitive Behavior and Economic Choice. Neuron 96:1192–1203.e4.

      Holland PC (1977) Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J Exp Psychol Anim Behav Process 3:77–104.

      Holland PC (1980) CS-US interval as a determinant of the form of Pavlovian appetitive conditioned responses. J Exp Psychol Anim Behav Process 6:155–174.

      Holland PC (2000) Trial and intertrial durations in appetitive conditioning in rats. Anim Learn Behav 28:121–135.

      Holland PC (2016a) Enhancing second-order conditioning with lesions of the basolateral amygdala. Behav Neurosci 130:176–181.

      Holland PC (2016b) Effects of amygdala lesions on overexpectation phenomena in food cup approach and autoshaping procedures. Behav Neurosci 130:357–375.

      Kang M, Reverte I, Volz S, Kaufman K, Fevola S, Matarazzo A, Alhazmi FH, Marquez I, Iordanova MD, Esber GR (2021) Agency rescues competition for credit assignment among predictive cues from adverse learning conditions. Sci Rep 11:16187.

      Keiflin R, Pribut HJ, Shah NB, Janak PH (2019) Ventral tegmental dopamine neurons participate in reward identity predictions. Curr Biol 29:93–103.e3.

      Kuchibhotla KV, Hindmarsh Sten T, Papadoyannis ES, Elnozahy S, Fogelson KA, Kumar R, Boubenec Y, Holland PC, Ostojic S, Froemke RC (2019) Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun 10:2151.

      Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD (2020) Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23:176–178.

      McCarthy MM, Arnold AP, Ball GF, Blaustein JD, De Vries GJ (2012) Sex differences in the brain: the not so inconvenient truth. J Neurosci 32:2241–2247.

      Olshavsky ME, Song BJ, Powell DJ, Jones CE, Monfils M-H, Lee HJ (2013) Updating appetitive memory during reconsolidation window: critical role of cue-directed behavior and amygdala central nucleus. Front Behav Neurosci 7:186.

      Rescorla RA (2006) Deepened extinction from compound stimulus presentation. J Exp Psychol Anim Behav Process 32:135–144.

      Schiffino FL, Holland PC (2016) Secondary visual cortex is critical to the expression of surprise-induced enhancements in cue associability in rats. Eur J Neurosci 44:1870–1877.

      Sharpe MJ, Batchelor HM, Mueller LE, Gardner MPH, Schoenbaum G (2021) Past experience shapes the neural circuits recruited for future learning. Nat Neurosci 24:391–400.

      Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G (2020) Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 11:106.

      Siemian JN, Arenivar MA, Sarsfield S, Borja CB, Russell CN, Aponte Y (2021) Lateral hypothalamic LEPR neurons drive appetitive but not consummatory behaviors. Cell Rep 36:109615.

    1. Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for a number of reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main residual concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key AQ paper) also shows that the AQ is skewed more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology.

      Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.
      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed.
      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment?

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study tests the hypothesis that a high autism quotient in neurotypical adults is strongly associated with suboptimal motor planning and visual updating after eye movements, which in turn, is related to a disrupted efference copy mechanism. The implication is that such abnormal behavior would be exaggerated in those with ASD and may contribute to sensory overload - a key symptom in this condition. The evidence presented is convincing, with significant effects in both visual and motor domains, adequate sample sizes, and consideration of alternatives. However, the study would be strengthened with minor but necessary corrections to methods and statistics, as well as a moderation of claims regarding direct application to ASD in the absence of testing such patients.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for several reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key paper on AQ) also shows that those with high AQ skew more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology. Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      We thank the reviewer for the kind words. We now specify more carefully that our sample consists of neurotypical adults scored for autistic traits, none of whom had been diagnosed with autism before participating in our experiment. Regarding the Autistic Quotient Questionnaire (AQ), on page 5 of the Introduction we now write:

      “The autistic traits of the whole population form a continuum, with ASD diagnosis usually situated on the high end 31-33. Moreover, autistic traits share a genetic and biological etiology with ASD 34. Thus, quantifying autistic-trait-related differences in healthy people can provide unique perspectives as well as a useful surrogate for understanding the symptoms of ASD 31,35.”

      In the Discussion (page 9) we now write:

      “It is essential to note that our participant pool lacked pre-existing diagnoses before engaging in the experiments and we must address limitations associated with the AQ questionnaire. The AQ questionnaire demonstrates adequate test-retest reliability 36, normal distribution of sum scores in the general population 50, and cross-cultural equivalence has been established in Dutch and Japanese samples 51-53. The AQ effectively categorizes individuals into low, average, and high degrees of autistic traits, demonstrating sensitivity for both group and individual assessments 54.

      However, evolving research underscores many aspects that are not fully captured by the self-administered questionnaire: for example, gender differences in ASD trait manifestation 55. Autistic females may exhibit more socially typical interests, often overlooked by professionals 56. Camouflaging behaviors, employed by autistic women to blend in, pose challenges for accurate diagnosis 57. Late diagnoses are attributed to a lack of awareness, gendered traits, and outdated assessment tools 58. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities 59, or motor skills in everyday situation (MOSES-test 60) becomes crucial for a comprehensive understanding of autistic traits.”

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.

      We thank the reviewer for the suggestion. However, the sample size is relatively small for building a robust and generalizable model to predict AQ scores. Statistical models built on small datasets can be prone to overfitting, meaning that they might not accurately predict the AQ for new individuals.

      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed. 

      The reviewer raises an important point regarding the potential for memory demands to influence our results. We have now also investigated the accuracy of the second saccade separately for the x and y dimensions. As shown in Figure 3A, a motor bias was observed in only one dimension (x), which weakens the memory argument, since a memory deficit would imply a bias in both dimensions (for example, if participants remembered the position of the target relative to both screen borders). A t-test between our two subsamples of participants indeed revealed a difference in saccade accuracy for the x dimension (p = 0.03) but not for the y dimension (p = 0.88).

      We now add these analyses in Discussion on page 8.

      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment? 

      We thank the reviewer for pointing this out. On page 7 (Results) we now write:

      “Understanding how these biases might change over time could provide further insights into this mechanism. Specifically, we investigated whether participants exhibited any learning effects throughout the experiments. For data of Experiment 1 – motor updating – we divided our data into 10 separate bins of 30 trials each. We conducted a repeated-measures ANOVA with the within-subject factor “number of sessions” (two main sessions of 5 bins each, ~150 trials) and the between-subject factor “group” (lower vs upper quartile of the AQ distribution). We found no main effect of “number of sessions” (F(1,7) = 0.25, p = 0.66), a main effect of “group” (F(1,7) = 2.52, p = 0.015), and no interaction between the two subsamples of participants and the sessions tested (F(1,7) = 0.51, p = 0.49). Data of Experiment 2 – visual updating – were separated into 3 sessions. For each session we extracted the PSE, and we conducted a repeated-measures ANOVA with the within-subject factor “sessions” and the between-subject factor “groups” (lower vs upper quartile of the AQ distribution). Here, too, we found no main effect of sessions (F(1,13) = 0.86, p = 0.39), a main effect of group (F(1,14) = 11.85, p = 0.004), and no interaction between the two subsamples of participants and the sessions tested (F(1,13) = 0.20, p = 0.73). In conclusion, the current study found no evidence of learning effects across the experimental sessions. However, a significant main effect of group was observed in both Experiment 1 (motor updating) and Experiment 2 (visual updating). Participants in the group with higher autistic traits performed systematically differently on the task from those in the group with lower autistic traits, regardless of the number of sessions completed.”
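      As a rough illustration of the binning described in the quoted passage for Experiment 1, the sketch below splits a per-trial error series into 10 bins of 30 trials and averages bins 1-5 and 6-10 as two "sessions" of about 150 trials. The function names and the trial values are hypothetical, invented only to show the bookkeeping; the repeated-measures ANOVA itself is not reproduced here.

```python
from statistics import mean


def bin_means(values, bin_size):
    """Mean of each consecutive, non-overlapping bin of `bin_size` trials."""
    n_bins = len(values) // bin_size  # any incomplete trailing bin is dropped
    return [mean(values[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


def session_means(values, bin_size=30, bins_per_session=5):
    """Average the bin means within each block of `bins_per_session` bins."""
    bins = bin_means(values, bin_size)
    return [mean(bins[s:s + bins_per_session]) for s in range(0, len(bins), bins_per_session)]


# Hypothetical data: 300 trials of second-saccade error, slightly larger in the second half.
trials = [1.0] * 150 + [1.2] * 150
print(len(bin_means(trials, 30)))  # 10
print(session_means(trials))       # [1.0, 1.2]
```

      Each participant would contribute one value per session to the within-subject factor, with AQ quartile as the between-subject factor.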

      Reviewer #2 (Public Review):

      Summary:

      The idea that various clinical conditions may be associated, at least partially, with a disrupted corollary discharge mechanism has been present for a long time.

      In this paper, the authors draw a link between sensory overload, a characteristic of autism spectrum disorder, and a disturbance in the corollary discharge mechanism. The authors substantiate their hypothesis with strong evidence from both the motor and perceptual domains. As a result, they broaden the clinical relevance of the corollary discharge mechanism to encompass autism spectrum disorder.

      The authors write:

      "Imagine a scenario in which you're watching a video of a fast-moving car on a bumpy road. As the car hits a pothole, your eyes naturally make quick, involuntary saccades to keep the car in your visual field. Without a functional efference copy system, your brain would have difficulty accurately determining the current position of your eye in space, which in turn affects its ability to anticipate where the car should appear after each eye movement."

      I appreciate the use of examples to clarify the concept of efference copy. However, I believe this example is more related to a gain-field mechanism, informing the system about the position of the eye with respect to the head, rather than an example of efference copy per se.

      Without an efference copy mechanism, the brain would have trouble accurately determining where the eyes will be in space after an eye movement, and it would have trouble predicting the sensory consequences of the eye movement. However, it can be argued that the gain-field mechanism would be sufficient to inform the brain about the current position of the eyes with respect to the head.

      We now use a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      The authors write:

      "In the double-step paradigm, two consecutive saccades are made to briefly displayed targets 21, 22. The first saccade occurs without visual references, relying on internal updating to determine the eye's position."

      Maybe I have missed something, but in the double-step paradigm the first saccade can occur without the help of visual references if no visual feedback is present, that is, when saccades are performed in total darkness. Was this the case for this experiment? I could not find details about room conditions in the methods. Please provide further details.

      In case saccades were not performed in total darkness, then the first saccade can be based on the remembered location of the first target presented, which can be derived from the retinotopic trace of the first stimuli, as well as the contribution from the surroundings, that is: the remembered relative location of the first target with respect to the screen border along the horizontal meridian (i.e. allocentric cues).

      A similar logic could be applied to the second saccade. If the second saccade were based only on the retinotopic trace, without updating, then it would go up and 45 deg to the right, based on the example shown in Figure 1. With appropriate updating, the second saccade would go straight up. However, if saccades were not performed in total darkness, then the location of the second target could also be derived from its relationship with the surroundings (for example, the remembered distance from screen borders, i.e. allocentric cues).
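      The reviewer's geometric argument can be made concrete with simple vector arithmetic. In the sketch below, all coordinates are hypothetical screen positions in degrees (fixation at the origin, T1 to the right, T2 directly above T1), chosen only to mirror the layout described for Figure 1. Without updating, the second saccade reuses T2's retinal vector measured from the original fixation; with full updating, the displacement of the first saccade is subtracted from it. Allocentric cues such as screen borders are deliberately not modeled.

```python
import math


def saccade_vector(start, target):
    """Displacement (dx, dy) from start to target, in degrees."""
    return (target[0] - start[0], target[1] - start[1])


def direction_deg(vec):
    """Direction of a vector, counter-clockwise from rightward horizontal."""
    return math.degrees(math.atan2(vec[1], vec[0]))


fixation = (0.0, 0.0)  # hypothetical initial fixation point
t1 = (10.0, 0.0)       # hypothetical first target, right of fixation
t2 = (10.0, 10.0)      # hypothetical second target, directly above T1

first_saccade = saccade_vector(fixation, t1)
retinal_t2 = saccade_vector(fixation, t2)  # T2's retinal position while fixating the origin

# No updating: the second saccade follows the stale retinal vector of T2.
no_update = retinal_t2
# Full updating: subtract the first saccade (the eye displacement an efference copy would report).
updated = (retinal_t2[0] - first_saccade[0], retinal_t2[1] - first_saccade[1])

print(direction_deg(no_update))  # about 45 degrees: up and to the right
print(direction_deg(updated))    # about 90 degrees: straight up
```

      Partial updating would fall between these two directions, which is one way to read a horizontal bias in second-saccade landing positions.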

      If saccades were not performed in total darkness, the results shown in Figures 2 and 3 could then be related to i) differences in motor updating between AQ score groups; ii) differences in the use of allocentric cues between AQ score groups; iii) a combination of i) and ii). I believe this is a point worth mentioning in the discussion.

      Thank you for raising the important issue of visual references in the double-step saccade task. Participants performed saccades in a dimly lit room where visual references, i.e. the screen borders, were barely visible. At the time we collected the data, a laboratory that allowed performing experiments in complete darkness was not at our disposal. We acknowledge the possibility that participants could have memorized the target locations relative to the screen borders. The bias of high-AQ participants could then be attributed to differences in either encoding, memorization or decoding of the target location relative to the screen borders. However, the potentially abnormal use of visual references must reflect an altered remapping process, since we did not find differences in saccade landing in the vertical dimension. A t-test between our groups of participants revealed a difference in saccade accuracy for the x dimension (p = 0.03) but not for the y dimension (p = 0.88). We thus agree that, in addition to an altered efference copy signal in high-AQ participants, altered use of visual references might also affect their saccadic remapping.

      In the Discussion we now write: “Our findings suggest that a general memory deficit is unlikely to fully explain the observed bias in high-AQ participants' second saccades. As highlighted in Figure 3A, the bias was specific to the horizontal dimension, weakening the argument for a global memory issue affecting both vertical and horizontal encoding of target location. However, it's important to acknowledge that even under non-darkness conditions, participants might rely on a combination of internal updating based on the initial target location and visual cues from the environment, such as screen borders. This potential use of visual references could contribute to the observed bias in the high-AQ group. If high-AQ participants differed in their reliance on visual cues compared to the low-AQ group, it could explain the specific pattern of altered remapping observed in the horizontal dimension. This possibility aligns with our argument for an abnormal remapping process underlying the results. While altered efference copy signals remain a strong candidate, the potential influence of visual cues on remapping in this population warrants further investigation. Future studies could incorporate a darkness condition to isolate the effects of internal updating on the first saccade, and systematically manipulate the availability of visual cues throughout the task. This would allow for a more nuanced understanding of how internal updating and visual reference use interact in the double-step paradigm, particularly for individuals with varying AQ scores.”

      The authors write:

      "According to theories of saccadic suppression, an efference copy is necessary to predict the occurrence of a saccade."

      I would also refer to alternative accounts, where saccadic suppression appears to arise as early as the retina, due to the interaction between the visual shift introduced by the eye movement, and the retinal signal associated with the probe used to measure saccadic suppression. This could potentially account for the scaling of saccadic suppression magnitude with saccade amplitude.

      Idrees, S., Baumann, M.P., Franke, F., Münch, T.A. and Hafed, Z.M., 2020. Perceptual saccadic suppression starts in the retina. Nature communications, 11(1), p.1977. 

      We thank the reviewer. Now on page 4 of Introduction we write:

      “Some theories consider saccadic omission and saccadic suppression as resulting from an active mechanism. In this view an efference copy would signal the occurrence of a saccade, yielding a transient decrease in visual sensitivity20-22. Others however have pointed out the possibility that a purely passive mechanism suffices to induce saccadic omission23. A recent study has found evidence for saccadic suppression already in the retina. Idrees et al.24 demonstrated that retinal ganglion cells in isolated retinae of mice and pigs respond to saccade-like displacements, leading to the suppression of responses to additional flashed visual stimuli through visually triggered retinal-circuit mechanisms. Importantly, their findings suggest that perisaccadic modulations of contrast sensitivity may have a purely visual origin, challenging the need for an efference copy in the early stages of saccadic suppression. However, the suppression they measured lasted much longer than time-courses observed in behavioral data. An efference copy signal could thus be necessary to release perception from suppression.”

      Reviewer #3 (Public Review): 

      Summary:

      This work examined efference copy related to eye movements in healthy adults who have high autistic traits. Efference copies allow the brain to make predictions about sensory outcomes of self-generated actions, and thus serve important roles in motor planning and maintaining visual stability. Consequently, disrupted efference copies have been posited as a potential mechanism underlying motor and sensory symptoms in psychopathology such as Autism Spectrum Disorder (ASD), but so far very few studies have directly investigated this theory. Therefore, this study makes an important contribution as an attempt to fill in this knowledge gap. The authors conducted two eye-tracking experiments examining the accuracy of motor planning and visual perception following a saccade and found that participants with high autistic traits exhibited worse task performance (i.e., less accurate second saccade and biased perception of object displacement), consistent with their hypothesis of less impact of efference copies on motor and visual updating. Moreover, the motor and visual biases are positively correlated, indicative of a common underlying mechanism. These findings are promising and can have important implications for clinical intervention if they can be replicated in a clinical sample.

      Strengths:

      The authors utilized well-established and rigorously designed experiments and sound analytic methods. This enables easy translation between similar work in non-human primates and in humans and readily points to potential candidates for underlying neural circuits that could be further examined in follow-up studies (e.g., superior colliculus, frontal eye fields, mediodorsal thalamus). The finding of no association between initial saccade accuracy and level of autistic traits in both experiments also serves as an important control analysis and increases one's confidence in the conclusion that the observed differences in task performance were indeed due to disrupted efference copies, not confounding factors such as basic visual/motor deficits or issues with working memory. The strong correlation between the observed motor and visual biases further strengthens the claim that the findings from both experiments may be explained by the same underlying mechanism: disrupted efference copies. Lastly, the authors also presented a thoughtful and detailed mechanistic theory of how efference copy impairment may lead to ASD symptomatology, which can serve as a nice framework for more research into the role of efference copies in ASD.

      Weaknesses:

      Although the paper has a lot of strengths, the main weakness of the paper is that a direct link with ASD symptoms (i.e., sensory overload and motor inflexibility as the authors suggested) cannot be established. First of all, the participants are all healthy adults who do not meet the clinical criteria for an ASD diagnosis. Although they could be considered a part of the broader autism phenotype, the results cannot be easily generalized to the clinical population without further research. Secondly, the measure used to quantify the level of autistic traits, Autistic Quotient (AQ), does not actually capture any sensory or motor symptoms of ASD. Therefore, it is unknown whether those who scored high on AQ in this study experienced high, or even any, sensory or motor difficulties. In other words, more evidence is needed to demonstrate a direct link between disrupted efference copies and sensory/motor symptoms in ASD.

      This is a valid point, and we thank the reviewer for raising it. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities (Hull, Mandy, Lai, et al., 2019) or motor skills in everyday situations (MOSES test; Hillus, Moseley, Roepke, & Mohr, 2019), will be crucial for a comprehensive understanding of autistic traits.

      We now address this point in the Discussion (page 9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      - The pothole example in the introduction was really hard to follow. I wonder if there is a better example. 

      We have replaced it with a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      - This is really minor; I would say that saccades are not the most frequent movement that humans perform. Some of the balance-related adjustments and even heartbeats are faster. Maybe just add "voluntary". 

      We thank the reviewer for the suggestion; “voluntary” has now been added.

      - "Severe consequences" on page 4 is a bit strong. If that were true, there would be pretty severe impairments in eye movement behavior in ASD, which I don't think is the case.

      We agree with the reviewer and have now removed the term “severe”.

      - The results section would read better if each experiment had a short paragraph reiterating its overall goal and the specific approach each experiment took to achieve that goal. 

      On page 5, for the first experiment, we now write:

      “We investigated the influence of autistic traits on visual updating during saccadic eye movements using a classic double-step saccade task. This task relies on participants making two consecutive saccades to briefly presented targets. The accuracy of the second saccade serves as an indirect measure of how effectively the participant's brain integrated the execution of the first saccade into their internal representation of visual space. Participants were divided into quartiles based on the severity of their autistic traits, as assessed by the Autistic quotient questionnaire (cite). We hypothesized that individuals with higher autistic traits would exhibit greater difficulty in visual updating than those with lower autistic traits, which would be reflected in reduced accuracy of their second saccades in the double-step task. Figure 2C illustrates examples from participants at the extremes of the autistic trait distribution (Autistic quotient = 3, in orange, and Autistic quotient = 31, in magenta). As shown, both participants were instructed to make saccades to the locations indicated by two brief target appearances (T1 and T2), as quickly and accurately as possible, in the order of presentation. However, successful execution of the second saccade requires accurate internal compensation for the first saccade, as no visual references or feedback are available during the saccade itself.”
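      The logic of the double-step task can be made concrete with a simple vector model. The sketch below is a hypothetical illustration, not the authors' analysis: it assumes that T2 is initially coded as a retinal vector from the starting fixation, and that a fraction `gain` of the first-saccade vector is subtracted from it via the efference copy (gain = 1: full compensation; gain = 0: none). The function names and target coordinates are made up for illustration.

```python
from typing import Tuple

Vec = Tuple[float, float]  # (x, y) in degrees of visual angle


def second_saccade(fix: Vec, t1: Vec, t2: Vec, gain: float) -> Vec:
    """Predicted second-saccade vector under a simple linear compensation model.

    Both targets are flashed while the eye is still at `fix`, so T2 is first
    coded as a retinal vector from `fix`. After the first saccade lands on T1,
    that vector must be corrected by the first-saccade vector. `gain` is the
    hypothetical fraction of this correction supplied by the efference copy.
    """
    retinal_t2 = (t2[0] - fix[0], t2[1] - fix[1])
    first = (t1[0] - fix[0], t1[1] - fix[1])  # assumes the first saccade lands exactly on T1
    return (retinal_t2[0] - gain * first[0], retinal_t2[1] - gain * first[1])


def landing_error(fix: Vec, t1: Vec, t2: Vec, gain: float) -> float:
    """Distance (deg) between the second-saccade landing point and T2."""
    v = second_saccade(fix, t1, t2, gain)
    land = (t1[0] + v[0], t1[1] + v[1])
    return ((land[0] - t2[0]) ** 2 + (land[1] - t2[1]) ** 2) ** 0.5


if __name__ == "__main__":
    fix, t1, t2 = (0.0, 0.0), (10.0, 0.0), (10.0, 10.0)  # made-up target layout
    for g in (1.0, 0.8, 0.0):
        print(f"gain={g:.1f}  landing error={landing_error(fix, t1, t2, g):.1f} deg")
```

      Under these assumptions the landing error equals (1 − gain) × the first-saccade amplitude, which is why second-saccade accuracy can serve as an indirect readout of how well the first saccade was accounted for.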

      On page 6, for experiment 2, we write:

      “With a trans-saccadic localization task, we explored how autistic traits affect the integration of eye movements into visual perception. Participants were presented with stimuli before and after a single saccade, creating an illusion of apparent motion. We measured the perceived direction of this displacement, which is influenced by how well the participant's brain accounts for the saccadic eye movement. We predicted that individuals with higher autistic traits would show a stronger bias in the perceived displacement direction, suggesting a less accurate integration of the eye movement into their visual perception.”

      - On page 6, the text about "vertical displacement" is confusing. The spatial displacements in this experiment were horizontal? 

      Yes, they were. The spatial displacement is horizontal, but the perceived trajectory (due to the saccade) is vertical. We now changed “vertical displacement” to “vertical trajectory”.

      - Page 6, grammatical problems in "while we report a slightly slant of the dots trajectory". 

      Thank you. Now fixed.

      - It would be helpful to discuss the apparent motion part of Experiment 2 in the main text. This important part is not made clear. 

      In the Introduction (page 4), we now write:

      “In this paradigm, one stimulus is shown before and another after saccade execution. Together, these two stimuli produce the perception of “apparent motion”. If stimuli are placed such that the apparent motion path is orthogonal to the saccade path, then the orientation of the apparent motion path indicates how the saccade vector is integrated into vision. The apparent motion trajectory can only appear vertical if the movement of the eyes is perfectly accounted for, that is, if the retinotopic displacement is fully compensated, ensuring spatial stability. However, small biases in motion direction – implying under- (or over-) compensation of the eye movement – can indicate relative failures in this stabilization process. In a seminal study, Szinte and Cavanagh 27 found a slight over-compensation of the saccade vector, leading to apparent motion slightly tilted against the direction of the saccade. More importantly, when efference copies are not available, i.e., when localization occurs at the time of the second saccade in a double-step task, a strong under-compensation of the saccade occurs 28.

      This phenomenon cannot be explained by perisaccadic mislocalization of flashed visual stimuli 29,30, but the two phenomena may be related in that they may both depend upon efference copy information.”

      - Figure 1 could be improved. For example, the text talks about the motor plan, but this is not clearly shown in the figure.

      We have now added the motor plan to the model. Thank you.

      - Figure 2A, the scale is off (the pictures make it look like the horizontal movement was longer than the vertical). 

      Now fixed.

      - Figure 4, it would be helpful if the task was also described in the figure. 

      We thank the reviewer for the comment. We have now modified the figure to also depict the perceptual judgment task.

      - Figure 5A, the y-axis is labeled p(correct), but that is not what the y-axis shows (the legend makes the same mistake). 

      We apologize: the y-axis shows the proportion of trials on which participants reported the second dot to be to the right of the first one. We have now changed the figure and the text accordingly.
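      A measure like this ("proportion of 'second dot to the right' reports" as a function of the second dot's horizontal offset) is a psychometric function. As a hedged illustration, not the authors' fitting procedure, the sketch below fits a cumulative Gaussian by grid-search maximum likelihood to synthetic counts; the point of subjective equality (PSE), the offset yielding 50% "right" responses, is one common way to summarize the perceptual bias. Offsets span ±2.5° as in the corrected Methods; the observer, counts, and function names are made up.

```python
import math
from typing import List, Tuple

# (offset_deg, n_right, n_trials) per horizontal offset of the second dot
Data = List[Tuple[float, int, int]]


def cum_gauss(x: float, mu: float, sigma: float) -> float:
    """Cumulative Gaussian: modeled P(report 'second dot to the right') at offset x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def neg_log_lik(data: Data, mu: float, sigma: float) -> float:
    """Binomial negative log-likelihood of the response counts."""
    nll = 0.0
    for x, k, n in data:
        p = min(max(cum_gauss(x, mu, sigma), 1e-9), 1.0 - 1e-9)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_pse(data: Data) -> Tuple[float, float]:
    """Coarse grid-search ML fit; returns (PSE, sigma) in degrees."""
    best = (math.inf, 0.0, 1.0)
    for i in range(101):          # mu: -2.5 to +2.5 deg in 0.05 steps
        mu = -2.5 + 0.05 * i
        for j in range(1, 61):    # sigma: 0.05 to 3.0 deg in 0.05 steps
            sigma = 0.05 * j
            nll = neg_log_lik(data, mu, sigma)
            if nll < best[0]:
                best = (nll, mu, sigma)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic observer with a 0.5 deg bias (made-up numbers, not study data)
    offsets = [-2.5, -1.5, -0.5, 0.0, 0.5, 1.5, 2.5]
    data = [(x, round(1000 * cum_gauss(x, 0.5, 1.0)), 1000) for x in offsets]
    pse, sigma = fit_pse(data)
    print(f"PSE = {pse:.2f} deg, sigma = {sigma:.2f} deg")
```

      A PSE away from 0° indicates a bias in perceived displacement. The grid search only keeps the sketch dependency-free; a continuous optimizer, possibly with lapse-rate parameters, is a common alternative.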

      - A recent study on motion and eye movement prediction in ASD is very relevant to the work presented here: Park et al. (2021). Atypical visual motion-prediction abilities in autism spectrum disorder. Clinical Psychological Science, 9(5), 944-960.

      Indeed. We now refer to the cited study in the Discussion (page 9).

      Reviewer #2 (Recommendations For The Authors):

      Statistics and plotting.

      I believe some of the reported statistics are not clear. For example, the authors write:

      "Saccade landing positions of participants in the lower quartile (mean degree ± SEM: 10.17 ± 0.50) did not deviate significantly from those in the upper quartile (mean degree ± SEM: 9.65 ± 0.77). This result was also confirmed by a paired sample t-test (t(7) = 0.66; p = 0.66, BF10 = 0.40)"

      Maybe I am missing something, but why use a paired-sample t-test when the upper and lower quartiles constitute different groups of participants? Shouldn't a two-sample t-test be used in this case?

      We apologize for the confusion. It is indeed a two-sample t-test.

      Along the same lines, I do not understand the link between the number of degrees of freedom reported in the t-test (7) and the number of participants reported in the study (41).

      This is also evident when looking at the scatterplot in Figure 3C. How many participants formed the averages and standard errors reported in Figures 3B and 3D? Please clarify.

      I have the same comment(s) also for the visual updating task (and related figures), where 13 degrees of freedom are reported in the t-tests. Please clarify. 

      We thank the reviewer for pointing this out. The number of participants reported in the scatter plots was indeed 42. However, we opted to compare the averages only in the lower and upper quartiles of the AQ distribution to avoid a median split (which would imply a skewed distribution). Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp 2, 8 participants fell into the lower and 7 into the upper quartile (13 degrees of freedom).
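      The degrees of freedom follow directly from the test design: a two-sample (Student) t-test on independent groups has df = n1 + n2 − 2, whereas a paired test on n pairs has df = n − 1. The minimal sketch below uses only the standard library (function names are hypothetical, data made up) and shows that 8 + 8 participants give 14 df and 8 + 7 give 13, while a paired test on 8 pairs would give 7.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_two_sample_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t-test (equal variances assumed): returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2  # two independent groups: one mean estimated per group
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def paired_df(n_pairs: int) -> int:
    """Degrees of freedom of a paired t-test: one difference score per pair."""
    return n_pairs - 1


if __name__ == "__main__":
    print(8 + 8 - 2, 8 + 7 - 2)  # group sizes of Exp 1 and Exp 2 from the reply
    print(paired_df(8))           # a paired test on 8 "pairs" would give df = 7
```

      If equal variances cannot be assumed, Welch's t-test is the usual alternative; it uses a non-integer df computed from the group variances.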

      We have now corrected the values accordingly.

      Reviewer #3 (Recommendations For The Authors):

      (1) The language can be a bit misleading (especially the title and abstract) as it wasn't always clear that the participants don't actually have clinical ASD. I'd suggest avoiding using words like "symptom" as that would indicate clinical severity, and using words like "traits/characteristics" instead for more precise language. 

      We apologize for the misleading terminology; this is now fixed.

      (2) In the Intro: "...perfect compensation results in a vertical trajectory, while small biases indicate stabilization issues23-25." This is a bit confusing without knowing the details of the paradigm. Consider clarifying or at least referring to Figure 4. 

      Thank you.

      (3) In the Results: "This result was also confirmed by a paired sample t-test (t(7) = 0.66;..." This is confusing as a two-sample t-test is the appropriate test here. Also, the degrees of freedom seem very low - could the authors clarify how many participants are in each subgroup (i.e., low vs. high AQ quartile), for both experiments? 

      The test is indeed a two-sample t-test, as noted in our response to Reviewer #2. Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp 2, 8 participants fell into the lower and 7 into the upper quartile (13 degrees of freedom).

      (4) In the Methods, Experiment 2: "The first dot could appear randomly above or below gaze level at a fixed horizontal location, halfway between the two fixations (x = 0, y = -5° or +5° depending on the trial). The second dot was then shown orthogonal to the first one at a variable horizontal location (x = 5° ± 2.5°)." This would mean that the position of the 2nd dot relative to the 1st one would be 2.5°–7.5°, but the task description in the Results and Figure 5A would suggest the horizontal location of the second dot is x = 0° ± 2.5°. Which one is correct? 

      The second option is the correct one. We have now fixed the typo in the Methods.

      (5) There is another study that examined oculomotor efference copies in children with ASD using a similar trans-saccadic perception task (Yao et al., 2021, Journal of Vision). In that study, they found a correlation between task performance and an ASD motor symptom (repetitive behavior). This seems quite relevant to the authors' hypothesis and discussion. 

      We thank the reviewer for the suggestion. We have now added this paper to the Discussion.

      (6) Please proofread the entire paper carefully as there were multiple grammatical and spelling errors.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and their thoughtful reviews of our manuscript. The reviewers raised good points regarding the sample size and the low exposure in the South Asian cohort owing to its unique cultural and social practices. We recognize these as limitations of the paper and now discuss them in the revised version. In the revised manuscript, we have taken the reviewers' key suggestions to 1) better illustrate the analytical flow and statistical methods, in particular to show which datasets were used in the discovery, validation, and testing of the score, both as a main figure in the manuscript and in the graphical abstract; 2) demonstrate, using statistical performance metrics, that our approach does not overfit; 3) emphasize that the goal was not discovery (e.g., our own EWAS was not used to derive the score), but to compare with existing EWASs and to contrast the results from the white European and South Asian (SA) populations; and 4) supplement the analysis with previously derived maternal smoking, smoking, and air pollution methylation scores, and explore additional health outcomes related to lung health in newborns. Finally, we would like to take this opportunity to reiterate that it was not our objective to derive the most powerful methylation score of smoking, nor to demonstrate a causal role of maternal smoking on birth weight via DNAm. We have restructured the manuscript, including the Discussion, to clarify this. Please find our point-by-point response below.

      Reviewer #1:

      The manuscript could benefit from a more detailed description of methods, especially those used to derive MRS for maternal smoking, which appears to involve overfitting. In particular, the addition of a flow chart would be very helpful to guide the reader through the data and analyses. The FDR correction in the EWAS corresponds to a fairly liberal p-value threshold. 

      We thank the reviewer for these good suggestions. In the revised manuscript, we have provided a flow chart as the new Figure 1, a more detailed description of the methods (a new subsection, “Statistical analysis”, under Materials and Methods), and fit metrics, such as AUC and adjusted R², for each validation and testing dataset to demonstrate that overfitting is not a concern (new Supplementary Table 5).
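      For reference, the two fit metrics named here can be computed as in the minimal sketch below, which assumes a binary smoking status and a continuous methylation score; it is not the authors' code, and the values are hypothetical. The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (ties count half), and adjusted R² penalizes R² for the number of predictors p. Overfitting would typically appear as a markedly lower AUC or adjusted R² in the held-out testing dataset than in the validation dataset.

```python
from typing import Sequence


def auc_mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    smokers = [0.91, 0.72, 0.66]            # hypothetical score values
    non_smokers = [0.10, 0.35, 0.66, 0.20]
    print(auc_mann_whitney(smokers, non_smokers))
    print(round(adjusted_r2(0.50, n=101, p=10), 4))
```

      The pairwise loop is O(n·m) and is written for clarity; for large cohorts a rank-based computation of the same Mann–Whitney statistic is faster.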

      The choice to use FDR was indeed arbitrary, as there is no consensus on what significance threshold, if any, should be used in the context of EWAS. Here we simply followed the convention of previous studies in order to contrast the effects of the top associated signals between populations and with previously reported effect sizes. Throughout the manuscript, we have removed the notion of significant associations and now use the phrase “top associated signals” or “top associations” when discussing EWAS results for individual CpGs.
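      To make the reviewer's point about a fairly liberal threshold concrete, the sketch below implements the Benjamini–Hochberg step-up procedure, assuming (this is not stated in the reply) that BH was the FDR method used. With m tests, BH rejects every p-value up to the largest p_(k) satisfying p_(k) ≤ (k/m)·q, and it always rejects at least as many hypotheses as Bonferroni (p ≤ q/m) at the same level. The p-values are made up.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> Tuple[List[bool], float]:
    """Benjamini-Hochberg step-up procedure.

    Returns a rejection flag per test and the effective p-value cutoff
    (the largest p_(k) with p_(k) <= k/m * q; 0.0 if nothing is rejected).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k_max, cutoff = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max, cutoff = rank, pvals[i]  # step-up: keep the largest passing rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject, cutoff


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    flags, cutoff = benjamini_hochberg(p, q=0.05)
    print(sum(flags), cutoff, 0.05 / len(p))  # BH rejections, BH cutoff, Bonferroni cutoff
```

      In this toy example BH rejects two tests (cutoff 0.008), whereas Bonferroni at 0.05/10 = 0.005 would reject only one.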

      Reviewer #2:

      (1) The number of mothers who self-reported any smoking was very low, much lower than in the general population and practically non-existent in the South Asian population. As a result, all analyses appeared to have been underpowered. It is possibly for this reason that the authors chose to generate their DNA methylation model using previously published summary statistics. The resulting score is not of great value in itself due to the low-powered dataset used to estimate covariance between CpG sites. In fact, a score was generated for a much larger, better-powered dataset several years ago (Reese, EHP, 2017, PMID 27323799). 

      We thank the reviewer for pointing out the low exposure in the South Asian population; we believe this cohort complements the literature on maternal smoking, which has focused almost exclusively on white Europeans. However, the score was validated in the white European cohort (CHILD; current smoking 3.1%), a prevalence reasonably consistent with the declining trend in maternal cigarette smoking, from 7.2% in 2016 to 4.6% in 2021 (Martin, Osterman, & Driscoll, 2023). This is also consistent with the fact that CHILD participants were recruited from major Canadian metropolitan areas and had relatively high SES and education compared with FAMILY.

      We agree with the reviewers that a higher prevalence of maternal smoking in the validation sample could potentially improve the power of the score. Our original analytical pipeline used CHILD as the validation dataset and FAMILY (see the new Figure 1) as the testing dataset. We also provide an alternative analytical scheme using FAMILY as the validation dataset, as it had a higher proportion of current smokers; however, this scheme is limited by the number of CpGs available (128 in FAMILY vs. 2,619 in CHILD, out of the 2,620 CpGs from Joubert et al., 2016). The results of all possible combinations of validation vs. testing datasets and of restriction to the targeted array vs. HM450 are summarized in the new Supplementary Table 5 and Supplementary Figure 5.

      To clarify, our choice to construct the DNAm score using published summary statistics was not an ad hoc decision due to the low power observed in the CHILD EWAS. We agree with the reviewer that our study was indeed underpowered and was not originally intended for EWAS discovery. We therefore specifically proposed to adopt a multivariate strategy from the polygenic risk score literature. This approach enabled us to leverage well-powered association signals from a sample of n > 5,000 (Joubert et al., 2016) without individual-level data access. In comparison, the Reese maternal smoking score (Reese et al., 2017) had a discovery sample size of only n = 1,057. Our score was not outperformed; in fact, its AUC in both FAMILY (external validation dataset; n = 411) and CHILD (external testing dataset; n = 352) was larger than that of the Reese score, as tabulated below (part of the new Supplementary Table 5).

      Author response table 1.

      Regarding the comment on the covariance matrix: lassosum, which fits an elastic-net model to summary statistics, does require a reference covariance matrix that is consistent between the discovery data and the external validation data. For moderately sized correlation/covariance values (r² > 0.1), a sample size of more than 100 provides sufficient power to detect a value different from 0, and is thus adequate for estimation. Similar to the linkage disequilibrium structure of genotype data, CpGs also exhibit a block-wise correlation structure, so the theoretical framework of lassosum extends naturally to MRS.
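      As a hedged illustration of the two points in this paragraph, the sketch below (i) minimizes what we understand to be the lassosum objective, f(b) = b'[(1−s)R + sI]b − 2b'r + 2λ‖b‖₁, by coordinate descent, where r holds marginal CpG–exposure correlations from summary statistics and R is a CpG–CpG correlation matrix from a reference sample, shrunk toward the identity by s; and (ii) uses Fisher's z approximation to check that n = 100 gives adequate power to detect a correlation with r² = 0.1. This is not the lassosum package or the authors' pipeline; all names, toy values, and the objective as written are assumptions.

```python
import math
from typing import List, Sequence


def soft_threshold(u: float, lam: float) -> float:
    """Lasso shrinkage operator."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lassosum_cd(r: Sequence[float], R: Sequence[Sequence[float]],
                lam: float, s: float, n_iter: int = 1000, tol: float = 1e-12) -> List[float]:
    """Coordinate descent for f(b) = b'[(1-s)R + sI]b - 2b'r + 2*lam*||b||_1 (assumed form).

    r: marginal CpG-exposure correlations (from summary statistics).
    R: CpG-CpG correlation matrix estimated in a reference sample.
    s: shrinks R toward the identity, stabilizing a noisy reference estimate.
    """
    p = len(r)
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # correlation with the exposure not already explained by the other CpGs
            u = r[j] - (1.0 - s) * sum(R[j][k] * b[k] for k in range(p) if k != j)
            new = soft_threshold(u, lam) / ((1.0 - s) * R[j][j] + s)
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b


def methylation_score(weights: Sequence[float], beta_values: Sequence[float]) -> float:
    """Weighted sum of an individual's (standardized) methylation values."""
    return sum(w * x for w, x in zip(weights, beta_values))


def corr_power(r: float, n: int, z_crit: float = 1.959963984540054) -> float:
    """Approximate power of a two-sided 5% test of H0: rho = 0 (Fisher z)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(z - z_crit) + phi(-z - z_crit)


if __name__ == "__main__":
    R = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy CpG block
    r = [0.30, 0.28, -0.05]                                   # toy summary correlations
    w = lassosum_cd(r, R, lam=0.05, s=0.2)
    print([round(x, 3) for x in w])
    print(round(corr_power(math.sqrt(0.1), 100), 3))         # r^2 = 0.1, n = 100
```

      Under the Fisher z approximation, the power for r² = 0.1 (r ≈ 0.32) at n = 100 comes out at roughly 0.9, in line with the statement above; weaker correlations would require considerably larger reference samples.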

      In the revised manuscript, we included the Reese score, as well as a few additional scores, to compare their ability to predict smoking phenotypes in white European cohorts. We note that the applicability of the Reese score was limited in the FAMILY cohort, which was profiled using a targeted array on which only 7 of the 28 CpGs in the Reese score were available. As a result, although the Reese score performed similarly to our derived score in CHILD (0.94 vs. 0.95), its performance in FAMILY was compromised (0.72 vs. 0.89).

      (2) The conclusion that "even minimal smoking exposure in South Asian mothers who were not active smokers showed a DNAm signature of small body size and low birthweight in newborns" is not warranted because no analyses were performed to show that the association between DNA methylation and birth size/weight was driven by maternal smoking. 

      We thank the reviewer for this subtle point. It was not our intention to suggest a causal relationship between DNA methylation and birth size mediated by maternal smoking. Rather, we meant that the maternal smoking methylation score was consistently associated with negative outcomes in newborns of both white European and South Asian mothers, despite the absence of maternal smoking among South Asian mothers. It is possible that the maternal smoking MRS captures more than active and second-hand smoking, for example other environmental exposures that also lead to oxidative stress, which together are associated with reduced birth size and weight.

      In the revised manuscript, we have modified the conclusion above to:

      “Notably, these results indicate a consistent association between the DNAm signature of maternal smoking and a small body size and low birthweight in newborns, in both white European mothers who exhibited some amount of smoking and in South Asian mothers who themselves were not active smokers.”

      (3) Although it was likely that some mothers were exposed to second-hand smoke and/or pollution, data on this was either non-existent or not included in this study. Including this would have allowed a more novel investigation of the effects of smoke exposure on the pregnancies of non-smoking mothers.

      We agree with this comment. Second-hand smoke exposure was captured by the mothers' self-reported weekly smoking exposure. We reported the association with smoking exposure and found that it was not consistently associated with our methylation scores across cohorts (cohort-specific association p-values of 5.4×10⁻⁵, 3.4×10⁻⁵, and 0.58 for CHILD, FAMILY, and START, respectively; original Table 3), possibly due to the low exposure in the South Asian population (maximum weekly exposure of 42 hrs, in contrast to 168 hrs in FAMILY and 98 hrs in CHILD). Air pollution data are currently not available. We additionally tested the association between maternal smoking phenotypes and an air pollution methylation score, using key CpGs from the largest air pollution EWAS to date (Gondalia et al., 2021). However, the air pollution score was not associated with any maternal smoking phenotype (ps > 0.4).

      (4) One of the European cohorts and half of the South Asian cohort had DNA methylation measured on only 2500 CpG sites. This set of sites included only 125 sites previously linked to prenatal smoking. The resulting model of prenatal smoking was small (only 11 CpG sites). It is possible that a large model may have been more powerful.

      That is correct; see also our response to Reviewer #2, comment (1). In our previous analysis, we validated two scores (one based on the CpGs on the targeted array of fewer than 3,000 CpGs and the other on the full HM450K array). The score with more CpGs indeed had slightly better performance, and we have included this as one of the limitations of the paper. Nevertheless, this does not affect the conclusion that the scores (based on either the larger or the smaller model) are transferable to diverse populations and can be used to comparatively study the DNAm influence of maternal smoking in newborns.

      The following was added in the discussion:

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016, so many large EWASs on smoking and maternal smoking conducted more recently were not included.”

      (5) The health outcomes investigated are potentially interesting but there are other possibly more important outcomes of interest such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking.

      We thank the reviewer for bringing up this point. Asthma was one of the key health outcomes in the CHILD study, and data at later time points are available. However, similar outcomes were not collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. We therefore did not initially include outcomes that were not available across all cohorts, as our intention was to contrast the effects between populations.

      We recognize that this is an important question and now provide the association results for asthma and allergy at the available time points in CHILD, FAMILY, and START. We also included delivery via emergency C-section as an additional proxy outcome for birth complications. However, none of these outcomes was even nominally associated (p < 0.05) with the DNAm smoking score. These results are now included in the updated Supplementary Table 8.

      Reviewer #1 (Recommendations For The Authors):

      (1) The number of samples in the South Asian birth cohort given in the abstract (n = 887) does not match the sample size of the START cohort from the results section (results, page 7, line 139, n = 880). It is also different from the final analytical dataset size from the methods section (page 17, line 386, n = 890). Please clarify. 

      We thank the reviewer for pointing this out. The number in the abstract was the final sample size used for the EWAS (no missing smoking history). The 880 in the Results was a typo for 890, which includes three individuals with missing smoking data. These numbers have been updated to the correct sample size for the START cohort with full epigenome-wide methylation data (n = 504, of whom 503 had non-missing smoking history).

      (2) Page 3, line 54: "consistent signal from the GFI1 gene (ps < 5×10-5)". Is ps a typo? If not then it might be clearer to state how many sites this included. 

      No; “ps” summarizes the p-values of the six CpG sites in the GFI1 gene listed in Table 2. We have clarified the abstract to show the number of CpG sites included.

      (3) Please report effect sizes together with information about the statistical significance (p values). 

      We have updated the manuscript with (standardized) effect sizes whenever possible along with p-values.

      (4) Page 4, line 80. This paragraph could be improved by adding a sentence explaining DNA methylation. 

      We thank the reviewer for this suggestion. A sentence was included to introduce DNAm at the beginning of the second paragraph:

      “DNA methylation is one of the most commonly studied epigenetic mechanisms by which cells regulate gene expression, and is increasingly recognized for its potential as a biomarker (13).”

      (5) Page 4, line 84. Sentence difficult to understand, please rephrase: "Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) demonstrated that out of the 290 CpG sites reported, 19 sites were identified in more than one study; all of them associated with maternal smoking". 

      We have revised the sentence to clarify that the review covered cord blood EWAS of five traits: maternal diabetes, pre-pregnancy body mass index, diet during pregnancy, smoking, and gestational age.

      “Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) found that out of the 290 CpG sites reported to be associated with at least one of the following: maternal diabetes, pre-pregnancy body mass index (BMI), diet during pregnancy, smoking, and gestational age, 19 sites were identified in more than one study and all of them associated with maternal smoking.”

      (6) Page 5, line 93. The second part of the sentence is not necessary: "The majority of cohort studies have focused on participants of European ancestry, but few were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans". 

      We have revised accordingly to:

      “Only a handful of cohort studies were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans.”

      (7) Page 5, line 95. "It has been suggested that ancestral background could influence both systematic patterns of methylation (27), such as cell composition and smoking behaviours (28)". The sentence is slightly unclear. Could it be rephrased to say that cell composition differences may be present by ancestry, which can lead to differential DNAm patterns? 

      We have revised accordingly to:

      “It has been suggested that systematic patterns of methylation (Elliott et al., 2022), such as cell composition, could differ between individuals of different ancestral backgrounds, which could in turn confound the association between differential DNAm and smoking behaviours (Choquet et al., 2021).”

      (8) Page 5, line 108. How does reducing the number of predictors lead to more interpretable effect sizes? 

      This was meant as a general comment in the context of variable selection: the fewer predictors there are, the more interpretable the effect size of each predictor becomes. However, we recognize that this comment may not be relevant to the specific approach we adopted. We have revised the sentence to motivate the methylation score as a powerful instrument for analysis:

      “Reducing the number of predictors and measurement noise in the data can lead to better statistical power and a more parsimonious instrument for subsequent analyses.”

      (9) Page 5, line 112. Health consequences seem a bit strong, given that the analysis describes correlations/associations. 

      We have revised it to “association with”:

      “In this paper, we investigated the epigenetic signature of maternal smoking on cord blood DNA methylation in newborns, as well as its association with newborn and later-life outcomes, in one South Asian birth cohort (South Asian referring to people who originate from the Indian subcontinent) and two predominantly European-origin birth cohorts.”

      Results

      (10) It would be very helpful to have a flow diagram to detail all of your analyses.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1, updated the summary of analyses in Table 3, and added a new Supplementary Table 5 for the DNAm score derivation, as well as a more detailed description of the statistical analysis in the Materials and Methods under the subsection “Statistical analysis”.

      (11) Page 7, line 138. Please add a reference to the CHILD study. 

      We have added a reference to the CHILD study.

      (12) Tables in results and in supplemental data a) contain a mixture of fields describing the newborn and its mother (this is not true for Supplementary Table 2), b) lack column descriptions, c) lack descriptions of abbreviations and formatting used in tables, d) use different font types, e) lack descriptions of statistical tests that were used to obtain p-values, f) use inconsistent rounding. Please correct and add the missing information.

      We have consolidated the notation and nomenclature in all Tables and text. All numerical results are now rounded to 2 decimal places. The tests used were included in the Table headers as well as described in the Materials and Methods:

      “For continuous phenotypes, an analysis of variance (ANOVA) using the F-statistic or a two-sample t-test was used to compare mean differences across the three cohorts or between two groups, respectively. For categorical phenotypes, a chi-square test of independence was used to compare the difference in frequencies of observed categories. Note that three of the categories under smoking history in the START cohort had expected cell counts less than 5 and were thus excluded from the comparison; the reported p-value is for CHILD and FAMILY.”
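
      The exclusion rule above rests on the expected cell counts of a chi-square test of independence. A minimal sketch of that check in plain Python, so that no library is assumed; the function names and example counts are hypothetical, not taken from the cohorts. In practice, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts in one call (note that it applies Yates' continuity correction to 2×2 tables by default).

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (observed - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for observed, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def sparse_cells(table, minimum=5):
    """(row, column) positions whose expected count is below `minimum`."""
    return [
        (i, j)
        for i, row in enumerate(expected_counts(table))
        for j, exp in enumerate(row)
        if exp < minimum
    ]


# Hypothetical counts: rows are cohorts, columns are smoking-history categories.
balanced = [[10, 20], [20, 10]]
sparse = [[1, 29], [2, 28]]
print(chi_square_statistic(balanced))  # statistic ~6.67 with 1 degree of freedom
print(sparse_cells(sparse))            # [(0, 0), (1, 0)]: expected count 1.5 in both
```

      Categories flagged by a check like `sparse_cells` are the ones for which the chi-square approximation becomes unreliable, which is the usual reason for dropping or merging them.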

      (13) Table 1. Sample sizes given in column descriptions do not add up to 1,650 (legend text).

      We thank the reviewer for pointing this out. The updated sample size is 1,267, based on the 352 CHILD samples, 411 FAMILY samples, and 504 START samples. Note that we did not remove those without full smoking history data, as Table 1 was intended to describe the epigenetic subsamples.

      (14) Page 7, line 156. Supplementary Tables are incorrectly numbered. In the text, Supplementary Table 4 comes after Supplementary Table 2.

      We thank the reviewer for catching this and have corrected the ordering of the Supplementary Tables and Figures. 

      (15) Page 7, line 158. "cell compositions" - do you mean estimated white cell proportions? 

      We have revised it to “estimated cord blood cell proportions” in the text throughout.

      (16) Smoking EWAS - do you see any overlap/directional consistency with the top findings from adult EWASs of smoking such as AHRR? 

      We annotated the top EWAS signals from the literature in the meta-analysis (new Figure 2; Supplementary Figures 1 and 3), but were only able to confirm associations in the GFI1 gene. The AHRR signals were also annotated but did not pass the FDR threshold, as seen in new Figure 2 at the start of chromosome 5. We further added a new Supplementary Figure 3 to show the directional consistency with the top findings from Joubert et al., 2016 (2,620 CpGs reported, of which 128 overlapped with our meta-analysis). The Pearson’s correlation coefficient with the meta-analyzed effect was 0.72 for maternal smoking and 0.60 for smoking exposure.

      We added the following to Results:

      “Further, we observed consistency in the direction of association for the 128 CpGs that overlapped between our meta-analysis and the 2,620 CpGs with evidence of association for maternal smoking (19) (Supplementary Figure 3). Specifically, the Pearson’s correlation coefficient for maternal smoking and weekly smoking exposure was 0.72 and 0.60, respectively.”
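
      The directional comparison above amounts to aligning per-CpG effect estimates from two studies on their shared CpG identifiers, then computing a Pearson correlation and the share of CpGs whose effects have matching signs. A minimal sketch, assuming effects are stored as dictionaries keyed by CpG ID; the IDs and values below are made up for illustration.

```python
import math


def align(effects_a, effects_b):
    """Return matched effect lists for the CpG IDs present in both dictionaries."""
    shared = sorted(set(effects_a) & set(effects_b))
    return [effects_a[c] for c in shared], [effects_b[c] for c in shared]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sign_concordance(x, y):
    """Fraction of CpGs whose effects have the same sign in both studies."""
    return sum(1 for a, b in zip(x, y) if a * b > 0) / len(x)


# Hypothetical effect estimates keyed by made-up CpG IDs.
meta_effects = {"cpg_A": 0.1, "cpg_B": -0.2, "cpg_C": 0.3, "cpg_only_here": 1.0}
published_effects = {"cpg_A": 0.2, "cpg_B": -0.4, "cpg_C": 0.6}
x, y = align(meta_effects, published_effects)
print(len(x), round(pearson_r(x, y), 6), sign_concordance(x, y))  # 3 1.0 1.0
```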

      (17) Page 8, line 169. "also coincided with the GFI1 gene" this is a bit imprecise. Please report the correlation with the CpG from the maternal smoking analysis. 

      The CpG lies within the GFI1 gene; we have included its Pearson’s correlation with the top hit in the text below:

      “There were no CpGs associated with the ever-smoker status at an FDR of 0.05, though the top signal (cg09935388) was also mapped to the GFI1 gene (Pearson’s r² correlation with cg12876356 = 0.75 and 0.68 in CHILD and FAMILY, respectively; Supplementary Figure 1).”

      (18) Page 8, line 171. Typo "ccg": "ccg01798813". 

      It has been corrected to “cg01798813”.

      (19) Page 8, line 176. Please be clear about the phenotype used in these analyses. 

      The EWAS of weekly smoking exposure in START was removed from this version of the manuscript in light of the results and the reviewer’s comments, because this phenotype was highly skewed and likely to produce only spurious results (also see response to comment #20).

      We have clarified the phenotypes for these results under “Epigenetic Association of Maternal Smoking in White Europeans” below:

      “The maternal smoking and smoking exposure EWASs in CHILD did not yield any CpGs after FDR correction (Supplementary Figure 3).”

      (20) What was the genomic inflation for the EWASs? 474 loci in the South Asian EWAS seems like a lot of findings. Perhaps a more robust method (e.g., OSCA MOMENT) might help to control the false positive rate. 

      The genomic inflation factor for smoking exposure was modest across the cohorts: 1.02 in CHILD, 0.94 in FAMILY, and 1.00 in START. However, there was more inflation in the tail of the distribution in START than in the European cohorts. The empirical type I error rates at thresholds of 0.01, 0.001, and 0.00001 were high in START (1.7, 5.7, and 165 times the nominal rate, respectively), in contrast to CHILD (1.06, 1.05, and 0.6 times) or FAMILY (1.6, 1.9, and 0 times). The smoking exposure EWAS in START was thus removed, as these signals are likely false positives and smoking exposure was very low to begin with (only 11 of the 462 participants with non-missing data reported weekly exposure, ranging from 2 to 42 hrs/week). We have added the QQ-plots as well as the genomic inflation factor for the reported meta-analysis in the new Supplementary Figure 2. The following was added to the Results:

      “There was no noticeable inflation of empirical type I error in the association p-values from the meta-analysis, with the median of the observed association test statistic roughly equal to the expected median (Supplementary Figure 2).”
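
      For reference, the genomic inflation factor is usually computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455), and an empirical type I error ratio as the observed fraction of p-values below a threshold divided by that threshold. A sketch of both, assuming two-sided p-values from 1-df tests; the function names are illustrative. A ratio near 1 is expected under the null, whereas a large ratio at a very small threshold points to excess small p-values in the tail even when λ itself is close to 1.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(p_values):
    """lambda_GC: median observed 1-df chi-square statistic / its null median.

    Assumes two-sided p-values in (0, 1]; a p-value of exactly 0 cannot be converted.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def type1_error_ratio(p_values, alpha):
    """Observed fraction of p-values below alpha, relative to alpha (about 1 under the null)."""
    return sum(p < alpha for p in p_values) / len(p_values) / alpha


# Evenly spaced p-values mimic a well-calibrated (null) scan.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(genomic_inflation(null_p), 3))        # 1.0
print(round(type1_error_ratio(null_p, 0.01), 3))  # 0.999
```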

      (21) What is the targeted array? I don't think it has been introduced prior to this point. 

      We introduced it in the Materials and Methods under subsection “Methylation data processing and quality controls”. Considering this comment and previous comments on the ordering of Tables and Figures, we have decided to place Materials and Methods after Introduction and before Results.

      (22) The MRS section is described poorly in the results section. It is not clear where the 11 or 114 CpGs come from.

      We now include an analytical summary of all scores (derived or external from literature) in the new Supplementary Table 5. Further, we updated the description of scores in Materials and Methods under the subsection “Using DNA Methylation to Construct Predictive Models for Maternal Smoking” to clarify the source and types of MRSs derived:

      “To evaluate whether the targeted GMEL-EPIC array design has comparable performance to the epigenome-wide array in capturing the epigenetic signature of maternal smoking, a total of three MRSs were constructed: two using the 128 CpGs available in all cohorts (across the HM450K and targeted GMEL-EPIC arrays), with either CHILD (n = 347 with non-missing smoking history) or FAMILY (n = 397) as the validation cohort, and another using 2,107 CpGs that were only available in CHILD and START samples, with CHILD as the validation cohort. Henceforth, we refer to these derived maternal smoking scores as the FAMILY targeted MRS, CHILD targeted MRS, and the HM450K MRS, respectively.”
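
      Because the targeted array covers only a subset of the HM450K CpGs, a score can only use the CpGs present both in its weight list and on the array. A minimal sketch of a methylation risk score as a weighted sum of beta-values over the shared CpGs, assuming the weights come from the derivation step described above; CpG IDs and numbers are hypothetical, and any standardization of the score within a cohort is left out.

```python
def methylation_risk_score(beta_values, weights):
    """Weighted sum of CpG beta-values over the CpGs present in both inputs.

    beta_values: {cpg_id: beta-value} for one individual (hypothetical layout).
    weights: {cpg_id: weight} from the score-derivation step.
    Returns the score and the number of CpGs actually used.
    """
    shared = [cpg for cpg in weights if cpg in beta_values]
    score = sum(weights[cpg] * beta_values[cpg] for cpg in shared)
    return score, len(shared)


# Made-up weights; "cpg_C" is absent from this individual's (targeted) array.
weights = {"cpg_A": 0.5, "cpg_B": -1.0, "cpg_C": 2.0}
individual = {"cpg_A": 0.8, "cpg_B": 0.3}
score, n_used = methylation_risk_score(individual, weights)
print(round(score, 3), n_used)  # 0.1 2
```

      Returning the number of CpGs used makes it visible when a score has silently shrunk to a small subset of its weights on a given array.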

      (23) Page 9, line 187. "There was no statistically significant difference between the two scores in all samples (p = 1.00) or among non-smokers (p = 0.24).". How was the significance assessed? Please describe the models (outcome, covariates, model type) used for comparing the two models. It would also be good to report the correlation between the scores.

      We have added a subsection “Statistical analysis” under Materials and Methods that described the tests. The correlation between scores is now summarized as a heatmap across all cohorts in the new Supplementary Figure 6.

      “For each cohort, we contrasted the three versions of the derived scores using an analysis of variance (ANOVA) along with pairwise comparisons using a two-sample t-test, to examine how much information might be lost by using more than 10-fold fewer CpGs at the validation stage. We also examined the correlation structure between all derived and external MRSs using a heatmap summarizing their pairwise Pearson’s correlation coefficients.”
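
      The ANOVA and pairwise t-test comparisons described above can be sketched from their textbook formulas. This plain-Python version (hypothetical function names, toy data) returns only the test statistics and degrees of freedom; p-values would come from the F and t distributions, for example via `scipy.stats.f_oneway` and `scipy.stats.ttest_ind`. Like the quoted analysis, it treats the scores as independent groups, although they are computed on the same individuals.

```python
import math


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, na + nb - 2


# Toy values standing in for two versions of a score.
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
t_stat, df = pooled_t([1, 2, 3], [4, 5, 6])
print(f_stat, round(t_stat ** 2, 6))  # 13.5 13.5
```

      With exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, so the two tests agree; the ANOVA becomes informative when all three score versions are compared at once.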

      (24) Please include the number of samples in the training/validation and in the test set in the methods and in the results.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1 and a more detailed description of the method in the Materials and Methods. Please also see response to comment #22. The training sample is that of Joubert et al. (2016), with n = 5,647. For our main analyses, the validation sample with non-missing phenotypes remained the CHILD cohort (n = 347), while the FAMILY (n = 397) and START (n = 503) samples were the independent testing data. We also provided an alternative scenario, in which the FAMILY sample was the validation cohort, while CHILD and START were the testing cohorts. The exact sample size and performance metrics for each scenario and score are summarized in the new Supplementary Table 5.

      (25) Table 3. Please clarify the type of information contained in the four last columns (p-value?).

      Yes – these are the individual cohort p-values. We have taken the suggestion from comment #12 to fully describe all columns and fields.

      (26) Page 10, line 215: "The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations between populations". Please quote/refer to the results. 

      In the revision, we quote the heterogeneity p-values and cite the relevant table (Supplementary Table 8) in this sentence.
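
      Heterogeneity between cohorts in an inverse-variance meta-analysis is commonly summarized by Cochran's Q and the I² statistic. The sketch below assumes a fixed-effect meta-analysis with per-cohort effect estimates and standard errors; it is not necessarily the software used in the manuscript. The chi-square tail probability is computed from the series expansion of the regularized incomplete gamma function so that only the standard library is needed; it is adequate for the moderate Q values of a few cohorts, while very small p-values are better served by a dedicated routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def cochran_q(betas, ses):
    """Fixed-effect pooled estimate, Cochran's Q, its p-value, and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, chi2_sf(q, df), i2


# Hypothetical per-cohort effects and standard errors for one CpG.
print(cochran_q([0.1, 0.3], [0.1, 0.1]))  # Q = 2 on 1 df, p ~0.157, I^2 = 0.5
```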

      (27) Figure 2 has issues with x labels. Due to the low number of ever smokers in START, the boxplot may not be the best visualisation method. It would also benefit from listing n's per group.

      We appreciate this comment to improve the figure presentation. We increased the font size of the x-axis labels. The sample size for each group in START is also labeled in the new Figure 3 (previously Figure 2).

      Discussion

      (28) Studying the association between maternal smoking and cord blood DNAm is interesting from a biological perspective as it allows for assessing the immediate and long-term effects of maternal smoking on newborn health. However, in terms of calculating the MRS, what are the benefits of using cord blood over the mother's blood? We know that blood-based DNAm smoking score is a powerful predictor of long-term smoking status. 

      The reviewer raises an interesting point: abundant literature supports that DNAm changes are tissue-specific. While a DNAm smoking score measured in maternal blood reflects the mother’s long-term exposure to smoking, cord blood DNAm captures the consequences of such exposure for newborn health. One of the key results of our study is showing that established DNAm signatures of maternal smoking, which are known to mediate birth size and weight in white Europeans (these references were cited in the original manuscript), carry the same effect of reducing birth weight and size in the South Asian population. This is a critical finding from a DoHaD and public health perspective, as DNAm signatures of maternal smoking, irrespective of the smoking status of the mother, can influence the health trajectory of newborns.

      We have expanded our discussion based on this suggestion to highlight the unique features of studying maternal smoking via different tissues and their implications. The following was added to the discussion:

      “There are several advantages of using a cord blood based biomarker from the DoHaD perspective. Firstly, cord blood provides a direct reflection of the in utero environment and fetal exposure to maternal smoking. Additionally, since cord blood is collected at birth, it eliminates potential confounding factors such as postnatal exposures that may affect maternal blood samples. Furthermore, studying cord blood DNAm allows for the assessment of epigenetic changes specifically relevant to the newborn, offering valuable information on the potential long-term health implications.”

      (29) Page 13, line 285: "Fourth" without "third".

      It has been revised accordingly.

      Methods 

      (30) The methods section does not contain all the details required to replicate the analysis. Whenever statistical analysis is conducted, this section should clearly describe the type of the analysis (linear regression, t-test, etc.) and name the dependent and independent variables. Sample sizes should also be given. 

      We added further details of the tests used and the sample size for each analysis. We have also included a new “Statistical analysis” subsection under Materials and Methods.

      (31) Please describe MRS testing in the methods.

      We tested the MRS with respect to binary and continuous smoking phenotypes using logistic and linear regression, respectively. The predictive value was assessed using the area under the receiver operating characteristic (ROC) curve for the binary outcome and the adjusted R² for the continuous outcome. These were added to the new “Statistical analysis” subsection under Materials and Methods. See responses to comments #22-24 and #30.
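
      Both metrics have simple closed forms. A sketch, assuming a binary smoking label coded 0/1 and the R² of an already fitted model: the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), which equals the area under the ROC curve, and the adjusted R² penalizes R² by the number of predictors p. Function names are illustrative; `sklearn.metrics.roc_auc_score` gives the same AUC for binary labels.

```python
def auc(scores, labels):
    """Area under the ROC curve via its Mann-Whitney form.

    Probability that a random case (label 1) scores higher than a random control
    (label 0); ties count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(round(adjusted_r2(0.5, 101, 1), 4))          # 0.4949
```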

      (32) Please describe the methods used to compare the two versions of MRS for maternal smoking.

      It was a two-sample t-test, which was described in the Figure legends. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (33) Please describe testing the associations between MRS and Offspring Anthropometrics in more detail.

      We added further details on the regression model and the test for association in the methods. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (34) Meta analysing the 450k and GMEL arrays is going to substantially reduce the number of CpGs under investigation.

      We agree with the reviewer that this is not optimal for signal discovery. However, this is the only way we could synthesize evidence across the cohorts as FAMILY samples were only processed using the customized array. We added the following as a limitation of the study in the discussion.

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included.”

      (35) Page 16, line 364: GDM abbreviation was used in the results section (line 145), yet it is introduced in line 364. 

      Thank you for catching this, we have removed the duplicate.

      (36) Page 17, line 381: Given the stated importance of ancestry, why not restrict the sample to genetically confirmed groups?

      The reviewer has a valid point that ancestry, either perceived or genetic, can introduce additional heterogeneity due to potential differences in genetics, cultural and social practices, and lifestyles. Genetic data are indeed available for a subset of the individuals. In the original version of the manuscript, we used a stringent ancestry calling method by comparing all individuals with the 1000 Genomes samples from continental populations. The final definition was based on a combination of self-reported and genetically confirmed ancestry. However, if we restricted the analysis to genetically confirmed groups only, the sample size would be reduced to 312 (vs. 411), 268 (vs. 352), and 488 (vs. 504) in FAMILY, CHILD, and START, respectively.

      We compared the mean difference in the beta-values of the top associated CpGs and the derived MRS between genetically confirmed and self-reported ancestral groups, and observed no material difference. These results are now included in the Supplementary Materials as part of the sensitivity analysis. Given these considerations, we decided to use this complementary approach to retain the maximum number of samples while maintaining a reasonable degree of ancestral homogeneity.

      “To maximize sample size in FAMILY and CHILD, we retained either self-identified or genetically confirmed Europeans based on available genetic data (Supplementary Table 1).”

      (37) Page 18, line 397: sensitivity analysis not sensitive analysis.

      Thank you for catching this, we have revised accordingly.

      (38) Page 18, line 409: smoking was rank transformed however, it would be good to see regression diagnostics for the lead loci in the EWAS to check that assumptions were met. 

      We thank the reviewer for this suggestion. Smoking exposure is indeed skewed and, in fact, heavily zero-inflated across the cohorts. The raw phenotype violated several model assumptions in terms of heteroskedasticity, outlying values (influential points), and linearity. After rank transformation, the diagnostics showed smaller deviations from the model assumptions, although some violations remained to a lesser degree. We included a comparison of results before and after transformation and model diagnostics for the lead CpG using CHILD and FAMILY data in the Supplementary Materials. The following was added to the Results:

      “As a sensitivity analysis, we repeated the analysis for the continuous smoking exposure under rank transformation vs. raw phenotype for the associated CpG in GFI1 and examined the regression diagnostics (Supplementary Material), and found that the model under rank-transformation deviated less from assumptions.”
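
      The response does not name the rank transformation used. A common choice is the rank-based inverse normal transformation; the sketch below assumes Blom's offset (c = 3/8) and average ranks for ties, which matter here because a zero-inflated exposure produces a large block of tied zeros. Function names and data are illustrative.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transformation with Blom's offset c = 3/8."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


# Zero-inflated toy exposure (hours/week): the three zeros share one rank.
hours = [0, 0, 0, 5, 10]
print(average_ranks(hours))  # [2.0, 2.0, 2.0, 4.0, 5.0]
print([round(z, 2) for z in rank_inverse_normal(hours)])
```

      Because every tied zero maps to the same transformed value, the transformed phenotype still has a point mass, which is consistent with some violations of the model assumptions remaining after transformation.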

      (39) Page 19, line 418: FDR seems quite a lenient threshold, especially when genome-wide significance thresholds exist. I would be inclined to view the EWAS findings as null.

      The choice to use FDR was indeed arbitrary, as there has been no consensus on what significance threshold, if any, should be used in the context of EWAS. The significance threshold for GWAS (Pe’er et al., 2008) probably does not apply directly to EWAS, as the number of effective tests will likely differ between genome-wide genetic variants and CpGs. The Bonferroni-corrected p-value threshold in this context would be 0.05/200,050 = 2.5×10⁻⁷, which is still less stringent than the GWAS significance threshold. We originally decided to follow the convention of previous studies and use FDR to select a subset of plausible associations, in order to contrast the effects of the top association signals between populations and with reported effect sizes.

      We have revised the manuscript throughout by removing the notion of significant associations, and instead used the phrase “top associated signals” or “top associations” when discussing EWAS results for individual CpGs. The following was added to Materials and Methods to clarify the choice of our threshold:

      “For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”
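
      For comparing the two thresholds discussed above, a minimal sketch of the Benjamini-Hochberg adjustment alongside the Bonferroni threshold quoted in the response. The manuscript does not state which FDR procedure was used, so Benjamini-Hochberg (the most common one) is an assumption here. A CpG passes at FDR 0.05 when its adjusted p-value is below 0.05.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


bonferroni_threshold = 0.05 / 200_050  # about 2.5e-7

p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in bh_adjust(p)])  # [0.02, 0.04, 0.04, 0.02]
print(f"{bonferroni_threshold:.2e}")       # 2.50e-07
```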

      (40) I do not understand Supplementary Figure 6 - how have the data been standardised? Why not plot the CpGs on the beta-value scale?

      We plotted standardized values because the reported p-values for the mean and variance equality tests (i.e., ANOVA F-test, Levene’s test, Anderson-Darling test) were based on these transformed values, which reduce inflation due to non-normality. We have since removed this comparison and kept only the comparison of the overall score, as the number of CpGs in the HM450K score (143 CpGs) is too high to be visually interpretable.

      (41) It is my understanding, that the MRS for maternal smoking was constructed using external weights projected and regularised using elastic net (effectively trained) in CHILD cohort. The results section discusses associations between maternal smoking history and outcomes in CHILD, FAMILY, and START. Training and testing the score in the same sample (cohort) may result in overfitting and therefore should not be implemented.

      The original MRS was constructed using external weights from an independent discovery sample (Joubert et al., 2016; n > 5,000); LASSO validation was done in CHILD (n = 352), and external testing was done in FAMILY and START. This follows the lassosum framework, whereby we leverage the larger sample size of external studies to select more plausible CpGs as candidates for inclusion in the model. Thus, training, validation, and testing were not done in the same samples. We have included Figure 1 to illustrate the updated analytical flow and a graphical abstract to summarize the methods.
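
      To make the separation of training and validation concrete: in a lassosum-type approach, the discovery study contributes only summary statistics (per-CpG correlations with the exposure), a reference sample contributes the CpG-CpG correlation matrix, and the penalty parameters are tuned on a separate validation cohort. The sketch below is a simplified coordinate-descent version of the lassosum objective written for illustration; it is neither the authors' pipeline nor a reimplementation of the lassosum software, and `r`, `R`, `lam` and `s` are assumed inputs.

```python
def soft_threshold(u, lam):
    """Lasso shrinkage: move u toward zero by lam, stopping at zero."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lassosum_like(r, R, lam, s=0.5, n_iter=200):
    """Coordinate descent for the (simplified) objective

        f(b) = (1 - s) * b'Rb + s * b'b - 2 * b'r + 2 * lam * sum(|b_j|)

    r: per-CpG correlations with the exposure, from external summary statistics.
    R: CpG-by-CpG correlation matrix from a reference sample.
    lam, s: penalty and shrinkage parameters, to be tuned in a validation cohort.
    """
    p = len(r)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            partial = sum(R[j][k] * b[k] for k in range(p) if k != j)
            u = r[j] - (1 - s) * partial
            b[j] = soft_threshold(u, lam) / ((1 - s) * R[j][j] + s)
    return b


# With uncorrelated CpGs (identity R) the solution is plain soft-thresholding of r.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print([round(x, 3) for x in lassosum_like([0.3, -0.05, -0.2], identity, lam=0.1)])  # [0.2, 0.0, -0.1]
```

      In such a scheme, each (lam, s) pair yields one weight vector; the pair whose score best predicts the exposure in the validation cohort is kept, and only then is the score evaluated in the independent testing cohorts.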

      (42) Is it a concern that the findings don't seem to replicate Joubert's results, which came from a much larger study?

      Replication typically requires samples at least as large as the discovery sample, so given our much smaller sample size it is not a concern that we were unable to confirm all signals from Joubert et al. (2016). However, 6 of the 7 top associations (FDR-adjusted p-value < 0.05) in the meta-analysis were declared significant in Joubert et al. (2016). In addition, the fact that we were able to derive MRSs from Joubert’s summary statistics that were strongly associated with both smoking history and weekly exposure suggests shared signals. Also see the response to R1 comment #16 for a comparison of effect consistency.

      (43) Please check that all analysis scripts have been uploaded to Github and that the EWAS results are publicly available.

      We thank the reviewer for this suggestion. All updated scripts and EWAS results are available on Github. We are working to have the results also submitted to EWAS catalog.

      Reviewer #2 (Recommendations For The Authors):

      The impact of this study is reduced due to previous findings:

      (1) Previous studies have already shown that DNA methylation may mediate the effect of maternal smoking on birth size/weight (see e.g. https://doi.org/10.1098/rstb.2018.0120 and https://doi.org/10.1093/ije/dyv048).

      We thank the reviewer for this point and would like to take the opportunity to clarify that it was not our objective to examine whether there was a causal relationship between maternal smoking and birth size that was mediated by DNA methylation. One of the key messages of our study is to evaluate whether epigenetic associations, at individual CpGs and aggregated as a score, are consistent between white European and South Asian populations. One way to examine this is to use established DNAm signatures of maternal smoking, which are known to mediate birth size and weight in white Europeans (these references were cited in the original manuscript), and confirm whether they also carry the same effect on birth outcomes in the South Asian population.

      Indeed, our results support that the maternal smoking methylation score was consistently associated with negative outcomes in newborns of both white European and South Asian mothers, even though maternal smoking was essentially absent among South Asian mothers. Collectively, these findings point to the possibility that the maternal smoking MRS captures more than just smoking and second-hand smoking, potentially including other environmental exposures that also lead to oxidative stress. Together, these exposures are associated with health consequences, including reduced birth size/weight. One candidate for such an exposure is air pollution, as some of the maternal smoking CpGs were previously linked to air pollution. However, we were unable to assess this hypothesis directly without air pollution data, and the air pollution methylation score was associated neither with smoking history (Supplementary Figure 5) nor with smoking exposure (p > 0.4 in CHILD, FAMILY and START).

      The following was added to Materials and Methods under the subsection Using DNA Methylation to Construct Predictive Models for Maternal Smoking:

      “To benchmark and compare with existing maternal smoking MRSs, we calculated the Reese score using 28 CpGs (48,49), the Richmond score using 568 CpGs (49), the Rauschert score using 204 CpGs (50), the Joubert score using all 2,620 CpGs with evidence of association for maternal smoking (19), and finally a three-CpG score for air pollution (51). The details of these scores and score weights can be found in Supplementary Table 4.”

      The following was added to Results

      “Both produced methylation scores that were significantly associated with maternal smoking history (ANOVA F-test p-values = 1.0×10⁻⁶ and 2.4×10⁻¹⁴ in CHILD and 6.9×10⁻¹⁶ and <2.2×10⁻¹⁶ in FAMILY), and were the best among the alternative scores for CHILD and FAMILY (Supplementary Table 5). With the exception of the air pollution MRS, all remaining scores were marginally associated with smoking history in both CHILD and FAMILY (Supplementary Figure 5).”

      (2) Due to the small study size and low levels of prenatal smoke exposure, the model derived here is of little value and is, in fact, superseded by a previously published model (PMID: 27323799). At the very least, the model should be evaluated here. A novel aspect of this study is the inclusion of a South Asian cohort. Unfortunately, smoke exposure is practically non-existent, so it is unclear how it can be used. The more interesting finding in this study is the possibility that environmental factors such as second-hand smoke or pollution may have similar effects on pregnancies as maternal smoking. Are these available? If so, they could be evaluated for associations with DNA methylation. This would be novel. 

      In the revised manuscript, we included the Reese score (Reese et al., 2017) and a few other maternal smoking scores for comparison. In the CHILD cohort, the performance was comparable to our derived score (AUC of 0.95 vs. 0.94 for Reese score), but its applicability was limited since the FAMILY dataset was profiled using a targeted array and only 7 out of 28 of the CpGs in the Reese score were available (AUC of 0.89 vs. 0.72 for Reese). As compared to the remaining scores from literature (see the new Supplementary Table 5 for complete results), Reese’s score has generally favorable performance.

      We did examine second-hand smoking in the original manuscript, showing a significant association with weekly maternal smoking exposure (original Table 3 and Supplementary Table 8). However, air pollution data are not available for assessment.

      (3) The other novel aspect is the evaluation of associations with outcomes later in life. Height and weight are interesting but impact could be gained by including other relevant outcomes such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking. 

      We thank the reviewer for bringing up this point. One of the key health outcomes in the CHILD study was asthma, and data at later time points are available. However, we do not have similar outcomes collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. Thus, we did not initially include outcomes that were not available across all cohorts, as the intention was to contrast the effects between populations.

      We recognize that this is an important question and decided to provide the association results for mother-reported asthma and allergy, albeit based on different definitions, as these outcomes cannot be harmonized across the cohorts. We also included delivery via emergency C-section as an additional proxy outcome for birth complications.

      The following was added to Materials and Methods:

      “Mode of delivery (emergency c-section vs. other) was collected at the time of delivery.”

      “Additional phenotypes included smoking exposures (hours per week) at home, potential allergy based on mother reporting any of: eczema, hay fever, wheeze, asthma, food allergy (egg, cow milk, soy, other) for her child in FAMILY and START, and asthma based on mother’s opinion in CHILD (“In your opinion, does the child have any of the following? Asthma”).”

      The following was added to Results:

      “The maternal smoking MRS was consistently associated with increasing weekly smoking exposure in children reported by mothers at the 1-year (0.51±0.15, FDR-adjusted p = 0.0052), 3-year (0.53±0.16, FDR-adjusted p = 0.0052), and 5-year (0.40±0.15, FDR-adjusted p = 0.021) visits, with similar effects.”

      “We did not find any association with self-reported allergy or asthma in children at later visits (Supplementary Table 8). Further, there was no evidence of association between the MRS and any maternal outcomes (Supplementary Table 8).”

      REFERENCES:

      Gondalia, R., Baldassari, A., Holliday, K. M., Justice, A. E., Stewart, J. D., Liao, D., . . . Whitsel, E. A. (2021). Epigenetically mediated electrocardiographic manifestations of sub-chronic exposures to ambient particulate matter air pollution in the Women's Health Initiative and Atherosclerosis Risk in Communities Study. Environ Res, 198, 111211. doi:10.1016/j.envres.2021.111211

      Joubert, B. R., Felix, J. F., Yousefi, P., Bakulski, K. M., Just, A. C., Breton, C., . . . London, S. J. (2016). DNA Methylation in Newborns and Maternal Smoking in Pregnancy: Genome-wide Consortium Meta-analysis. Am J Hum Genet, 98(4), 680-696. doi:10.1016/j.ajhg.2016.02.019

      Martin, J. A., Osterman, M. J. K., & Driscoll, A. K. (2023). Declines in Cigarette Smoking During Pregnancy in the United States, 2016-2021. NCHS Data Brief(458), 1-8. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/36723453

      Reese, S. E., Zhao, S., Wu, M. C., Joubert, B. R., Parr, C. L., Haberg, S. E., . . . London, S. J. (2017). DNA Methylation Score as a Biomarker in Newborns for Sustained Maternal Smoking during Pregnancy. Environ Health Perspect, 125(4), 760-766. doi:10.1289/EHP333

    1. Reviewer #2 (Public Review):

      Overview

      In this work, Manley and Vaziri investigate the neural basis for variability in the way an animal responds to visual stimuli evoking prey-capture or predator-avoidance decisions. This is an interesting problem and the authors have generated a potentially rich and relevant data set. To do so, the authors deployed Fourier light field microscopy (Flfm) of larval zebrafish, improving upon prior designs and image processing schemes to enable volumetric imaging of calcium signals in the brain at up to 10 Hz. They then examined associations between neural activity and tail movement to identify populations primarily related to the visual stimulus, responsiveness, or turn direction; moreover, they found that the activity of the latter two populations appears to predict upcoming responsiveness or turn direction even before the stimulus is presented. While these findings may be valuable for future, more mechanistic studies, issues with resolution, rigor of analysis, clarity of presentation, and depth of connection to the prior literature significantly dampen enthusiasm.

      Imaging

      - Resolution: It is difficult to tell from the displayed images how good the imaging resolution is in the brain. Given scattering and lensing, it is important for data interpretation to have an understanding of how much the PSF degrades with depth.

      - Depth: In the methods it is indicated that the imaging depth was 280 microns, but from the images of Figure 1 it appears data was collected only up to 150 microns. This suggests regions like the hypothalamus, which may be important for controlling variation in internal states relevant to the behaviors being studied, were not included.

      - Flfm data processing: It is important for data interpretation that the authors are clearer about how the raw images were processed. The de-noising process specifically needs to be explained in greater detail. What are the characteristics of the noise being removed? How is time-varying signal being distinguished from noise? Please provide a supplemental figure with images and algorithm specifics for each key step.

      - Merging: It is noted that nearby pixels with a correlation greater than 0.7 were merged. Why was this done? Is this largely due to cross-contamination due to a drop in resolution? How common was this occurrence? What was the distribution of pixel volumes after aggregation? Should we interpret this to mean that a 'neuron' in this data set is really a small cluster of 10-20 neurons? This of course has great bearing on how we think about variability in the response shown later.

      - Bleaching: Please give the time constants used in the fit for assessing bleaching.
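      To make the Merging question concrete, the following is a minimal Python sketch of one way to report the distribution of aggregate sizes. It assumes, hypothetically, that "merging" means joining 4-connected neighbouring pixels whose fluorescence traces have a Pearson correlation above 0.7; the authors' actual merging rule is not described in the text, and the function names and toy data are illustrative only.

```python
from collections import Counter

import numpy as np


def merge_correlated_pixels(traces, threshold=0.7):
    """Label 4-connected pixels whose traces correlate above `threshold`.

    traces: array of shape (height, width, n_timepoints).
    Returns an integer label image of shape (height, width).
    """
    h, w, _ = traces.shape
    parent = list(range(h * w))  # union-find forest over flattened pixels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and lower neighbour
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    r = np.corrcoef(traces[y, x], traces[ny, nx])[0, 1]
                    if r > threshold:
                        # Join the two pixels' aggregates.
                        parent[find(ny * w + nx)] = find(y * w + x)

    roots = [find(i) for i in range(h * w)]
    # Relabel root indices to consecutive integers 0, 1, 2, ...
    relabel = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots]).reshape(h, w)


def aggregate_size_histogram(labels):
    """Return {aggregate size in pixels: number of aggregates of that size}."""
    sizes = Counter(labels.ravel().tolist())
    return dict(Counter(sizes.values()))


# Toy example: two 4x2 blocks, each sharing one independent signal plus noise.
rng = np.random.default_rng(0)
signal_a, signal_b = rng.standard_normal((2, 200))
traces = np.empty((4, 4, 200))
traces[:, :2] = signal_a + 0.1 * rng.standard_normal((4, 2, 200))
traces[:, 2:] = signal_b + 0.1 * rng.standard_normal((4, 2, 200))

labels = merge_correlated_pixels(traces)
print(aggregate_size_histogram(labels))  # {8: 2}
```

      In this toy case each "unit" after merging spans 8 pixels; applied to the real data, the same histogram would show how many pixels, and potentially how many neurons, a single merged unit covers.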

      Analysis

      - Slow calcium dynamics: It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given and the authors should account for variability in this kernel time across cell types. Moreover, by not deconvolving their signals, the authors allow for contamination of their signal at any given time with a signal from multiple seconds prior. For example, in Figure 4A (left turns), it appears that much of the activity in the first half of the time-warped stimulus window began before stimulus presentation - without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing. This also suggests that in some cases the signals from the prior trial may contaminate the current trial.

      - Partial Least Squares (PLS) regression: The steps taken to identify stimulus coding and noise dimensions are not sufficiently clear. Please provide a mathematical description.

      - No response: It is not clear from the methods description if cases where the animal has no tail response are being lumped with cases where the animal decides to swim forward and thus has a large absolute but small mean tail curvature. These should be treated separately.
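      The kernel-contamination concern above can be illustrated with a minimal Python sketch under deliberately simple assumptions: a single-exponential (AR(1)) indicator kernel, noise-free traces, the 10 Hz volume rate mentioned above, and an illustrative decay constant of 2 s that is not a measured GCaMP6s value. Noisy real data would need a regularised deconvolution method rather than the exact inversion shown here, and the function names are hypothetical.

```python
import numpy as np


def simulate_calcium(events, tau_s, dt_s):
    """Convolve an event train with a causal exponential kernel.

    Implements the AR(1) model c[t] = gamma * c[t-1] + s[t],
    where gamma = exp(-dt / tau).
    """
    gamma = np.exp(-dt_s / tau_s)
    calcium = np.zeros(len(events))
    level = 0.0
    for t, s in enumerate(events):
        level = gamma * level + s
        calcium[t] = level
    return calcium


def deconvolve_ar1(calcium, tau_s, dt_s):
    """Exact inverse of the AR(1) model (noise-free case only):
    s[t] = c[t] - gamma * c[t-1]."""
    gamma = np.exp(-dt_s / tau_s)
    events = calcium.copy()
    events[1:] -= gamma * calcium[:-1]
    return events


dt, tau = 0.1, 2.0      # 10 Hz volume rate; tau = 2 s is an illustrative assumption
events = np.zeros(100)
events[30] = 1.0        # a single event 20 frames (2 s) before stimulus onset
onset = 50              # hypothetical stimulus-onset frame

calcium = simulate_calcium(events, tau, dt)
print(round(float(calcium[onset]), 3))  # 0.368: exp(-1) of the event persists at onset
recovered = deconvolve_ar1(calcium, tau, dt)
print(np.allclose(recovered, events))   # True
```

      Under these assumptions, about 37% of a transient that began 2 s before the stimulus is still present at stimulus onset, so a stimulus-window measure computed on raw fluorescence mixes pre-stimulus and stimulus-evoked activity unless the kernel is accounted for.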

      Results

      - Behavioral variability: Related to Figure 2, within- and across-subject variability are confounded. Please disambiguate. It may also be informative on a per-fish basis to examine associations between reaction time and body movement.

      - Data presentation clarity: All figure panels need scale bars - for example, in Figure 3A there is no indication of timescale (or time of stimulus presentation). Figure 3I should also show the time series of the w_opt projection.

      - Pixel locations: Given the poor quality of the brain images, it is difficult to tell the location of highlighted pixels relative to brain anatomy. In addition, given that the midbrain consists of much more than the tectum, it is not appropriate to put all highlighted pixels from the midbrain under the category of tectum. To aid in data interpretation and better connect this work with the literature, it is recommended that the authors register their data sets to standard brain atlases and determine if there is any clustering of relevant pixels in regions previously associated with prey-capture or predator-avoidance behavior.

      Interpretation

      - W_opt and e_1 orthogonality: The statement that these two vectors, determined from analysis of the fluorescence data, are orthogonal, actually brings into question the idea that true signal and leading noise vectors in firing-rate state-space are orthogonal. First, the current analysis is confounding signals across different time periods - one could assume linearity all the way through the transformations, but this would only work if earlier sources of activation were being accounted for. Second, the transformation between firing rate and fluorescence is most likely not linear for GCaMP6s in most of the cells recorded. Thus, one would expect a change in the relationship between these vectors as one maps from fluorescence to firing rate.

      - Sources of variability: The authors do not take into account a fairly obvious source of variability in trial-to-trial response - eye position. We know that prey capture responsiveness is dependent on eye position during stimulus (see Figure 4 of PMID: 22203793). We also expect that neurons fairly early in the visual pathway with relatively narrow receptive fields will show variable responses to visual stimuli as the degree of overlap with the receptive field varies with eye movement. There can also be small eye-tracking movements ahead of the decision to engage in prey capture (Figure 1D, PMID: 31591961) that can serve as a drive to initiate movements in a particular direction. Given these possibilities indicating that the behavioral measure of interest is gaze, and the fact that eye movements were apparently monitored, it is surprising that the authors did not include eye movements in the analysis and interpretation of their data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Hoops et al. showed that Netrin-1 and UNC5c can guide dopaminergic innervation from nucleus accumbens to cortex during adolescence in rodent models. 

      We showed this with respect to Netrin-1 only. With respect to UNC5c, we showed that the timing of its expression suggests that it may be involved, but did not conduct the UNC5c manipulation experiments necessary to prove it. We state this clearly in the manuscript.

      They found that these dopamine axons project to the prefrontal cortex in a Netrin-1 dependent manner and knocking down Netrin-1 disrupted motor and learning behaviors in mice. 

      We would like to clarify that we did not show that learning or motor behaviors are affected. We showed that inhibitory control, measured in the Go/No-Go task, is altered in adulthood.

      Furthermore, the authors used hamsters, a seasonal model that is affected by the length of daylight, to demonstrate that the guidance of dopamine axons is mediated by environmental factors such as daytime length, in a sex-dependent manner. 

      We agree with this characterization of our hamster experiments, but want to emphasize that it is the timing of the adolescent dopamine axon input to the prefrontal cortex that is impacted by daytime length in a sex-dependent manner.

      Regarding the cell type specificity of Netrin-1 expression, the authors began by stating "this question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present." This statement contradicts the exact issue regarding the specificity issue I raised.

      We are not sure why the identities of the cell types expressing Netrin-1 are at issue. As a secreted protein, Netrin-1 can be attached to the extracellular cell surface or in the extracellular matrix, where it interacts with its receptors, which are embedded in the cell surfaces of growing axons (Finci et al., 2015; Rajasekharan & Kennedy, 2009). Netrin-1 is expressed by a wide variety of cell types; for example, it is expressed in medium spiny neurons in the striatum of rodents as well as in cholinergic neurons (Shatzmiller et al., 2008). However, we cannot see why showing exactly what type(s) of cells have Netrin-1 on their surfaces, or have secreted it into the matrix, would be at issue for our study.

      They then went on to show the RNAscope data for Netrin-1 in Figure 2, which showed Netrin-1 mRNA was actually expressed quite ubiquitously in anterior cingulate cortex, dorsopeduncular cortex, infralimbic cortex, prelimbic cortex, etc. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      We agree that Netrin-1 mRNA is present throughout the forebrain. In particular, its presence in the regions mentioned by Reviewer #1 is a key component of our theory for how dopamine axons grow to the prefrontal cortex in adolescence.

      In addition, contrary to the authors' statement that Netrin-1 is a "secreted protein", the confocal images in Figure 1 in the rebuttal letter actually show Netrin-1 present in "granule-like" organelles inside the cytoplasm of neurons. 

      The rebuttal letter’s Figure 1 is not sufficient to determine the subcellular location of the Netrin-1, however we agree that it is likely that Netrin-1 is present in the cytoplasm of neurons. Indeed, its presence in vesicles in the cytoplasm is to be expected as this is a common mechanism for cells to secrete proteins into the extracellular space (Glasgow et al., 2018). We are not sure whether Reviewer #1’s “granule-like” organelles are in fact secretory vesicles or not, and we do not think our immunohistochemical images are an appropriate method by which to determine this kind of question. We find, however, that a detailed characterization of the subcellular distribution of Netrin-1 is beyond the scope of our study. 

      That Netrin-1 is a secreted protein is well-established in the literature (for example, see Glasgow et al., 2018). The confocal images we provide suggest, but do not prove, that it is likely Netrin-1 is present both extracellularly and intracellularly, which is entirely consistent with its synthesis, secretion, and function. It is also consistent with our methodology and findings. 

      Finally, the authors presented Figure 7 to indicate the location where virus expressing Netrin-1 shRNA might be located. Again, the brain region targeted was quite focal and most likely did not cover all the Netrin-1+ brain regions in Figure 2. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      Figure 7 - this is referring to Author response image 4 of our first response to reviewers.

      We agree with Reviewer #1’s characterization of our experiment. We intended to interrupt the Netrin-1 pathway to the prefrontal cortex, like removing a bridge along a road. The Netrin-1 signal remained intact along the dopamine axon’s route before and after the location of the viral injection, however it was lost at the site of the virus injection. This is like a road remaining intact on either side of a destroyed bridge, but becoming impassable at the location where the bridge was destroyed. We are glad that Reviewer #1 agrees our experimental design achieved the desired outcome (a focal reduction in Netrin-1 expression).

      Collectively, these results raised more questions regarding the specificity of Netrin-1 expression in brain regions that are behaviorally relevant to this study.

      We do not agree with this assessment. Our manipulation of Netrin-1 expression was highly localized and specific, as Reviewer #1 seems to acknowledge. We are not clear on what questions this might raise that would call into question our findings as described in our manuscript. We have now added the following paragraph to our manuscript:  

      “It remains unknown exactly what types of cells are expressing Netrin-1 along the dopamine axon route, and how this expression is regulated to produce the Netrin-1 gradients that guide the dopamine axons. It also remains unclear where the misrouted axons end up in adulthood. Future experiments aimed at addressing these questions will provide further valuable insight into the nature of the “Netrin-1 pathway”. Nonetheless, our results allow us to conclude that Netrin-1 expressing cells “pave the way” for dopamine axons growing to the medial prefrontal cortex.”

      With respect to the effectiveness of Netrin-1 knockdown in the animals in this study, the authors cited data in HEK293 cells (Cuesta et al., 2020. Figure 2a), which did not include any statistics, and previously published in vivo data in a separate, independent study (Cuesta et al., 2020. Figure 2c). They do not provide any data regarding the effectiveness of Netrin-1 knockdown in THIS study.

      Indeed, we understand the concerns of Reviewer #1 here. This issue was discussed at the time all the experiments (both in the current manuscript and in Cuesta et al. (2020)) were conducted, and we decided that it was sufficient to show the virus was capable of knocking down Netrin-1 in vitro and in vivo in the forebrain. These characterization experiments were published in the first manuscript to present results using the virus, which was Cuesta et al., 2020. However, all experiments from both manuscripts were conducted contemporaneously.

      We do not see how repeating the same characterization experiments again is useful. 

      Similar concerns regarding UNC5C knockdown (points #6, #7, and #8) were not adequately addressed.

      There is no UNC5c knockdown in this manuscript. Furthermore, points #6, #7 and #8 do not deal with UNC5c knockdown. Point #6 is regarding the Netrin-1 virus efficacy, which we discuss above. Points #7 and #8 are requesting numerous additional experiments that we feel are worthy of their own manuscripts, and we do not feel that they call into question the findings we present here. Rather, answering points #7 and #8 would further refine our understanding of how dopamine axons grow to the prefrontal cortex beyond our current manuscript.

      In brief, while this study provides a potential role of Netrin-1-UNC5C in target innervation of dopaminergic neurons and its behavioral output in risk-taking, the data lack sufficient evidence to firmly establish the cause-effect relationship.

      We do not claim a cause-effect relationship here or anywhere in the manuscript. Concrete establishment of a cause-effect relationship will require several more manuscripts worth of experiments.

      Reviewer #2 (Public Review):

      In this manuscript, Hoops et al., using two different model systems, identified key developmental changes in Netrin-1 and UNC5C signaling that correspond to behavioral changes and are sensitive to environmental factors that affect the timing of development. They found that Netrin-1 expression is highest in regions of the striatum and cortex where TH+ axons are travelling, and that knocking down Netrin-1 reduces TH+ varicosities in mPFC and reduces impulsive behaviors in a Go-No-Go test. 

      We want to point out that we examined the Netrin-1 expression in the septum rather than the striatum but otherwise feel the above description is accurate.

      Further, they show that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters, environmental effects on development are also sexually dimorphic. This study addresses an important question using approaches that link molecular, circuit and behavioral changes. Understanding developmental trajectories of adolescence, and how they can be impacted by environmental factors, is an understudied area of neuroscience that is highly relevant to understanding the onset of mental health disorders. I appreciated the inclusion of replication cohorts within the study.

      We appreciate Reviewer #2’s comments, which we feel accurately describe our experimental approach and findings, including their limitations.

      Reviewer #3 (Public Review):

      This study from the Flores group aims at understanding neuronal circuit changes during adolescence, which is an ill-defined, transitional period involving dramatic changes in behavior and anatomy. They focus on DA innervation of the prefrontal cortex, and its interaction with the guidance cue Netrin-1. They propose that DA axons in the PFC increase in the postnatal period, and that their density is reduced in a Netrin-1 knockdown, suggesting that Netrin abets the development of this mesocortical pathway. 

      We feel it necessary to point out that we are not the first to propose that dopamine axons in the prefrontal cortex increase in the postnatal period. This is well-established and was first documented in rodents in the 1980s (Kalsbeek et al., 1988). Otherwise we agree with Reviewer #3’s characterization.

      In such mice impulsivity gauged by a go-no go task is reduced. They then provide some evidence that Unc5c is developmentally regulated in DA axons. Finally they use an interesting hamster model, to study the effect of light hours on mesocortical innervation, and make some interesting observations about the timing of innervation and Unc5c expression, and the fact that females housed in winter day length conditions display an accelerated innervation of the prefrontal cortex.

      We agree with Reviewer #3’s characterization of our study and findings here.

      Comments on the revision. Several points were addressed; some remain to be addressed.

      (4) It's not clear to me that TH doesn't stain noradrenergic axons in the PFC. See Islam and Blaess, 2021, and references therein.

      Presuming that Reviewer #3 is referring to Islam et al. (2021), the review they cite supports our position that TH-stained axons in the forebrain are by-and-large dopamine axons.

      Nonetheless, Islam et al. do point out that it is important to keep in mind that TH-positive axons have a slight possibility of being noradrenaline axons. We are very conscious of this possibility and are careful to minimize this risk. As we state in the methods, we only examine axons that are morphologically consistent with dopamine axons and are localized to areas within the forebrain where dopamine axons are known to innervate, in addition to being TH-positive. The localization and morphology of noradrenaline axons in the forebrain is different from that of dopamine axons. This is stated in our methods on lines 76-94, where we describe in detail the differentiation between dopamine and norepinephrine axons and include a full list of relevant citations.

      (6) The Netrin knockdown data provided is from a previous study/samples.

      Indeed, however the experiments for the two manuscripts were conducted contemporaneously. We believe two sets of validation experiments are not required.

      (8) While the authors make the argument that the behavior is linked to DA, they still haven't formally tested it, in my opinion.

      We agree that we have not formally tested this link. However, we do not claim to have established a formal link in our manuscript.

      (1) Fig. 3: UNC5c levels are not yet quantified. Furthermore, I agree with the previous reviewer that UNC5c knockdown would corroborate key aspects of the model.

      We present UNC5c quantities for mice in our first response to reviewers (Figure 11 therein); however, we did not do so for the hamsters due to the time involved. We are planning further experiments with the hamsters and may include quantification of UNC5c in the nucleus accumbens at such time. However, we do not feel its absence from this manuscript calls into question our findings.

      With regard to the UNC5c knockdown, we agree it would be an informative extension of our findings here, but again we do not feel that it is necessary to corroborate our current findings.

      New: The developmental trajectory of prefrontal TH-positive axons from early adolescence to adulthood is similar in male and female rats (Willing et al., 2017). This needs discussion.

      Willing et al. (2017) reported an increase in prefrontal dopamine density during adolescence in male and female rats, with a non-significant trend towards an earlier increase in females.

      This is in line with our current results in mice indicating that the timing of dopamine axon targeting and growth is sex-specific. We are currently testing this idea directly using intersectional viral tracing methods. We have now added the following sentence to the manuscript: 

      “Differences in the precise timing of dopamine innervation to the PFC in adolescence have been suggested by findings reported in male and female rats (Willing et al., 2017)”.

      References

      Brignani, S., Raj, D. D. A., Schmidt, E. R. E., Düdükcü, Ö., Adolfs, Y., Ruiter, A. A. D., Rybiczka-Tesulov, M., Verhagen, M. G., Meer, C. van der, Broekhoven, M. H., Moreno-Bravo, J. A., Grossouw, L. M., Dumontier, E., Cloutier, J.-F., Chédotal, A., & Pasterkamp, R. J. (2020). Remotely Produced and Axon-Derived Netrin-1 Instructs GABAergic Neuron Migration and Dopaminergic Substantia Nigra Development. Neuron, 107(4), 684-702.e9. https://doi.org/10.1016/j.neuron.2020.05.037

      Cuesta, S., Nouel, D., Reynolds, L. M., Morgunova, A., Torres-Berrio, A., White, A., Hernandez, G., Cooper, H. M., & Flores, C. (2020). Dopamine axon targeting in the nucleus accumbens in adolescence requires Netrin-1. Frontiers in Cell and Developmental Biology, 8, 487. https://doi.org/10.3389/fcell.2020.00487

      Finci, L., Zhang, Y., Meijers, R., & Wang, J. H. (2015). Signaling mechanism of the netrin-1 receptor DCC in axon guidance. Progress in Biophysics and Molecular Biology, 118(3), 153-160. https://doi.org/10.1016/j.pbiomolbio.2015.04.001

      Glasgow, S. D., Labrecque, S., Beamish, I. V., Aufmkolk, S., Gibon, J., Han, D., Harris, S. N., Dufresne, P., Wiseman, P. W., McKinney, R. A., Séguéla, P., Koninck, P. D., Ruthazer, E. S., & Kennedy, T. E. (2018). Activity-Dependent Netrin-1 Secretion Drives Synaptic Insertion of GluA1-Containing AMPA Receptors in the Hippocampus. Cell Reports, 25(1), 168-182.e6. https://doi.org/10.1016/j.celrep.2018.09.028

      Islam, K. U. S., Meli, N., & Blaess, S. (2021). The Development of the Mesoprefrontal Dopaminergic System in Health and Disease. Frontiers in Neural Circuits, 15, 746582. https://doi.org/10.3389/fncir.2021.746582

      Kalsbeek, A., Voorn, P., Buijs, R. M., Pool, C. W., & Uylings, H. B. M. (1988). Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology, 269(1), 58–72. https://doi.org/10.1002/cne.902690105

      Rajasekharan, S., & Kennedy, T. E. (2009). The netrin protein family. Genome Biology, 10(9), 239. https://doi.org/10.1186/gb-2009-10-9-239

      Shatzmiller, R. A., Goldman, J. S., Simard-Émond, L., Rymar, V., Manitt, C., Sadikot, A. F., & Kennedy, T. E. (2008). Graded expression of netrin-1 by specific neuronal subtypes in the adult mammalian striatum. Neuroscience, 157(3), 621–636. https://doi.org/10.1016/j.neuroscience.2008.09.031

      Willing, J., Cortes, L. R., Brodsky, J. M., Kim, T., & Juraska, J. M. (2017). Innervation of the medial prefrontal cortex by tyrosine hydroxylase immunoreactive fibers during adolescence in male and female rats. Developmental Psychobiology, 59(5), 583–589. https://doi.org/10.1002/dev.21525

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-02235

      Corresponding author: Adriano Aguzzi

      1. General Statements

      We thank the reviewers for providing valuable comments. We are pleased that our study is considered important to advance the knowledge on IL-1-independent inflammatory functions of inflammasomes. We have clarified and revised the manuscript (track changed) as detailed below in the point-by-point response in this letter.

      2. Point-by-point description of the revisions

      Referee 1

      General: In this manuscript, the authors investigate the role of the inflammasome adapter ASC in AA amyloidosis. This condition involves the aggregation of serum amyloid A (SAA) and is linked to chronic inflammation. Firstly, I can directly say that I do recommend this study for publication. This is a well-conducted and well-written study which advances the knowledge on IL-1-independent inflammatory functions of inflammasomes. Furthermore, I find it particularly impressive that, although the inflammasome research community is well aware that amyloidosis is a hallmark of inflammatory diseases, it took a neuroscientist specialized in prion diseases to raise the question of whether ASC would be involved in seeding serum AA aggregation. Key findings include:

      • ASC forms extracellular aggregates that enhance SAA aggregation, as observed through superresolution microscopy.
      • In a mouse model, the absence of ASC significantly reduced amyloid load, not due to increased phagocytosis but likely due to diminished aggregation.
      • Treatment with anti-ASC antibodies reduced amyloid load and mitigated weight loss in mice with AA amyloidosis.

      These findings suggest that ASC plays a crucial role in AA amyloidosis and that targeting ASC could be a potential therapeutic strategy. The study expands our understanding of the involvement of ASC in proteinopathies beyond neural diseases, pointing to its role in systemic conditions like AA amyloidosis.

      Significance: In conclusion, this manuscript offers valuable insights into the role of ASC in AA amyloidosis, presenting compelling findings that support its potential as a therapeutic target. Addressing the mentioned concerns and making the suggested revisions will further enhance the manuscript's scientific rigor and impact. Overall, this study is a valuable contribution to the field of inflammasome research and its relevance in systemic conditions like AA amyloidosis.

      Comment 1: Overall, the experiments are well conducted, and almost all the controls I would expect were included. With few exceptions, the data are convincing. That said, I have issues with some of the staining employed in Fig. 1. In Fig. 1, the authors assess ASC staining in cardiac tissues from a patient with vasculitis and systemic inflammation-related AA amyloidosis, and a control patient who died of a heart attack but had no signs of amyloidosis. However, most of the data shown relate to the AL177 anti-ASC antibody. More importantly, no isotype stainings are included. We have previously demonstrated that the AL177 anti-ASC antibody, used here, reacts quite strongly with ASC−/− cells, and it is one of the least specific anti-ASC antibodies commercially available (PMID: 27221487). As these data come from one patient (understandably), I wonder if the authors could counterstain ASC in the same samples using a specific human anti-ASC antibody in a different color (e.g., Biolegend HASC) and confirm that the signal overlays with that of AL177.

      Response: We conducted additional experiments to address anti-ASC antibody specificity, as now described in the Results, Methods, and Fig. S1. We tested a set of anti-ASC antibodies (AL177, MY6745, 1C3D7) for their ASC specificity. We confirmed that both the AL177 and the MY6745 antibodies have high ASC specificity (Fig. S1A). Moreover, for illustration purposes (and to warn other scientists), we included a third anti-ASC antibody (1C3D7) found to be unspecific, as it yielded a strong signal in PYCARD-/- (ASC-/-) THP-1 cells (Fig. S1B). In addition, isotype controls were included in these experiments (Fig. S1A, right panels), as suggested by the reviewer, showing no target protein detection in either PYCARD+/+ (ASC+/+) or PYCARD-/- cells and underscoring the anti-ASC specificity of the AL177 and MY6745 antibodies.


      Comment 2: Finally, in Figure 1H it seems from the description that another anti-ASC antibody was used: "referred in the legend as ASC (MAB ASC, Yellow)". Is this a monoclonal anti-ASC? Also, the images show large and bright antibody aggregates (in the middle of the image, in the top left corner behind the "H", and a massive fluorescence in the bottom right of the image), indicating the presence of staining artifacts. Again, no counterstaining with isotype controls is shown.

      Response: We apologize for the confusing jargon in Figure 1H. “MAB ASC” refers to the anti-ASC-PYD antibody (MAB/MY6745). We have corrected the antibody terminology in the legend. MAB/MY6745 is a monoclonal antibody generated by Mabylon that is highly reactive to both human and murine ASC. This antibody was generated 1) to perform an in vivo immunotherapy study and 2) to be used as an alternative specific antibody, in addition to AL177, to show co-localization of SAA and ASC in a human AA patient using STED superresolution microscopy. MAB/MY6745 is a rabbit monoclonal anti-ASC antibody targeting the pyrin domain (PYD), in which the rabbit Fcγ domain was replaced with that of a mouse IgG2a to avoid xenogeneic anti-drug responses in recipients and to improve its effector functions in vivo. To examine possible staining artefacts, which can occur with formalin-fixed paraffin-embedded (FFPE) human tissues, we assessed the specificity of a variety of anti-ASC antibodies (Fig. S1). Our data presented in Fig. S1 show that the monoclonal anti-ASC antibody binds specifically. It is conceivable that AL177 and MAB/MY6745 target different epitopes of ASC, resulting in different staining patterns. An isotype control, included in Fig. S1, was used to test the specificity of the secondary antibodies and did not show any nonspecific staining. We have adapted and added this to the text body and figure legend accordingly.

      Comment 3: Overall, although I don't dispute the possibility that ASC co-localizes with SAA deposits, I don't think the data presented can safely sustain that claim. I would, therefore, suggest that alternative methods be employed to substantiate these conclusions. For example, would it be possible to immunoprecipitate (IP) amyloid SAA and assess ASC via western blotting, as well as to IP ASC and detect SAA? Or to use DSS crosslinking to find ASC oligomers in tissue areas rich in SAA?

      Response: In addition to assessing co-localization by means of STED superresolution microscopy (Fig. 1), we also employed LiP-MS with various forms of ASC (monomeric and ASC specks) and identified a previously unrecognized biophysical interaction of SAA and the ASC PYD domain (Fig. 2C-F). As an orthogonal line of evidence, we provided kinetic data showing that SAA aggregation is enhanced in the presence of ASC specks (Fig. 2A-B). We feel that these results are reasonably convincing, but we agree that co-localization is almost invariably an aspirational finding, and even superresolution microscopy cannot fully exclude the presence of artifacts (nor, in fairness, can co-immunoprecipitation, which must often rely on overexpression). A sentence acknowledging this limitation was added to the Discussion.

      Comment 4: For example, it would be reasonable to quantify the results in Figure 3G and to provide clarification regarding the controls in the figure legend. Although there is significantly less SAA in spleen homogenates from Asc−/− mice, the same seems to be true for β-actin in Fig. 3G. Moreover, in the figure legend the authors state: "...Spleen homogenate from untreated (-ctrl) and AA+ (+ctrl) C57BL/6 wt mice from an independent experiment served as negative and positive control, respectively." I don't know what the authors mean by that. Is this a montage, or were samples from different experiments run together in one blot? And if so, for what reason? This is confusing and should be clarified.

      Response: We reworded the figure legend to provide clarity about the technical assay controls and adjusted the labels in Fig. 3E accordingly: to ascertain SAA antibody functionality, mouse spleen homogenate from independently obtained and Congo red-confirmed AA+ tissue served as a positive technical control, whereas non-induced (AA-) spleen tissue served as a negative technical control (Fig. 3E). We decided to show the two (positive/AA+ and negative/AA-) technical controls in Fig. 3E.

      Comment 5: Furthermore, in the Abstract, a slight rephrasing is suggested to accurately describe ASC specks as molecular aggregates formed inside cells, which are subsequently released into the extracellular space.

      Response: We thank the referee for bringing this to our attention. We rephrased the abstract accordingly.

      Comment 6: Lastly, enhancing the text size in figures, particularly in Fig 3, is advised to improve legibility and overall clarity.

      Response: The text size and style in main Fig. 3 have been changed to improve legibility, and additional figure formatting has been performed.

      Referee 2

      General: The manuscript by Losa et al. investigates whether ASC is involved in serum AA amyloidosis. The authors report that ASC colocalizes with SAA in human AA amyloidosis and that purified ASC specks accelerate SAA fibril formation in vitro. In addition, splenic AA amyloid was decreased in Pycard-/- mice compared to Pycard+/+ mice, and treatment with anti-ASC antibodies decreased amyloid loads in Pycard+/+ mice. Lastly, they analyzed serum of 19,334 patients to show that the prevalence of anti-ASC antibodies did not correlate with any specific disease. The authors conclude that ASC plays a role in extraneural proteinopathies of humans and experimental animals and suggest that anti-ASC immunotherapy may contribute to resolving such diseases. The findings in the study are novel and demonstrate a new role for ASC in aggregation proteinopathies. However, there are a number of issues that need to be addressed before acceptance for publication.

      Significance: The findings in the study are novel and demonstrate a new role for ASC in aggregation proteinopathies. This study reports a crucial role for ASC in SAA interaction and recruitment, modulation of serum SAA levels, acceleration of SAA fibril formation, and control of the extent of AA amyloid deposition in inflammation-associated amyloidosis.

      Comment 1: Figure 3 E depicts Western blots of monomeric SAA in spleen of Pycard+/+ and Pycard-/- mice. The authors should include immunoblots depicting the levels of ASC in these tissues and to demonstrate that the Pycard-/- mice lack ASC.

      Response: We did not perform ASC immunoblots for Pycard-/- and Pycard+/+ mice, since the absence of the ASC protein in this well-established mouse line has been demonstrated in several key publications, including under inflammatory conditions (right side of the figure below, from Mariathasan et al., Nature, 2014). In addition, we show ASC IHC of spleens from Pycard+/+ and Pycard-/- AA+ mice, confirming the absence of an ASC signal in Pycard-/- mice and its presence in Pycard+/+ mice (Fig. 3F). Moreover, our genotyping data confirmed the presence and absence of the Pycard gene in Pycard+/+ and Pycard-/- AA+ mice, respectively.

      Comment 2: Fig. 3B shows that at 96 hours after injection there was no difference in SAA serum concentration. How do the authors explain this drop in SAA serum concentration? No explanation is provided.

      Response: The acute-phase response peaks at 24 hours after injury (e.g., Kushner I, 1982; Gabay and Kushner, 1999; Gitlin and Colten, 1987, Calif.: Academic Press, 1987:123-53). Beyond 24 hours, acute-phase protein levels decay over time, mirroring the restoration of tissue integrity and the clearance of the insulting stimulus. This is in line with our data: the inflammatory injury was induced by subcutaneous AgNO3 injection, and at 96 hours post-injection there was no statistically significant difference in serum SAA between Pycard+/+ and Pycard-/- mice. In addition, the majority of SAA in Pycard+/+ mice was incorporated into amyloid deposits. As suggested by the reviewer, we have included this explanation and the corresponding references in the revised manuscript.

      Comment 3: Figure 4 shows anti-ASC administration reduces amyloid load. The immunoblot in Figure 4C does not represent the quantification of the blot. In fact, there are only 3 samples per treatment group whereas the quantification shows 5-6 animals per group.

      Response: We ran two independent immunoblots in parallel as technical replicates (duplicates). As pointed out by the reviewer, this resulted in 6 samples and data points, which were visualized and analyzed in main Fig. 4C. To avoid duplicating data and overloading the main figures with technical replicates, we chose to show only one representative immunoblot in main Fig. 4C. The other blots are shown in supplementary Fig. S13A and Fig. S13B for full transparency.

      Comment 4: Additionally, the authors have not shown that the drug penetrates the target tissue and how much drug is present in spleen to provide a therapeutic effect. What is the half-life of the drug? These parameters are critical to assess the MOA of the anti-ASC used in these studies.

      Response: To assess the pharmacokinetics of the anti-ASC antibody, we determined its serum titers by ELISA at various time points up to 96 hpi after the first injection. Anti-ASC antibody serum levels peaked at 24 hpi and declined to about half of the maximal serum concentration at 96 hpi. This serum half-life, for the injected concentration, is in the range of reported kinetic parameters for engineered monoclonal antibodies (e.g., Unverdorben et al., MAbs, 2016; Foss et al., Nat Comm, 2024) (Fig. 4B). Because the splenic red pulp vasculature is highly permeable and lacks any selectively permeable barrier, efficient penetration of the antibody into the splenic extracellular space can plausibly be expected. In principle, one could perfuse mice intracardially with PBS and then measure antibody levels in tissue. Such measurements can work relatively well in the brain, which possesses a highly impermeable barrier. Here, however, we would find it difficult to convince ourselves that such measurements were not contaminated by residual blood in splenic capillaries, which may be difficult to clear by perfusion. Therefore, we did not measure antibody levels in the spleen.
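      To make the half-life statement concrete: if one assumes a single first-order (mono-exponential) decline between the 24 hpi peak and the 96 hpi measurement, a drop to half of the peak over 72 h corresponds to an apparent half-life of roughly 72 h. The sketch below shows only that arithmetic; the function name and the relative titres (1.0 and 0.5) are illustrative assumptions, and a real pharmacokinetic analysis would fit all ELISA time points rather than two.

```python
import math


def apparent_half_life(t1_h: float, c1: float, t2_h: float, c2: float) -> float:
    """Apparent half-life (hours) from two serum measurements.

    Assumes a single first-order (mono-exponential) decline between t1 and t2,
    so C(t) = C1 * exp(-k * (t - t1)) and t_half = ln(2) / k.
    """
    if not (t2_h > t1_h and c1 > c2 > 0):
        raise ValueError("need t2 > t1 and a positive, declining concentration")
    k = math.log(c1 / c2) / (t2_h - t1_h)  # elimination rate constant (1/h)
    return math.log(2) / k


# Relative titres: peak (1.0) at 24 hpi, about half of the peak (0.5) at 96 hpi.
t_half = apparent_half_life(24, 1.0, 96, 0.5)
print(f"apparent half-life: {t_half:.1f} h")
```

      The estimate is only as good as the mono-exponential assumption; an early distribution phase after injection would make the two-point value a rough approximation.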

      Comment 5: The authors should expand the discussion section to include the work of other groups that have successfully employed anti-ASC antibodies. For example, PMID: 35793783, PMID: 32366256

      Response: We thank the referee for pointing out that literature. We extended the discussion section accordingly and added these important references into the discussion.

      Comment 6: Methods: The authors provide the number of animals employed in the Supplemental Tables 5 and 7. These numbers should be provided in the methods section or in the Figure legends. Additionally, how many replicates were performed for the data in Figure 2?

      Response: As suggested by the reviewer, we now provide the number of animals in the figure legends of main Fig. 2 and Fig. 3, in addition to those in Table 5 and Supplementary Table 7, to enhance clarity.

      Referee 3

      General: The manuscript by Losa et al. explores the co-aggregation of ASC with serum amyloid A (SAA) in vivo and in mouse models. It posits that, similar to amyloid-β, SAA is cross-seeded by ASC foci both in vitro and in vivo. This review only addresses the co-localization and in vitro cross-seeding data (Figs. 1 and 2A, B), not the mouse experiments or mass spectrometry data. The manuscript first shows co-deposition of ASC with SAA amyloid. SAA was stained both with Congo red and ThS, both standard dyes for amyloid staining. Figure S2 shows CR birefringence, the hallmark of amyloid deposits. The authors then move to demonstrate co-localization of SAA and ASC in confocal and STED immunofluorescence microscopy.

      Significance: The discovery of the role of ASC in Alzheimer's disease generated an exciting new hypothesis to the etiology of sporadic AD, for which the cause is unknown. The current manuscript finds that ASC may also play a role in AA amyloidosis, which is a significant finding.

      Comment 1: Confocal images C-E show overlapping staining of markers for both SAA and ASC. Similarly, STED images show co-aggregation of ASC and SAA in amyloidosis patients. However, since confocal images F and G seem to show overlapping staining of the yellow and magenta channels as well, a careful quantitative analysis of the data is needed. Quantify co-localization (Pearson coefficient) in confocal and STED images. STED images from control patients are missing and need to be included.

      Response: AA amyloidosis is a relatively rare disease, and tissue samples from affected patients are even rarer. We only had access to samples from one patient in each of the control and SAA groups. This limitation prevented us from conducting quantitative analyses. Rather than computing the Pearson (or, possibly better, Spearman) correlation coefficient, we opted for an unbiased correlation approach in which we reconstructed the images using 3D surface rendering in Imaris (see Fig. 1). From this reconstruction, we exported the barycenter of each surface into a 3D plot for both the SAA and ASC markers (see Fig. S2B-C). Each point represents the center of a surface, while the box plots on the sides show the spatial distribution of each marker, demonstrating the overlap of the ASC and SAA markers. We also understand the suggestion to perform STED imaging on control samples to show the absence of co-aggregation. However, because we did not detect a strong signal in confocal images of the control sample, we could not determine which region to capture or where to set the focus. Imaging blindly would almost inevitably yield uninformative images and misleading comparisons. We do not derive quantitative claims from these images; rather, we report an observation. Quantitative and mechanistic co-aggregation data are presented in Fig. 2 using LiP-MS.
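      For readers weighing the two coefficients mentioned above, a minimal sketch of Pearson and Spearman correlation on paired per-pixel intensities from two channels follows. It is not the Imaris barycenter analysis used in the study, and the channel values are hypothetical. The example illustrates why Spearman may be preferable for co-localization: it depends only on ranks, so a monotonic but non-linear intensity relationship, or one very bright pixel, does not pull it down the way it pulls down Pearson.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of paired values (e.g. per-pixel intensities)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical intensities: a monotonic but non-linear relation with one bright pixel.
asc = [1, 2, 3, 4, 5]
saa = [1, 4, 9, 16, 200]
print(round(pearson(asc, saa), 2), spearman(asc, saa))
```

      With a single sample per group, as noted above, either coefficient would describe one image rather than support a statistical comparison.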

      Comment 2: The authors then move on to demonstrate that ASC foci can cross-seed SAA amyloid formation in vitro, by recording SAA aggregation kinetics in the presence and absence of ASC foci. Curves recorded in the presence of ASC foci have accelerated kinetics, as shown by a decrease in the time to reach half-maximal fluorescence (t1/2). However, these data (Fig 2A, B) are not very clean. Only three data points from the five curves shown in panel A are presented in the fitting of the control (yellow) aggregation kinetics in panel B. Why was this done? Panel B shows a significant difference between the control and the kinetics seeded with ASC specks. It looks doubtful that the results would still be statistically significant if these data were included, so their exclusion impacts the overall conclusion of the paper. The significance of the cross-seeding results needs to be substantiated experimentally.

      Response: The in vitro SAA aggregation assay was performed under established conditions (Claus S et al., EMBO Rep 2017), and the resulting data were processed using the AmyloFit software from the Knowles lab in Cambridge, UK (Meisl G et al., Nat Protoc 2016). AmyloFit uses global fitting, yielding high-accuracy kinetic parameters. Given the software's algorithm, only curves that show a sigmoidal ThT fluorescence signal over time can be fitted; replicates that do not show aggregation (a characteristic rise in ThT signal) cannot be fitted. As a result, only three of the six SAA-only curves could be fitted, yielding three t1/2 values. Conversely, in the presence of ASC specks, all six replicates aggregated in a dose-dependent manner and could all be fitted, yielding six t1/2 values. Thus, all available data points are plotted and used for statistical analysis. Moreover, the fact that all SAA replicates aggregated in a dose-dependent manner in the presence of ASC specks (whereas some replicates in the SAA-only condition did not aggregate) further underscores the pivotal role of ASC specks in SAA seeding, conversion, and aggregation enhancement.
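      Why a non-aggregating replicate yields no t1/2 can be illustrated with a deliberately simple, model-free sketch: normalise each ThT trace between its first reading and its plateau, and interpolate the time at which it crosses the half-maximal level. This is not AmyloFit's global fitting of nucleation-based kinetic models; the `min_rise` threshold and the example traces are hypothetical.

```python
def half_time(times, signal, min_rise=0.2):
    """Time at which a ThT trace reaches half of its baseline-to-plateau rise.

    Returns None for traces without a clear rise (treated here as
    non-aggregating, hence not fittable). `min_rise` is a hypothetical
    threshold: the required rise as a fraction of the plateau signal.
    """
    baseline, plateau = signal[0], max(signal)
    if plateau <= 0 or (plateau - baseline) / plateau < min_rise:
        return None
    half = baseline + 0.5 * (plateau - baseline)
    points = list(zip(times, signal))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 < half <= s1:  # first upward crossing of the half-maximal level
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    return None


traces = {
    "rising": [1.0, 1.0, 2.0, 8.0, 10.0, 10.0],
    "flat": [1.0, 1.0, 1.05, 1.1, 1.1, 1.1],
}
hours = [0, 1, 2, 3, 4, 5]
for name, trace in traces.items():
    print(name, half_time(hours, trace))
```

      The flat trace returns None rather than a number, which mirrors, in simplified form, the situation described above in which unfittable replicates contribute no t1/2 value.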

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this potentially useful study, the authors attempt to use comparative meta-analysis to advance our understanding of life history evolution. Unfortunately, both the meta-analysis and the theoretical model are inadequate, and proper statistical and mechanistic descriptions of the simulations are lacking. Specifically, the interpretation overlooks the effect of well-characterised complexities in the relationship between clutch size and fitness in birds.

      Public Reviews:

      We would like to thank the reviewers for their helpful comments, which have been considered carefully and have been valuable in improving our manuscript. The following bullet points summarise the key points and our responses; our detailed responses to specific comments can be found below:

      - Two reviewers commented that our data were not made available. Our data were provided upon submission and again during the review process, but were not made accessible to the reviewers. Our data and code are available at https://doi.org/10.5061/dryad.q83bk3jnk.

      - The reviewers have highlighted that some of our methodology was unclear and we have added all the requested detail to ensure our methods can be easily understood.

      - The reviewers highlight the importance of our conclusions, but also suggest some interpretations might be missing and/or are incomplete. To make clear how we objectively interpreted our data and the wider consequences for life-history theory we provide a decision tree (Figure 5). This figure makes clear where we think the boundaries are in our interpretation and how multiple lines of evidence converge to the same conclusions.

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We further show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males whose clutch size was increased, with no effect for males whose clutch size was reduced and no survival effect in females for either increased or reduced parental effort. In contrast, we found an overall reduction in survival for birds whose brood sizes were manipulated to be larger than their original brood (for both sexes and mixed-sex studies combined). In our supplementary information, we demonstrate that the overall survival effect of a change in reproductive effort is close to zero for males, negative (though non-significant) for females, and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study). Please also note that the Santos & Nakagawa study was conducted over 10 years ago; our dataset therefore includes additional data (L364-365). Furthermore, meta-analysis is an evolving practice, and we also corrected and improved the overall analysis approach (e.g. L358-359 and L393-397; see the detailed SI).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best-studied example of the trade-off between reproductive effort and parental survival (a key theme in life history and the biology of ageing) and subject it to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or the studies we included, rather than considering that the original hypothesis could be false or its importance inflated. We do not consider that questioning the data, rather than a favoured hypothesis, is necessarily the best scientific approach here. In many places in our manuscript, we question and address, at length, the underlying data and their interpretation (L116-117, L165-167, L202-204 and L277-282). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival, while being aware that other trade-offs could counterbalance or explain our findings (discussed on L208-210 & L301-316). Note that it is also problematic, when the expected response is not found, to search for an alternative that has not been measured. In the case of potential trade-offs, there are endless possibilities for where a trade-off might operate between traits. We purposefully focus on the one well-studied and most commonly invoked trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account, a trade-off at the fitness level can occur, and we cite two well-known studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have shown just that (L314-316).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a general trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      What we do appreciate from the reviewer’s comment is that the interpretation of our findings is complex. Although our text includes, and discusses at length, the caveats the reviewer refers to, their inter-relationships are hard to appreciate in text form. To improve this presentation, and for ease of the reader, we have added a decision tree (Figure 5) that represents the logical flow from the hypothesis being tested through to the overall conclusion that can be drawn from our results. We believe this clarifies which conclusions our results support. We emphasise again that the hypothesis that trade-offs between reproductive effort and parental survival are the major driver of variation in offspring production was not supported; because this is the explanation that practitioners in the field would be most likely to invoke, our result is important.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations (L107-123, Figure 1, Table 1). Note, however, that much theory is built on the immediate costs of reproduction and, as such, these costs are likely overinterpreted, meaning that our overall interpretation still holds, i.e. “parental survival trade-off is not the major determinative trade-off in life history within-species” (Figure 5).

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used a lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed; please see Methods, lines 466-468, where we explicitly state that this is a lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but there is currently no clear evidence that such an assumption is valid. Such a conclusion also cannot be drawn from the study on jackdaws by Boonekamp et al (2014), as the treatments were life-long and therefore cannot separate annual costs from accrued (multiplicative) costs that exceed the sum of the annual costs incurred. Note that we have now included specific discussion of this study in response to the reviewer (L265-269).

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We have added additional detail to the methodology section (see “Study sourcing & inclusion criteria” and “Extracting effect sizes”) in our revised manuscript. Note that our data and code, which would have explained much of the required detail, were not shared with the reviewers despite our supplying them upon submission and again during the review process.

      For clarity in our response: each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size, the absolute number of offspring in the nest, as the predictor (i.e., for a bird that laid a clutch of 5 but whose brood was reduced by one egg, we used a clutch size value of 4). Clutch size was also standardised and, separately, expressed as a proportion of the species’ mean.
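      The per-study extraction described above can be sketched as follows for one hypothetical study: build the experienced brood size (laid clutch plus manipulation), its standardised and species-mean-scaled versions, and fit a logistic regression of survival on it. This is a minimal pure-Python illustration under our own assumptions (function names, toy data, and a plain Newton-Raphson fit without safeguards against complete separation); it does not reproduce the authors' code, which is on Dryad, and in practice a standard binomial GLM routine would be used.

```python
import math


def brood_predictors(original, manipulation, species_mean):
    """Experienced brood size per bird plus two rescaled versions.

    original: clutch laid by each bird; manipulation: chicks added (+) or removed (-);
    species_mean: mean clutch size of the species.
    """
    brood = [o + m for o, m in zip(original, manipulation)]  # e.g. 5 laid, -1 -> 4
    mu = sum(brood) / len(brood)
    sd = (sum((b - mu) ** 2 for b in brood) / (len(brood) - 1)) ** 0.5
    standardised = [(b - mu) / sd for b in brood]
    proportional = [b / species_mean for b in brood]
    return brood, standardised, proportional


def logistic_fit(x, y, iters=25):
    """Maximum-likelihood fit of logit P(survived) = a + b * x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score (gradient of the log-likelihood) for intercept and slope.
        ga = sum(yi - pi for yi, pi in zip(y, p))
        gb = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX = [[haa, hab], [hab, hbb]], inverted explicitly.
        haa = sum(w)
        hab = sum(wi * xi for wi, xi in zip(w, x))
        hbb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


# Hypothetical study: survival (1/0) of 12 parents at three brood sizes.
x = [-1] * 4 + [0] * 4 + [1] * 4   # standardised brood size
y = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
intercept, slope_hat = logistic_fit(x, y)
print(f"log-odds slope per SD of brood size: {slope_hat:.3f}")
```

      Dividing by the species mean, as in `proportional`, makes one added chick count as a larger manipulation in a species with small clutches than in one with large clutches, which is the seabird versus passerine contrast made in the next paragraph.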

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from that of Santos & Nakagawa, but we disagree that it contrasts with it. Our approach allows us to take into account the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa. Arguably, the approach of Santos & Nakagawa is worse, as they dichotomise effort as increased or decreased, factorise their output and thereby inflate their number of outcomes, of which only 1 cell of 4 categories (males and females, increased and decreased brood size) is significant. The proof is in the pudding as well, as our results clearly demonstrate that the magnitude of the manipulation is a key factor driving the results, i.e. one offspring for a seabird is a larger proportion of care (and fitness) than one offspring for a passerine. Such insights were not achieved by Santos & Nakagawa’s method, which, again, did not allow a direct quantitative comparison between quality (correlational) and experimental (brood size manipulation, i.e. “trade-off”) effects, which forms a central part of our argumentation (Figure 5).

      Our analysis, like a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than fitting non-linear models would. For many datasets, the range of added chicks required to estimate a non-linear relationship was not available. The question also remains of what shape such a non-linear relationship should take, which is hard to determine a priori. There is also a real risk that fitted non-linear terms are spurious and overinterpreted, as they often present a better fit (penalising them by a single additional degree of freedom is not sufficient, especially when slopes vary). We have added this detail to our discussion.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately tailored the selection of studies to match the manipulation studies (L367-369). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the equally important observational component of our analysis and thereby fails to acknowledge one of the key questions being addressed in this study. Note that in our revised version we have edited the phylogenetic tree to indicate for which species we have both types of information, which highlights our approach to selecting observational data (Figure 3).

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies in which clutch sizes are unmanipulated with those from studies in which clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows the authors to reach the conclusions they provide (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 336–339, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds that have larger clutch sizes also lived longer, and we suggest that this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size. We have added Figure 5 to our manuscript to help the reader better understand what questions we can answer with our study and what conclusions we can draw from our results.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, and have explained our methods in terms that are accessible to a wider audience. Note, however, that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We have added the model formula to the model output tables.
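      For readers less familiar with meta-analysis, the generic inverse-variance step that combines per-study effect sizes (such as the per-study logistic slopes) can be sketched as below. This is the textbook fixed-effect and DerSimonian-Laird random-effects estimator, shown only to illustrate the idea; it is not necessarily the model used in the study, which may include additional structure such as phylogeny (cf. the tree in Figure 3). The effect sizes and sampling variances in the example are hypothetical.

```python
def pool(effects, variances):
    """Inverse-variance pooling of per-study effect sizes.

    Returns the fixed-effect mean, the DerSimonian-Laird random-effects mean,
    and the between-study variance tau^2.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return fixed, random_mean, tau2


# Hypothetical log-odds slopes (and sampling variances) from three studies.
print(pool([-0.30, -0.10, -0.20], [0.02, 0.01, 0.04]))
```

      Precise studies (small variance) receive more weight; when studies disagree more than their sampling error allows, tau^2 becomes positive and the random-effects weights become more even.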

      For the simulation, we simply simulated the resulting effects. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand why the reviewer feels the simulations were not explained thoroughly. We have revised our methods section and added details which we believe make our methodology more clear without needing to consult the supplemental material. However, we have also added the equations used in the process of calculating our simulated data to the Supplementary Information for readers who wish to have this information in equation form.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individual measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. Throughout the manuscript we have refined our terminology and indicated where we are referring to the individual level or the population level. The inclusion of our new Figure 5 (decision tree) should also help in this context, as it makes clear the level on which we base our interpretation and conclusions.

      This problem is emphasised by the following sentence in the discussion: "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies (the sensitivity of survival to clutch size at the population level), this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (the effect of the manipulation? Or the survival rate whatever the manipulation, in which case how could it measure a trade-off? At the population level? At the individual level?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      Thank you for identifying this sentence for which the writing was ambiguous, our apologies. We have now rewritten this and included additional explanation. L282-290: ‘The effect on parental annual survival of having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation, and quantitatively similar. Parents with naturally larger clutches are thus expected to live longer and this counterbalances the “cost of reproduction” when their brood size is experimentally manipulated. It is, therefore, possible that quality effects mask trade-offs. Furthermore, it could be possible that individuals that lay larger clutches have smaller costs of reproduction, i.e. would respond less in terms of annual survival to a brood size manipulation, but with our current dataset we cannot address this hypothesis (Figure 5).’

      We would also like to thank the reviewer for bringing to our attention the lack of clarity about the details of our methodology. We have added details to our methodology (see “Extracting effect sizes” section) to address this (see highlighted sections). For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., the number of individuals that survived and the number that died after each breeding season), and clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but whose brood was reduced by one egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
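      The set-up described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code (which is deposited on Dryad): it assumes hypothetical survival counts for a single study, takes the number of offspring actually raised (clutch laid plus manipulation) as a continuous predictor, and fits a binomial logistic regression by Newton-Raphson in plain Python. It omits the meta-analytic step (weighting and random effects across studies), and the helper names `raised_brood_size` and `fit_logistic` are hypothetical.

```python
import math

def raised_brood_size(clutch_laid, manipulation):
    """Number of offspring actually raised, e.g. 5 eggs laid, manipulated by -1 -> 4."""
    return clutch_laid + manipulation

def fit_logistic(xs, survived, died, iterations=25):
    """Fit logit(p) = b0 + b1 * x to binomial counts by Newton-Raphson.

    xs: brood size raised for each group of parents;
    survived / died: number of parents that did / did not survive to the next season.
    Returns (b0, b1); b1 is the change in log-odds of survival per extra offspring.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # observed information matrix entries
        for x, k, d in zip(xs, survived, died):
            n = k + d
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = k - n * p            # observed minus expected survivors
            w = n * p * (1.0 - p)    # binomial variance weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts for one study: reduced, control and enlarged broods
# (5 eggs laid, manipulated by -1, 0 or +1).
brood_sizes = [raised_brood_size(5, m) for m in (-1, 0, 1)]
b0, b1 = fit_logistic(brood_sizes, survived=[60, 55, 50], died=[40, 45, 50])
print(f"change in log-odds of parental survival per extra offspring: {b1:.3f}")
```

      The slope `b1` is the change in the log-odds of parental survival per additional offspring raised, which is the sense in which an effect size can be expressed on a per-egg basis. Because reduced, control and enlarged broods enter the same regression, a brood reduction simply moves along the same line in the opposite direction.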

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not warranted by the outputs of the figures and tables. Let's use a model similar to that of van Noordwijk and de Jong (1986): imagine that the mechanism at the population level is

      $$a\,c_{i,q} + b\,s_{i,q} = E_q$$

      where $c_{i,q}$ and $s_{i,q}$ are, respectively, the clutch size and the survival rate of individual $i$, which is of quality $q$, and $E_q$ is the level of "energy" that an individual of quality $q$ has available during the given time-step. Here $a$ and $b$ are constants turning clutch size and survival rate into the energy cost of reproduction and the energy cost of survival, and they are both quite "high", so that an extra egg ($c_{i,q}$ increased by 1) at the current time-step decreases $s_{i,q}$ markedly ($E_q$ is independent of the number of eggs produced); that is, we have strong individual costs of reproduction. Imagine now that the variance of $c_{i,q}$ among individuals of the same quality group, when the population is not manipulated, is very small (and therefore the variance of $s_{i,q}$ is very small too), and that the expectations of both are proportional to $E_q$. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size $c_{i,q}$, the higher $E_q$ and the higher the survival $s_{i,q}$.

      In the manipulated populations, however, because of the large $a$ and $b$, an artificial increase in clutch size for a given $E_q$ will lead to a lower survival $s_{i,q}$. The "effect size" at the population level may vary according to $a$, $b$ and the variances mentioned above. In other words, when there is variance in quality, strong costs of reproduction may be hidden in the data, even though the costs are actually so strong that they are deterministic: the probability of surviving is a direct function of the number of eggs produced.
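      The reviewer's thought experiment can be illustrated with a small simulation. This is a minimal sketch under hypothetical assumptions (the values of `A`, `B`, `K`, the uniform spread of quality and the noise level are arbitrary choices for illustration); it is not the authors' model, which, as they note, does not incorporate quality effects. Each individual has an energy budget $E_q$, lays a natural clutch roughly proportional to $E_q$, and survives with whatever energy is left, $s = (E_q - a\,c)/b$. Across unmanipulated individuals survival then rises with clutch size, whereas adding one egg at a fixed $E_q$ lowers survival by $a/b$.

```python
import random

# Hypothetical parameters for the energy budget a*c + b*s = E_q.
A = 0.3      # energy cost per egg (a)
B = 1.0      # energy cost per unit of survival probability (b)
K = 2.0      # natural clutch size is roughly proportional to quality: c ~ K * E_q
NOISE = 0.1  # small within-quality variation in natural clutch size

def survival(energy, clutch):
    """Survival left over after paying for the clutch: s = (E_q - a*c) / b, clipped to [0, 1]."""
    return min(1.0, max(0.0, (energy - A * clutch) / B))

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=1000, seed=1):
    rng = random.Random(seed)
    energies = [rng.uniform(1.5, 2.2) for _ in range(n)]          # variation in quality E_q
    clutches = [K * e + rng.gauss(0.0, NOISE) for e in energies]   # natural clutch sizes
    natural_s = [survival(e, c) for e, c in zip(energies, clutches)]

    # Unmanipulated population: regress survival on natural clutch size.
    natural_slope = ols_slope(clutches, natural_s)

    # Brood manipulation: every other individual raises one extra egg with the same E_q.
    enlarged = [survival(e, c + 1) for e, c in zip(energies[::2], clutches[::2])]
    control = natural_s[1::2]
    manipulation_effect = sum(enlarged) / len(enlarged) - sum(control) / len(control)
    return natural_slope, manipulation_effect

natural_slope, manipulation_effect = simulate()
print(f"slope of survival on natural clutch size: {natural_slope:+.3f}")
print(f"survival difference, enlarged minus control: {manipulation_effect:+.3f}")
```

      With these illustrative values the unmanipulated slope is positive while the experimental contrast is close to $-a/b = -0.3$, so opposite signs for natural and manipulated variation arise even though every individual pays a strong, deterministic cost per egg.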

      We would like to thank the reviewer for these comments. We have added detail to our methodology section so that our models and rationale are clearer. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: (1) overall quality effects connecting reproduction and parental survival are present, (2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is correct, however, that we do suggest, and repeat suggestions by others, that quality can also mask the trade-off in some individuals or circumstances (L74-76, L95-98 & L286-289), but we do not quantify this, as it depends on the unknown relationship between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting, and there are some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation (now included L287-290). Such information is, however, not available for all studies and, although we explored the possibility of analysing this, this is currently not possible with adequate confidence, and there is the possible complexity of non-linear effects. We have added this rationale in our revision (L259-265).

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative cost of reproduction (and other individual components of the trade-off) and towards a population-level negative relationship between survival and reproduction, we have to consider the intra-population slow-fast continuum (some types of individuals survive more and reproduce less, i.e. are slower, than others, which are faster). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It therefore seems important to me to control for generation time and, in general, to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript are not just clutch size and survival rates, but clutch size per year (or another time-step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There is a longstanding unexplained difference between temperate (seasonal) and tropical reproductive strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and such, these species mostly produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns and is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we are contributing to the field by questioning this central trade-off. We have incorporated some of the reviewer's suggestions in the revision (L309-315). We have added Figure 5 to show, as clearly as possible and in an easily accessible format, where we are able to reach solid conclusions and the evidence on which these are based.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality, for instance) to the general problem of detecting effects of individual differences on survival (see, e.g., Fay et al., 2021). This requires an understanding of the very particular statistical behaviour of survival, which is associated with an event that by definition occurs only once per life-history trajectory, in contrast to many other traits, even demographic ones, where the corresponding event (the production of eggs for reproduction, for example) can be measured several times for a given individual during its life-history trajectory.

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we have added to our discussion of how our results play into the importance of accounting for among-individual heterogeneity (L252-256).

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations', The American Naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation: they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying the concept of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings that we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper consideration and we have added detail accordingly to our revised discussion.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there is not one. First, and importantly, we do find a trade-off but show this is only incurred when individuals produce a clutch beyond their optimal level. Second, we also state on lines 322-326 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. We benefit from our unique analysis allowing for a quantitative fitness estimate from the effect size on annual survival (as this is expressed on a per-egg basis). This allowed us to ask whether this quantitative effect size can alone explain why reproduction is constrained, and we evaluate this using simulations. From these simulations we find that this effect size is too small to explain the constraint, so something else must be going on, and we do spend a considerable amount of text discussing the possible explanations (L202-215). Note that the possibly most parsimonious conclusion here is that costs of reproduction are not there, or simply small, so we also give that explanation some thought (L221-224 and L315-331).

      We are disappointed by the suggestion that we have dismissed complicating factors that could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. We have added further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory, including the addition of a decision tree (Figure 5).

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L316-320). We would also like to highlight that, in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.
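      One way to see how a per-egg survival effect translates into lifetime fitness is sketched below. The assumptions are ours and hypothetical, not a reproduction of the authors' simulations (whose equations are given in their Supplementary Information): adult annual survival $s(c)$ is a logistic function of clutch size $c$ with an illustrative intercept and per-egg slope, the parent lays $c$ eggs every season from its first breeding attempt onward, and any change in clutch size is kept for life. Expected lifetime egg production is then $c$ times the expected number of breeding seasons, $1 + s + s^2 + \dots = 1/(1 - s)$. Offspring quality and recruitment are ignored.

```python
import math

def annual_survival(clutch, intercept, slope_per_egg):
    """Adult annual survival from a logistic model: logit(s) = intercept + slope_per_egg * clutch."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope_per_egg * clutch)))

def lifetime_output(clutch, intercept, slope_per_egg):
    """Expected lifetime eggs when `clutch` eggs are laid every season until death.

    With constant annual survival s, the expected number of seasons bred is
    1 + s + s**2 + ... = 1 / (1 - s).
    """
    s = annual_survival(clutch, intercept, slope_per_egg)
    return clutch / (1.0 - s)

def relative_fitness(baseline, change, intercept, slope_per_egg):
    """Lifetime output after a lifelong change in clutch size, relative to the baseline clutch."""
    return (lifetime_output(baseline + change, intercept, slope_per_egg)
            / lifetime_output(baseline, intercept, slope_per_egg))

# Hypothetical scenarios: both give s = 0.5 at a baseline clutch of 5 eggs.
for label, slope in [("weak per-egg cost", -0.05), ("strong per-egg cost", -1.0)]:
    intercept = -5 * slope  # centres logit(s) at 0 for a clutch of 5
    rf = relative_fitness(5, 1, intercept, slope)
    print(f"{label}: relative fitness of laying one extra egg = {rf:.2f}")
```

      With a weak per-egg slope, the extra egg more than compensates for the lower survival; with a strong slope, the survival cost outweighs the gain. Whether the empirically estimated slope is large enough to produce the second pattern is the question the authors' simulations address.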

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L317-318 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude that it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort probably does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L317-320). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species. We believe the addition of Figure 5 to our revised manuscript also makes this more evident.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which lay precocial chicks, which are relatively independent and able to feed themselves straight after hatching. So again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size, but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact that we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies, however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “Why don’t all birds produce a clutch size at the population optimum?”, and is central to the decades-long field of life-history theory. Why is variation maintained? As the reviewer outlines, there is extensive variability, with some birds laying half of what other birds lay.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Title: while the costs of reproduction are possibly important in shaping optimal clutch size, it is not clear what you can say about them given that you do not consider clutch/brood size effects on the fitness prospects of the offspring.

      We have expanded on our discussion of how some costs may be absorbed by the offspring themselves. However, a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. We have focussed on the relationship between reproductive effort and survival because it is given the most weight in the field in terms of driving intra-specific variation in clutch size. We have altered our title to show we focus on the survival costs specifically: “The optimal clutch size revisited: separating individual quality from the parental survival costs of reproduction”.

      (2) L.11-12: I agree that this is true for birds, but this is phrased more generally here. Are you sure that that is justified?

      The trade-off between survival and reproductive effort has largely been tested experimentally through brood manipulations in birds, as this provides a good system in which to test the costs and benefits of increasing parental effort. The work in this area has informed theory beyond passerine birds, which are the most commonly manipulated group, extending to theories across taxa. We are unaware of any study or studies that provide evidence that the reproduction/survival trade-off is generalisable across multiple species in any taxon. As such, we do believe this sentence is justified. An example is the lack of a consistent negative genetic correlation in populations of fruit flies, which has also been hailed as a lack-of-cost paradigm. Furthermore, some mutants that live longer do so without a cost to reproduction.

      (3) L.13-14: Not sure what you mean with this sentence - too much info lacking.

      We have added some detail to this sentence.

      (4) L.14: it is slightly awkward to say 'parental investment and survival' because it is the survival effect that is usually referred to as the 'investment'. Perhaps what you want to say is 'parental effort and survival'?

      We have replaced “parental investment” with “reproductive effort”.

      (5) L.15: you can omit 'caused'. Compared to control treatment or to reduced broods? Why not mention effects or lack thereof of brood reduction? And it would be good to also mention here whether effects were similar in the sexes.

      Please see our methodology, where we state that we use clutch size as a continuous variable (we do not compare to control or reduced broods but include the absolute number of offspring in a logistic regression). The effects of a brood reduction are drawn from the same regression and so are opposite. Though we appreciate that the detail needed to fully comprehend our study is lacking here, we would like to highlight that this is the abstract and details are provided in the main text.

      (6) L. 15: I am not sure why you write 'however', as the finding that experimental and natural variation have opposite effects is in complete agreement with what is generally reported in the literature and will therefore surprise no one that is aware of the literature.

      We use “however” to highlight the change in direction of the effect size from the results in the previous sentence. We also believe that ours is the first study that provides a quantitative estimate of this effect and that previous work is largely theoretical. The reviewer states that this is what is generally reported, but it is not reported in all cases, as some relationships between reproductive effort and survival are negative (for the quality measurement, in correlational space, see Figure 1).

      (7) L.16: saying 'opposite to the effect of phenotypic quality' seems difficult to justify, as clutch size cannot be equated with phenotypic quality. Perhaps simply say 'natural variation in clutch size'? If that is what you are referring to.

      Please note we are referring to effect sizes here, that is, the survival effect of a change in clutch size. By phenotypic quality we are referring to the fact that we find higher parental survival when natural clutch sizes are higher. We do not refer to quality only as having a higher clutch size. This is explicitly stated in the sentence you refer to. We have changed “effect” to “effect size” to highlight this further.

      (8) L.18: why do you refer to 'parental care' here? Brood size is not equivalent to parental care.

      Brood size manipulations are used to manipulate parental care. The effect on parental survival is expected to be incurred because of the increase in parental care. We have changed “parental care” to “reproductive effort” to reduce the number of terms we use in our manuscript.

      (9) L.18-19: suggest to tone down this claim, as this is no more than a meta-analytic confirmation of a view that is (in my view) generally accepted in the field. That does not mean it is not useful, just that it does not constitute any new insight.

      We are unaware of any other study which provides generalisable across-species evidence for opposite effects of quality and costs of reproduction. The work in this area is also largely theoretical and is yet to be supported experimentally, especially in a quantitative fashion. It is surprising to us that the reviewer considers there to be general acceptance in the field, rather than acceptance being shaped by rigorous testing of hypotheses, made possible by meta-analysis, the current gold standard in our field.

      (10) L.21: what does 'parental effort' mean here? You seem to use brood size, parental care, parental effort, and parental investment interchangeably, but these are different concepts. Daan et al. (1990, Behaviour), which you already cite, provide a useful graph separating these concepts. Please adjust this throughout the manuscript, i.e. replace 'reproductive effort' with wording that reflects the actual variable you use.

      We have not used the phrase “parental effort” in this sentence. We agree these are different concepts but in this context are intertwined. For example, brood size is used to manipulate parental care as a result of increased parental effort. We do agree the manuscript would benefit from keeping terminology consistent throughout the manuscript and have adjusted this throughout.

      (11) L.23: perhaps add 'in birds' somewhere in this sentence? Some reference to the assumptions underlying this inference would also be useful. Two major assumptions being that birds adjusted their effort to the manipulation as they would have done had they opted for a larger brood size themselves, and that the costs of laying and incubating extra eggs can be ignored. And then there is the effect that laying extra eggs will usually delay the hatch date, which in many species reduces reproductive success.

      Though our study does exclusively use birds, birds have been used to test the survival/reproduction trade-off because they present a convenient system in which to experimentally test this. The conclusions from these studies have a broader application than in birds alone. We believe that although these details are important, they are not appropriate in the abstract of our paper.

      (12) L.26: how is this an explanation? It just repeats the finding.

      We intend to refer to all interpretations from all results presented in our manuscript. We have made this clearer by adjusting our writing.

      (13) L.27: I do not see this point. And 'reproductive output' is yet another concept, that can be linked to the other concepts in the abstract in different ways, making it rather opaque.

      We have changed “reproductive output” to “reproductive effort”.

      (14) L.33: here you are jumping from 'resources' to 'energetically' - it is not clear that energy is the only or main limiting resource, so why narrow this down to energy?

      We do not say energy is the only or main limiting resource. We simply highlight that reproduction is energetically demanding and so, intuitively, a highly energetically demanding process would be the focal place to observe a trade-off. We have, though, replaced “energetically” with “resource”.

      (15) L.35-36: this is new to me - I am not aware of any such claims, and effects on the residual reproductive value could also arise through effects on future reproduction. The authors you cite did not work on birds, or (in their own study systems) presented results that as far as I remember warrant such a general statement.

      The trade-off between reproduction and survival is central to the disposable soma theory proposed by Kirkwood. Although Kirkwood’s work did not focus primarily on birds, it had fundamental implications for evolutionary ecology because of the generalisable nature of his framework. In particular, it has had a wide-reaching influence on how the biology of aging is interpreted. The readership of this journal is broad, and our results have implications for that field too. The work of Kirkwood (many of the papers on this topic have over 2000 citations each) has perhaps been overly influential in many areas, so a link to how that work should be interpreted is highly relevant. If the reviewer is interested in this topic, the following papers by one of the co-authors and others may be of interest; we could not cite some of them in the main manuscript because of space constraints:

      https://www.science.org/doi/pdf/10.1126/sciadv.aay3047

      https://agingcelljournal.org/Archive/Volume3/stochasticity_explains_non_genetic_inheritance_of_lifespan/

      https://pubmed.ncbi.nlm.nih.gov/21558242/

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.13444

      https://www.nature.com/articles/362305a0

      https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(12)00147-4

      https://www.cell.com/cell/pdf/S0092-8674(15)01488-9.pdf

      https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0562-z

      (16) L.42: this could be preceded with mentioning the limitations of observational data.

      We have added detail on why brood manipulations are a good test of trade-offs, so the limitations of observational data are now implicit.

      (17) L.42-43: why?

      We have added detail to this sentence.

      (18) L.45: do any of the references cited here really support this statement? I am certain that several do not - in these this statement is an assumption rather than something that is demonstrated. It may be useful to look at Kate Lessell's review on this that appeared in Etologia, I think in the 1990's. Mind however that 'reproductive effort' is operationally poorly defined for reproducing birds - provisioning rate is not necessarily a good measure of effort in so far as there are fitness costs.

      We have updated the references to support the sentence.

      (19) L.47: Given that you make this statement with respect to brood size manipulations in birds, it seems to me that the paper by Santos & Nakagawa is the only paper you should cite here. Given that you go on to analyze the same data it deserves to be discussed in more detail, for example to clarify what you aim to add to their analysis. What warrants repeating their analysis?

      Please first note that our dataset includes the studies in Santos & Nakagawa plus additional studies, so it is not accurate to say that we analyse the same data. Furthermore, we believe our study has implications beyond birds alone, so it is appropriate to cite the papers that do support our statement. We have added details to the methods stating explicitly what was taken from Santos & Nakagawa (their dataset was used only to find the relevant literature; all data were re-extracted and re-analysed in a more appropriate way) and, separately, how we gathered the observational studies (see L352-381).

      (20) L.48: There are more possible explanations to this, which deserve to be discussed. For example, brood size manipulations may not have been that effective in manipulating reproductive effort - for example, effects on energy expenditure tend to be not terribly convincing. Secondly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Thirdly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      Please see our response to this comment in the public reviews.

      Out of interest, and because the reviewer mentioned “energy expenditure” specifically: there are studies that show convincing effects of brood size manipulation on parental energy expenditure. We do agree that there are also studies that show ceilings in expenditure. We therefore disagree that such effects “tend to be not terribly convincing”. Just a few examples:

      https://academic.oup.com/beheco/article/10/5/598/222025 (Figure 2)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.12321 (Figure 1)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2000.00395.x (but ceiling at enlarged brood).

      (21) L.48, "or, alternatively, that individuals may differ in quality": how do you see that happening when brood size is manipulated, and hence 'quality' of different experimental categories can be assumed to be approximately equal? This point does apply to observational studies, so I assume that that is what you had in mind, but that distinction should be clear (also on line 54).

      We have made it clearer that we test for quality effects separately from the costs of reproduction estimated from brood manipulation studies.

      (22) L.50: Drent & Daan, in their seminal paper on "The prudent parent" (1980, Ardea) were among the earliest to make this point and deserve to be cited here.

      We have added this citation.

      (23) L.51, "relative importance": relative to what? Please be more specific.

      We have adjusted this sentence.

      (24) L.54: Vedder & Bouwhuis (2018, Oikos) go some way towards this point and should be explicitly mentioned with reference to the role of 'quality' effects on the association between reproductive output and survival.

      We have added this reference.

      (25) L.55: can you be more specific on what you want to do exactly? What you write here could be interpreted differently.

      We have added an explicit aim after this sentence to make this clearer.

      (26) L.57: Here also a more specific wording would be useful. What does it mean exactly when you say you will distinguish between 'quality' and 'costs'?

      We have added detail to this sentence.

      (27) L.62: it should be clearer from the introduction that this is already well known, which will indirectly emphasize what you are adding to what we know already.

      We would argue that this is not well known: it has been theorised but not shown empirically, as we do here.

      (28) L.62: you equate clutch size with 'quality' here - that needs to be spelled out.

      By “quality” we mean the positive effect size of clutch size on survival, not clutch size alone. We appreciate that this was not clear in this sentence and have reworded it.

      (29) L.64: this looks like a serious misunderstanding to me, but in any case, these inferences should perhaps be left to the discussion (this also applies to later parts of this paragraph), when you have hopefully convinced readers of the claims you make on lines 62-63.

      We are unsure what the reviewer is referring to as a misunderstanding. We have chosen this format for the introduction to highlight our results. If the editors consider this a problem, we will change it as required.

      (30) L.66: quantitative comparison of what?

      A comparison across species. We have changed the wording of this sentence.

      (31) L.67-69: this should be in the methods.

      We have used a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (32) L.74-88: suggest to (re)move this entire paragraph, presenting inferences in such an uncritical manner before presenting the evidence is inappropriate in my view. I have therefore refrained from commenting on this paragraph.

      We have chosen a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (33) L.271, "must detail variation in the number of raised young": it is not sufficiently clear what this means - what does 'detail' mean in this context? And what does 'number of raised young' mean? The number hatched or raised to fledging?

      We have now made this clear.

      (34) L271, "must detail variation in the number of raised young": looking at table S4, it seems that on the basis of this criterion also brood size manipulation studies where details on the number of young manipulated were missing are excluded. I see little justification for this - surely these manipulations can for example be coded as for example having the average manipulation size in the meta-analysis data set, thereby contributing to tests of manipulation effects, but not to variation within the manipulation groups?

      We have partly done what the reviewer describes. We are specifically interested in the size of the manipulation, so we required this information to compare effect sizes across species and categories, a key advance of our study that is outlined in many places in the manuscript. Note, however, that we only need comparative differences, and we used clutch size metrics more generally to obtain a mean clutch size for each species, as well as its SD where required. Please also note that our supplement details exactly why each study was excluded from our analysis, as is preferred practice in meta-analysis.

      (35) L.271, "referred to as clutch size": the point of this simplification is not clear to me, as it is clearly confusing - why not refer to 'brood size' instead?

      We use brood size and clutch size interchangeably here. In the observational studies, individuals vary in the number of eggs produced, whereas for brood manipulations the change obviously happens after hatching, so “brood” is perhaps the more appropriate term; however, we wanted to simplify the terminology. We use “clutch size” throughout because the aim of our study is to determine why individuals differ in the number of offspring they produce, and clutch size is the most appropriate term for that.

      (36) L.280: according to the specified inclusion criteria (lines 271/272) these studies should already be in the data set, so what does this mean exactly?

      The selection criteria refer to whether a given study was retained for analysis; they do not refer to how studies were found. Please see lines 361-378 for details on how we found studies (additional details are also in the Supplementary Methods).

      (37) L.281: the use of 'quality' here is misleading - natural variation in clutch or brood size will have multiple causes, variation in phenotypic quality of the individuals and their environment (territories) is only one of the causes. Why not simply refer to what you are actually investigating: natural and experimental variation in brood size.

      We disagree: our study aims to separate quality effects from the costs of reproduction, and we use observational studies to test for quality differences, although we make no inference about the mechanisms. We do not imply that the environment causes differences in quality, but rather that, to compare observational and experimental groups directly, they should contain similar species. So, to be clear, quality refers to the positive covariation of clutch size with survival. We feel that we explain this clearly in our study’s rationale, and we have also improved our writing in several sections to avoid any confusion (see responses to earlier comments by the three reviewers).

      (38) L.283, "in most cases": please be exact and say in xx out xx cases.

      We have added the number of studies for each category here.

      (39) L.283-285: presumably readers can see this directly in a table with the extracted data?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We believe the data are too large to include as a table in the main text and are not essential to understanding the paper. We do, however, believe that all readers should have access to this information if they wish, so it is publicly available.

      (40) L.293: there does not seem to be a table that lists the included studies and effect sizes. It is not uncommon to find major errors in such tables when one is familiar with the literature, and absence of this information impedes a complete assessment of the manuscript.

      We supplied a link to our full dataset and code in Dryad with our submitted manuscript. When asked to supply our data during the review process, we again supplied a link to our dataset and code, along with a folder containing the data and code themselves, and we received confirmation that the reviewers had been given them. We support open science, and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data are too large to include as a table in the main text and are not essential to understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      (41) L.293: from how many species?

      We have added this detail.

      (42) L.296, "longevity": this is a tricky concept, not usually reported in the studies you used, so please describe in detail what data you used.

      We have removed longevity because we did not use these data in the current version of the manuscript.

      (43) L. 298: again: where can I see this information?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We supplied this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers.

      (44) L. 304, "we used raw data": I assume that for the majority of papers the raw data were not available, so please explain how you dealt with this. Or perhaps this applies to a selection of the studies only? Perhaps the experimental studies?

      By raw data, we mean the absolute number of offspring in the nest. We have changed the wording of this sentence and added detail on cases where the absolute number of offspring was not reported for brood manipulation studies (L393-397).

      (45) L.304: When I remember correctly, Santos and Nakagawa examined effects of reducing and enlarging brood size separately, which is of importance because trade-off curves are unlikely to be linear and whether they are or not has major effects on the optimization process. But perhaps you tackled this in another way? I will read on.....

      You are correct that Santos & Nakagawa compared brood increases and reductions with controls separately. Note that this only partially accounts for non-linearity, and it does not take into account the severity of the change in brood size. By using a logistic regression on absolute clutch size, as we have done, we are able to compare brood manipulation studies directly with observational studies. Please see Supplementary Methods lines 11-12, where we have added detail on why our approach is beneficial in this analysis.

      (46) L.319: what are you referring to exactly with "for each clutch size transformation"?

      We refer to the raw, standardised and proportional clutch size transformations. We have added detail here to make this clearer.

      (47) L.319: is there a cost of survival? Perhaps you mean 'survival cost'? This would be appropriate for the experimental data, but not for the observational data, where the survival variation may be causally unrelated to the brood size variation, even if there is a correlation.

      We have changed “cost of survival” to “effect of parental survival”. We only intend to imply causality for the experimental studies. For observational studies we do not suggest that increasing clutch size is causal for increasing survival, only correlative (and hence we use the phrase “quality”).

      (48) L.320: please replace "parental effort" with something like 'experimental change in brood size'.

      We have changed “parental effort” to “reproductive effort”.

      (49) L.321: due to failure of one or more eggs to hatch, and mortality very early in life, before brood sizes are manipulated, it is not likely that say an enlargement of brood size by 1 chick can be equated to the mean clutch size +1 egg / chick. For example, in the Wytham great tit study, as re-analysed by Richard Pettifor, a 'brood size manipulation' of unmanipulated birds is approximately -1, being the number of eggs / chicks lost between laying and the time of brood size manipulation. Would this affect your comparisons?

      We agree that these are important factors in determining what a clutch/brood size actually is for a given individual/pair, as this can vary from egg laying to fledging. However, we do not believe that accounting for this (if it were possible to do so) would significantly affect our conclusions, because birds in observational studies would also likely experience early-life mortality of their offspring. It is also possible that parents already factor in this loss, in which case a brood manipulation still changes the parental care effort an individual has to incur.

      (50) L.332: instead of "adjusted" perhaps say 'mean centred'?

      We have implemented this suggestion.

      (51) L.345: this statement surprised me, but is difficult to verify because I could not locate a list of the included studies. However, to my best knowledge, most studies reporting brood size manipulation effects on parental survival had this as their main focus, in contrast to your statement.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We supplied this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers by the journal, although we supplied it on several occasions. We regret that the reviewer was impeded by this unfortunate communication failure, but we did our best to make the data available to the reviewers during the initial review process.

      (52) L.361-362: this seems a realistic approach from an evolutionary perspective, but we know from the jackdaw study by Boonekamp that the survival effect of brood size manipulation in a single year is very different from the survival effect of manipulating as in your model, i.e. every year of an individual's life the same manipulation. For very short-lived species this possibly does not make much difference, but for somewhat longer-lived species this could perhaps strongly affect your results. This should be discussed, and perhaps also explored in your simulations?

      Note that the Boonekamp study does not distinguish whether the survival effects are additive or multiplicative. As such, we do not know whether the survival effects of a single-year manipulation are simply small and hard to detect, or whether the survival effects are multiplicative. Our simulations assumed that the brood enlargement occurred every year throughout an individual's life. We have added some text to the discussion on the point you raise.

      (53) L.360: what is "lifetime reproductive fitness"? Is this different from just "fitness"?

      We have changed “lifetime reproductive fitness” to “lifetime reproductive output”.

      (54) L.363: when you are interested in optimal clutch size, why not also explore effects of reducing clutch size?

      As we find that a reduction in clutch size leads to a reduction in survival (for experimental studies), we already know that these individuals would have a reduced fitness return compared to reproducing at their normal level, and so we would not learn anything from adding this to our simulations. Our interest in simulating clutch size enlargements is to find out why an individual does not produce more offspring than it does, and the answer is that doing so would not yield a fitness benefit (unless its combination of clutch size and survival rate lies outside the range observed in the wild).

      (55) Fig.1 - using 'parental effort' in the y-axis label is misleading, suggest to replace with e.g. "clutch or brood size". Using "clutch size" in the title is another issue, as the experimental studies typically changed the number of young rather than the number of eggs.

      We have updated the figure axes to say “clutch size” rather than “parental effort”. Please see response to comment 35 where we explain our use of the term “clutch size” throughout this manuscript.

      (56) L.93 - 108: I appreciate the analysis in Table 1, in particular the fact that you present different ways of expressing the manipulation. However, in addition, I would like to see the results of an analysis treating the manipulations as factor, i.e. without considering the scale of the manipulation. This serves two purposes. Firstly, I believe it is in the interest of the field that you include a detailed comparison with the results of Santos & Nakagawa's analysis of what I expect to be largely the same data (manipulation studies only - for this purpose I would also like to see a comparison of effect size between the sexes). Secondly, there are (at least) two levels of meta-analysis, namely quantifying an overall effect size, and testing variables that potentially explain variation in effect size. You are here sort of combining the two levels of analysis, but including the first level also would give much more insight in the data set.

      Our main intention here was to improve on how Santos & Nakagawa approached the same hypothesis. We did this by improving the analysis (on a per-egg basis) and by adding studies (i.e. more data). In this process, mistakes were corrected (we re-extracted all data and did not copy anything across from their dataset, which was used simply to ensure we found the same papers), and more recent data were added, including studies missed by Santos & Nakagawa. This makes the comparison with Santos & Nakagawa somewhat irrelevant, apart from perhaps technical reasons, i.e. pointing out mistakes or limitations in certain approaches. We would not be able to pinpoint these problems clearly without considering the whole dataset, yet Santos & Nakagawa had only a small subset of the data that were available to us. In short, meta-analysis is an iterative process, and similar questions are inevitably analysed multiple times and updated. This follows basic meta-analytic concepts and Cochrane principles. Except where there is a major flaw in a prior dataset or approach (as we have sometimes found and highlighted in our own work, e.g. Simons, Koch, Verhulst 2013, Aging Cell), a comparison of the kind the reviewer suggests in itself distracts from the biology. With the dataset made available, others can make these comparisons if required. On the sex difference, we provide a comparison of effect sizes for each sex separately and for mixed-sex estimates in Table S2 and Figure S1.

      (57) L.93 - 108: a thing that does not become clear from this section is whether experimentally reducing brood size affects parental survival similarly (in absolute terms) as enlarging brood size. Whether these effects are symmetric is biologically important, for example because of its effect on clutch size optimization. In the text you are specific about the effects of increasing brood size, but the effect you find could in theory be due entirely to brood size reduction.

      We have added detail to make it clear that a brood reduction simply follows the opposite trend. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than fitting non-linear models would. For many datasets, the range of chicks added is too narrow to estimate a non-linear relationship. It also remains unclear what shape such a non-linear relationship should take, which is hard to determine a priori.

      We have added some discussion on this to our manuscript (L278-282), in response to an earlier comment.

      (58) L.103-107: this is perhaps better deferred to the discussion, because other potential explanations should also be considered. For example, there have been studies suggesting that small birds were provisioning their brood full time already, and hence had no scope to increase provisioning effort when brood size was experimentally increased.

      We agree that this is a discussion point, but it also provides important context for why we ran our simulations, so we believe it is best kept in place, albeit briefly. We agree that the example you give is relevant, but we believe this argument is already contained in this section; see lines 121-123: “...suggesting that costs to survival were only observed when a species was pushed beyond its natural limits”.

      (59) L.103-107: this discussion sort of assumes that the results in Table 1 differ between the different ways that the clutch/brood size variation is expressed. Is there any statistical support for this assumption?

      We are unsure exactly what the reviewer means here. Note that, for each clutch size transformation, the experimental and observational effect sizes are significantly different and opposite in sign. For the proportional clutch size transformation, the experimental and observational effect sizes are each significantly different from 0.

      (60) L.104: at this point, I would like to have better insight into the data set. Specifically, a scatter plot showing the manipulation magnitude (raw) plotted against control brood size would be useful.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We supplied this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers by the journal.

      Thank you for this suggestion; such a plot is also useful for illustrating that manipulations are relatively stronger for species with smaller clutches, in line with our interpretation of the result presented in Figure 2. We have added Figure S1, which shows the strength of manipulation relative to the species average.

      (61) L. 107: this seems a bold statement - surely you can test directly whether effect size becomes disproportionally stronger when manipulations are outside the natural range, for example by including this characterization as a factor in the models in Table 1.

      It is hard to define the natural range exactly, so it cannot easily be coded objectively as a factor, which is why we chose not to do this. However, it is clear that for species with small clutches the manipulation itself often falls outside the natural range. Thank you for your suggestion to include a figure on this, as it shows clearly that manipulations are stronger in species with smaller clutches. We attribute this to species being forced outside their natural range. We consider that our wording makes it clear that this is our interpretation of our findings, and we therefore do not think this is a bold statement, especially as it fits with how we interpret our later simulations.

      (62) Fig.3, legend: the term 'node support' does not mean much to me, please explain.

      Node support is a value given on phylogenetic trees to indicate confidence in a branch. Here, values are given as percentages and so can be read as the number of times out of 100 that the estimated phylogeny recovers the same branching. Our values are low because our meta-analysis includes relatively few species.

      (63) Fig.3: it would be informative when you indicate in this figure whether the species contributed to the experimental or the observational data set or both.

      We have added into Fig 3 whether the species was observational, experimental or both.

      (64) L.139: the p-value refers to the interaction between species clutch size and treatment (observational vs. experimental), but it appears that no evidence is presented for the correlation being significant in either observational or experimental studies.

      We agree that our reporting of the effect size could be misinterpreted and have added detail here. The statistic provided indicates that the slopes differ significantly between observational and experimental studies, implying that there are differences between the slopes of small and large clutch-laying species.

      (65) L.140: I am wondering to what extent these correlations, which are potentially interesting, are driven by the fact that species average clutch size was also used when expressing the manipulation effect. In other words, to what extent is the estimate on the Y-axis independent from the clutch size on the X-axis? Showing that the result is the same when using survival effect sizes per manipulation category would considerably improve confidence in this finding.

      We are unsure what the reviewer means by “per manipulation category”. Please also note that we used logistic regression to calculate survival effect sizes per unit increase in reproductive effort. For example, if a population contained birds that laid 2, 3 or 4 eggs and we changed the number of eggs raised to 10, 11 or 12, respectively, then, provided that the number of birds that survived and died in each category did not change, our effect size would be the same. In this way, our effect sizes are independent of the species’ average clutch size.
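      To illustrate the example above, here is a minimal sketch, not our actual analysis pipeline: it fits a plain binomial logistic regression of parental survival on clutch size by Newton-Raphson, once for clutch sizes 2, 3 and 4 and once with the same survival counts assigned to clutch sizes 10, 11 and 12. The survival counts, the function name `fit_logistic` and the use of a single-study model (rather than a meta-analytic model) are hypothetical and chosen only for illustration. Shifting every clutch size by a constant changes only the intercept, so the slope (the change in log-odds of survival per extra egg) is unchanged.

```python
import math


def fit_logistic(xs, survived, died, max_iter=100, tol=1e-12):
    """Fit logit(p_survive) = a + b * x to grouped binomial data.

    xs       -- clutch size of each group (hypothetical values)
    survived -- number of parents that survived in each group
    died     -- number of parents that died in each group
    Returns (a, b); b is the change in log-odds of survival per extra egg.
    Uses Newton-Raphson on the binomial log-likelihood.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for x, s, d in zip(xs, survived, died):
            n = s + d
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = s - n * p        # observed minus expected survivors
            w = n * p * (1.0 - p)    # binomial variance weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical counts: identical survivors/deaths per category in both fits.
survived = [30, 25, 20]
died = [10, 15, 20]

a_low, b_low = fit_logistic([2, 3, 4], survived, died)
a_high, b_high = fit_logistic([10, 11, 12], survived, died)

print(f"slope, clutches 2-4:   {b_low:.6f}")
print(f"slope, clutches 10-12: {b_high:.6f}")
# Only the intercept moves: a_high - a_low equals -8 * slope.
print(f"intercept shift: {a_high - a_low:.6f}")
```

      Newton-Raphson is used here only to keep the sketch self-contained; any standard GLM routine with a binomial family and logit link should return the same slope for both sets of clutch sizes.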

      (66) L.145: when I remember correctly, Santos & Nakagawa considered brood size reduction and enlargement separately. Can this explain the contrasting result? Please discuss.

      You are correct that Santos & Nakagawa compared reductions and enlargements with controls separately. However, we found some mistakes in the data extracted by Santos & Nakagawa that we believe explain the differences in our results for sex-specific effect sizes. We do not feel that highlighting these mistakes in the main text is fair, useful or scientifically relevant, as our aim is to improve the test of the hypothesis.

      (67) L.158-159: looking at table S2 it seems to me you have a whole range of estimates. In any case, there is something to be said for taking the estimates for females because it is my impression (and experience) that clutch size variation in most species is a sex-linked trait, in that clutch size tends to be repeatable among females but not among males.

      We agree that, in many cases, the female is the one that ultimately decides on the number of chicks produced. We did also consider using female effect sizes only; however, we decided against this for the following reasons: (1) many of the species used in our meta-analysis exhibit biparental care, as is the case for many seabirds, so using females only would bias our results towards species with lower male investment, which in our case would mean passerine species. (2) It has also been shown that, as females in some species are operating at their maximum parental care investment, it is the males who are able to adjust their workload to care for extra offspring. (3) We are ultimately looking at how many offspring the breeding adults should produce, given the effort it costs to raise them, so even if the female chooses a clutch size completely independently of the male, it is still the combined effort of both parents that determines whether they gain an overall fitness benefit from laying extra eggs. (4) Some studies did not clearly specify male or female parental survival, and we did not want to reduce our dataset further.

      (68) L.158-168: please explain how you incorporated brood size effects on the fitness prospects of offspring, given that it is a very robust finding of brood size manipulation studies that this affects offspring growth and survival.

      We would argue this is nearly impossible to incorporate into our simulations. It is unrealistic to expect that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally, and there can even be quite stark differences; this will be most evident, for example, in species that produce sacrificial offspring. This effect will be further confounded by factors such as catch-up growth, so it is likely that increased sibling competition from added chicks alters offspring growth trajectories rather than producing the absolute growth effect the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with experimentally increased clutch sizes often increasing the number of recruits from a nest. It would be interesting to explore this further using estimates from the literature, but this is beyond our current scope and, by our initial intuition, would not be very accurate. Equally, it would be interesting to explore how large the effect on offspring would need to be to constrain the effect size strongly; such work would be more theoretical. The point of our simple fitness projections here is to aid interpretation of the quantitative effect size we estimated.

      (69) L.163: while I can understand that you select the estimate of -0.05 for computational reasons, it has enormous confidence intervals that also include zero. This seems problematic to me. However, in the simulations, you also examined the results of selecting -0.15, which is close to the lower end of the 95% C.I., which seems worth mentioning here already.

      Thank you for this suggestion. Yes, indeed, our range was chosen based on the CI, and we have now made this explicit in the manuscript.

      (70) L.210: defined in this way, in my world this is not what is generally taken to be a selection differential. Is what you show not simply scaled lifetime reproductive success?

      As far as we are aware, a selection differential is the relative change between a given group and the population mean, which is what we have calculated here. We appreciate this is a slightly unusual context in which to apply the term, but it is logical to consider the individuals who produce more offspring as carrying a potential mutation for higher productivity. We therefore believe that “selection differential” is the best terminology for the statistic we present. We also detail in our methodology how we calculate it, and we have adjusted this sentence to be more explicit about what we mean by selection differential.
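      As a minimal sketch of the statistic described above, assuming it is the difference between a group's mean fitness (for example, projected lifetime number of recruits) and the population mean, scaled by the population mean. This is our reading of the description, not the code used in the manuscript; the helper name and the values are hypothetical.

```python
from statistics import mean


def relative_selection_differential(group_fitness, population_fitness):
    """Relative difference between a group's mean fitness and the population mean.

    Hypothetical helper: inputs are per-individual fitness values, e.g.
    simulated lifetime numbers of recruits.
    """
    pop_mean = mean(population_fitness)
    return (mean(group_fitness) - pop_mean) / pop_mean


# Hypothetical values, not data from the study
population = [2.0, 3.0, 4.0, 3.0]
enlarged_clutch_group = [3.5, 3.0]
s = relative_selection_differential(enlarged_clutch_group, population)
```

      A value near zero means the group's fitness barely differs from the population mean, which is how a flat fitness landscape would show up in this statistic.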

      (71) L.177-180: is this not so because these parameter values are closest to the data you based your estimates on, which yielded a low estimate and hence you see that here also?

      We are unsure of what exactly the reviewer means here. The effect sizes for our exemplar species were predicted from each combination of clutch size and survival rate. Note that we used a range of effect sizes, higher than that estimated in our meta-analysis, to explore a large parameter space and that these same conclusions still hold.

      (72) L.191-194: these statements are problematic, because they are based on the assumption that an increase in brood size does not impact the fitness prospects of the offspring, and we know this assumption to be false.

      Though we appreciate that some cost is often absorbed by the offspring themselves, we are unaware of any evidence that these costs are substantial and large enough to drive within-species variation in reproductive effort, though for some specific species this may be the case. However, in terms of explaining a generalisable, across-species trend, the fitness costs incurred by a reduction in offspring quality are unlikely to be significantly larger than the survival costs to reproduce. We also find it highly unlikely the cost to fitness incurred by a reduction in offspring quality is large enough to counter-balance the effect of parental quality that we find in our observational studies. We do also discuss other costs in our discussion.

      (73) L.205: here and in other places it would be useful to be more explicit on whether in your discussion you are referring to observational or experimental variation.

      We have added this detail to our manuscript. Do note that many of our conclusions are drawn by the combination of results of experimental and observational studies. We believe the addition of Figure 5 makes this more clear to the reader.

      (74) L.225: this may be true (at least, when we overlook the misuse of the word 'quality' here), but I would expect some nuance here to reflect that there is no surprise at all in this result as this pattern is generally recognized in the literature and has been the (empirical) basis for the often-repeated explanation of why experiments are required to demonstrate trade-offs. On a more quantitative level, it is worth mentioning the paper of Vedder & Bouwhuis (2017, Oikos) that essentially shows the same thing, i.e. a positive association between reproductive output and parental survival.

      We have added some discussion on this point, including the citation mentioned. However, we would like to highlight that our results demonstrate that brood manipulations are not necessarily a good test of trade-offs, as they fail to recognise that individuals differ in their underlying quality. Although we agree that this result should not necessarily be surprising, we have not found that differences in individual quality are generally accepted as the reason intra-specific variation in clutch size is maintained. In fact, it is most commonly argued that when costs of reproduction are not identified, the costs must lie elsewhere; yet we cannot find conclusive evidence that the costs of reproduction (wherever they lie) drive intra-specific variation in reproductive effort. Furthermore, some studies in our dataset have reported negative correlations between reproductive effort and survival (see observational studies, Figure 1).

      (75) L.225-226: perhaps present this definition when you first use the term.

      We have added more detail to where we first use and define this term to improve clarity (L57-58).

      (76) L.227-228, "currently unknown": this statement surprised me, given that there is a plethora of studies showing within-population variation in clutch size to depend on environmental conditions, in particular the rate at which food can be gathered.

      We mean to question that if an individual is “high quality”, why is it not selected for? We have rephrased, to improve clarity.

      (77) L.231: this seems no more than a special case of the environmental effect you mention above.

      We think this is a relevant special case, as it constitutes within-individual variation in reproduction that is mistaken for between-individual variation. This is a common problem in our field that we feel needs addressing. Our study only captures between-individual variation in quality; by highlighting this, we show that the apparent differences between individuals may not reflect true between-individual variation, but could arise fully (doubtful) or partly (perhaps likely) from terminal effects.

      (78) L.235-236: but apparently depending on how experimental and natural variation was expressed? Please specify here.

      We are not sure what results the reviewer is referring to here, as we found the same effect (smaller clutch laying species are more severely affected by a change in clutch size) for both clutch size expressed as raw clutch size and standardised clutch size.

      (79) L.237: the concept of 'limits' is not very productive here, and it conflicts with the optimality approach you apply elsewhere. What you are saying here can also be interpreted as there being a non-linear relationship between brood size manipulation and parental survival, but you do not actually test for that. A way to do this would be to treat brood size reduction and enlargement separately. Trade-off curves are not generally expected to be linear, so this would also make more sense biologically than your current approach.

      We have replaced “limits” with “optima”. We believe our current approach of treating clutch size as a continuous variable, regardless of manipulation direction, is the best approach, as it allows us to directly compare with observational studies and between species that use different manipulations (now nicely illustrated by the reviewer’s suggested Figure S1). Also note that transforming clutch size to a proportion of the mean allows us to account for the severity in change in clutch size. We also do not believe that treating reductions and enlargements separately accounts for non-linearity, as either we are separating this into two linear relationships (one for enlargements and one for reductions) or we compare all enlargements/reductions to the control, as in Santos & Nakagawa 2012, which does not take into account the severity of the increase, which we would argue is worse for accounting for non-linearity. Furthermore, in the cases where the manipulation involved one offspring only, we also cannot account for non-linearity.

      (80) L.239: assuming birds are on average able to optimize their clutch size, one could argue that any manipulation, large or small, on average forces birds to raise a number of offspring that deviates from their natural optimum. At this point, it would be interesting to discuss in some detail studies with manipulation designs that included different levels of brood size reduction/enlargement.

      We agree with the reviewer that any manipulation moves an individual’s clutch size away from its own individual optimum, which, we have argued, also means brood manipulations are not necessarily a good test of whether a trade-off occurs naturally in the wild, as there could be interactions with quality. We have now edited the text to state this explicitly (L299-300).

      (81) L.242-244: when you choose to maintain this statement, please add something along the lines of "assuming there is no trade-off between number and quality of offspring".

      As explained above, though we agree that the offspring may incur some of the cost themselves, we are not aware of any evidence suggesting this trade-off is also large enough to drive intra-specific variation in clutch size across species. Furthermore, in the context here, the trade-off between number and quality of offspring would not change our conclusion – that the fitness benefit of raising more offspring is offset by the cost on survival. We have added detail on the costs incurred by offspring earlier in our discussion (L309-315). The addition of Figure 5 should help interpret these data.

      (82) L.253: instead of reference 30 the paper by Tinbergen et al in Behaviour (1990) seems more appropriate.

      We believe our current citation is relevant here but we have also added the Tinbergen et al (1990) citation.

      (83) L.253-254: such trade-offs may perfectly explain variation in reproductive effort within species if we were able to estimate cost-benefit relations for individuals. In fact, reference 29 goes some way to achieve this, by explaining seasonal variation in reproductive effort.

      We are unaware of any quantitative evidence that any combination of trade-offs explains intra-specific variation in reproductive effort, especially as a general across-species trend.

      (84) L.255: how does one demonstrate "between species life-history trade-offs"? The 'trade-off' between reproductive rate and survival we observe between species is not necessarily causal, and hence may not really be a trade-off but due to other factors - demonstrating causality requires some form of experimental manipulation.

      Between-species trade-offs are well established in the field, stemming from G.C. Williams’ seminal 1966 paper and reflected, for example, in r/K selection theory. It is possible to move from these correlations to testing for causation, and this is currently happening through the introduction of transgenes (genes from other species) that promote longevity into shorter-lived species (e.g., naked mole-rat genes into mice). As yet, the effects on reproduction are unclear.

      (85) L.256: it is quite a big claim that this is a novel suggestion. In fact, it is a general finding in evolutionary theory that fitness landscapes tend to be rather flat at equilibrium.

      It is important to note here that we simulate the effect size found, and hence this is the novel suggestion, that because the resulting fitness landscape is relatively flat there is no directional selection observed. We did not intend to suggest our interpretation of flat fitness landscapes is novel. We have changed the phrasing of this sentence to avoid misinterpretation.

      (86) L.259: why bring up physiological 'costs' here, given that you focus on fitness costs? Do you perhaps mean fitness costs instead of physiological costs? Furthermore, here and in the remainder of this paragraph it would be useful to be more specific on whether you are considering natural or experimental variation.

      The cost of survival is a physiological cost incurred by the reduction of self-maintenance as a result of lower resource allocation. This is one arm of fitness; we feel it would be confusing here to talk about costs to fitness, as we do not assess costs to future reproduction (which formed a large part of the reviewer's critique). We would like to highlight that the aim of this manuscript was to separate costs of reproduction from the effects of quality, which is why we analyse observational and experimental studies together rather than separately. Our conclusion that we have found no evidence that the survival cost of reproduction drives within-species variation in clutch size comes both from the positive correlation found in the observational studies and from the negligible fitness returns estimated in our simulations. We therefore do not believe it is helpful to separate observational and experimental conclusions throughout our manuscript, as the point is that they are inherently linked. We hope that the addition of Figure 5 makes this clearer.

      (87) L.262: The finding that naturally more productive individuals tend to also survive better one could say is by definition explained by variation in 'quality', how else would you define quality?

      We agree, and hence we believe quality is a good term to describe individuals who perform highly in two different traits. Note that we also say the lack of evidence that trade-offs drive intra-specific variation in clutch size also potentially suggests an alternative theory, including intra-specific variation driven by differences in individual quality.

      Supplementary information

      (88) Table S1: please provide details on how the treatment was coded - this information is needed to derive the estimates of the clutch size effect for the treatments separately.

      We have added this detail.

      (89) Table S2: please report the number of effect sizes included in each of these models.

      We have added this detail.

      (90) Table S4: references are not given. Mentioning species here would be useful. For example, Ashcroft (1979) studied puffins, which lay a single egg, making me wonder what is meant when mentioning "No clutch or brood size given" as the reason for exclusion. A few more words to explain why specific studies were excluded would be useful. For example, what does "Clutch size groups too large" mean? It surprises me that studies are excluded because "No standard deviation reported for survival" - as the exact distribution is known when sample size and proportion of survivors is known.

      We have updated this table for more clarity.
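      The reviewer's point that the spread of a survival estimate is recoverable from sample size and the proportion of survivors follows from the binomial distribution: for n individuals with survival proportion p, the standard deviation of the 0/1 survival indicator is sqrt(p(1 - p)) and the standard error of p is sqrt(p(1 - p)/n). A minimal sketch with hypothetical numbers (not taken from any study in the dataset), using the population form of the variance without the n/(n - 1) correction:

```python
import math


def binomial_survival_spread(n, survivors):
    """Return (proportion surviving, SD of the 0/1 survival indicator, SE of the proportion).

    Hypothetical helper illustrating the binomial relationship; uses the
    population (maximum-likelihood) variance p * (1 - p).
    """
    p = survivors / n
    sd = math.sqrt(p * (1 - p))
    se = math.sqrt(p * (1 - p) / n)
    return p, sd, se


# Hypothetical study: 30 of 50 parents survive to the next season
p, sd, se = binomial_survival_spread(n=50, survivors=30)
```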

      (91) Fig.S1: please plot different panels with the same scale (separately for observational and experimental studies). You could add the individual data points to these plots - or at least indicate the sample size for the different categories (female, male, mixed).

      We have scaled all panels to have the same y axis and added sample sizes to the figure legend.

      (92) Fig.S3: please provide separate plots for experimental and observational studies, as it seems entirely plausible that the risk of publication bias is larger for observational studies - in particular those that did not also include a brood size manipulation. At the same time, one can wonder what a potential publication bias among observational studies would represent, given that apparently you did not attempt to collect all studies that reported the relevant information.

      We have coloured the points for experimental and observational studies. Note that a study is an independent effect size and, therefore, does not indicate whether multiple data (i.e., both experimental and observational studies) came from the same paper. As we detail in the paper and above in our reviewer responses, we searched for observational studies from species used in the experimental studies to allow direct comparison between observational and experimental datasets.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend improving the theoretical component of the analysis by providing a solid theoretical framework before, from it, drawing conclusions.

      This, at a minimum, requires a statistical model and most importantly a mechanistic model describing the assumed relationships.

      We thank the reviewer for highlighting that our aims and methodology are unclear in places. We have added detail to our model and simulation descriptions and have improved the description of our rationale. We also feel the failure of the journal to provide code and data to the reviewers has not helped their appreciation of our methodology and use of data.

      Because the field uses the same wording for different concepts and different wording for the same concept, a glossary is also necessary.

      We thank the reviewer for raising this issue. During the revision of this manuscript, we have simplified our terminology or given a definition, and we believe this is sufficient for readers to understand our terminology.

      Reviewer #3 (Recommendations For The Authors):

      • The files containing information on the data extracted from each study were not available, so it has not been possible to check how any of the points raised above apply to the species included in the study. The ms should include this file in the Supp. Info, as is standard good practice for a comparative analysis.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data is too large to include as a table in the main text and is not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      • For clarity, refer to "the effect size of clutch size on survival" rather than simply "effect size". Figures 1 and 2 require cross-referencing with the main text to understand the y-axis.

      We have added detail to the figure legend to increase the interpretability of the figures.

      • Silhouettes in Figure 3 (or photos) would help readers without ornithological expertise to understand the taxonomic range of the species included in the analyses.

      We have added silhouettes into Figure 3.

      • Throughout the discussion: superscripts shouldn't be treated as words in a sentence so please add authors' names where appropriate.

      We have added author names and dates where required.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank the reviewers for their attentive reading of our manuscript. We appreciate all the comments and suggestions. We have addressed all the concerns and have included point-by-point responses.

      Reviewer #1

      Evidence, reproducibility and clarity


      Summary:

      Cacioppo et al perform a meta-analysis of public omics data examining AURKA protein and mRNA expression (including mRNA isoforms with alternative cleavage and polyadenylation), and hsa-let-7a miRNA (shown to target AURKA mRNA) in multiple cancer types from The Cancer Genome Atlas. They conclude AURKA mRNA and protein expression may be discordant in cancer in part due to the interplay between alternative polyadenylation and hsa-let-7a miRNA.

      Major comments:

      1) Unfortunately, there is a major flaw in the TCGA AURKA protein quantification data that underpins much of this study. Following the protein data trail (via https://docs.gdc.cancer.gov/Data/Introduction and its dependents), it appears to rely on the CST anti-AURKA #14475 which is raised to an antigen around Pro70.

      Response: We believe the reviewer refers to the work of Bertolin et al. (2018, https://doi.org/10.7554/eLife.38111.001), which describes truncated versions of AURKA in mitochondrial fractions of cell extracts and shows that they depend on the presence of the mitochondrial matrix peptidase PMPCB. We are not familiar with any other literature describing this phenomenon. In our own hands, we find AURKA present in the mitochondrial fraction, but the protein is mostly full-length (Grant et al. 2018, https://doi.org/10.1098/rsob.170272). In both papers, the mitochondrial pool is small relative to the total cellular pool of AURKA. In fact, this mitochondrial pool is so difficult to detect in intact cells that it has not been reported by other labs and is not universally acknowledged. Given the small size of the mitochondrial pool, any increase in mitochondrial AURKA in cancers would be unlikely to significantly affect the measured total protein levels.

      2) Following the flaws identified in the protein foundation data, the study would then benefit from some post-validation of findings with actual biological data derived from their own independent assessment of the cancers being examined.


      Response: The literature thoroughly reports empirical evidence on AURKA protein expression levels in the cancers analysed in this study, therefore we don't believe our own post-validation of findings would add any novelty in this sense.

      Minor comments:

      1) All of the correlation analyses have been tested for statistical significance, and these results are available in the supplementary data. However, I think it would be useful if these statistics were also included in the main figures themselves (Figures 1B, 2B and 2C). A low correlation that is statistically significant is a more powerful statement.

      Response: We agree, and plan to add the results of the statistical analyses in the Figures 1B, 2B and 2C.
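      To illustrate why a weak correlation can still be statistically significant when many tumours are analysed, the sketch below computes the standard t statistic for a Pearson coefficient, t = r * sqrt((n - 2)/(1 - r^2)), and a two-sided p-value from a normal approximation to the t distribution. The r and n values are hypothetical, not taken from TCGA, and the approximation is our simplification; exact p-values would normally come from a routine such as scipy.stats.pearsonr.

```python
import math
from statistics import NormalDist


def pearson_t(r, n):
    """t statistic for testing H0: rho = 0, given Pearson r and sample size n."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(r, n):
    """Two-sided p-value using a normal approximation to the t distribution.

    Reasonable for large n; somewhat anti-conservative (p too small) for small n,
    because the normal tails are lighter than those of the t distribution.
    """
    t = pearson_t(r, n)
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical examples: the same weak correlation at two sample sizes
p_large = approx_two_sided_p(0.2, 300)  # weak r, many tumours
p_small = approx_two_sided_p(0.2, 20)   # weak r, few tumours
```

      With these hypothetical inputs, r = 0.2 is significant at n = 300 but not at n = 20, which is why reporting p-values alongside coefficients helps readers judge weak correlations.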

      2) In the materials and methods, correlation is separated into distinct degrees, from none to very strong, but apart from some lines on the graphs, these degrees of correlation strength are never revisited, so they should be discussed. Perhaps there is a biological difference between AURKA post-transcriptional regulation and protein levels with different R score strength?

      Response: We believe that reiterating a discussion on the degrees of correlation strength in the main text would appear repetitive. We do however plan to add a sentence to appropriate points in the main text to redirect the reader to the materials and methods section for information on the distinct degrees of correlation.

      3) In Figure 2D, a clustering analysis was performed to show the possible relationships between hsa-let-7a and protein levels. The current visualization is hard to understand. A 3D graph with protein, mRNA and hsa-let-7a axes would be easier to follow. I believe it would also be beneficial to do something similar including the APA data, as this is the area where the paper lacks depth.


      Response: We agree that 3D graphs could aid visualization and plan to provide a link to an interactive 3D view of our analysis.

      4) For Figures 3B and 3C, can you apply a statistical test to the SLR ratios, given the magnitude difference between the CCND1 and AURKA SLRs?


      Response: Since the AURKA and CCND1 SLR values do not always come from the same dataset and are therefore not matched for patients, we believe it would not be appropriate to compare them using statistical tests.

      5) Even though the paper does not claim to provide a unifying hypothesis for APA/hsa-let-7a regulation of AURKA, I think a more in-depth look at the data would be useful. The discussion starts off well when describing what was found with the analysis, but as it stands it is mostly a restatement of the results without added insight.

      Response: We agree that more in-depth analysis of more data would be useful in strengthening conclusions. However, given the variability in the interplay between APA and hsa-let-7a that we describe, it is well beyond the scope of this study (or the extent of the TCGA database) to come up with a unifying hypothesis.

      Significance


      The study is novel in attempting to show additional layers of AURKA regulation that hadn't been previously investigated. Furthermore, factors controlling AURKA expression are of broad interest. Overall, I would like to say this is an interesting investigation into AURKA mRNA expression in cancers. In our opinion, the choice of bioinformatic tools is appropriate and well controlled.

      General Assessment: As noted in the major comments, a major weakness is the reliance on a flawed measure of AURKA protein levels from the foundation dataset. Thus, the study needs to be repeated using an alternative MS derived dataset to accurately quantify total AURKA protein levels. This would greatly improve the study and subsequent claims.

      Advance: The study has potential to extend knowledge in the field in a conceptual way, predicting the complex interplay of factors that regulate AURKA mRNA processing and translation.

      Audience: Currently, the paper is fully accessible only to a specialized bioinformatics audience, but the topic (factors controlling AURKA expression) is of broad interest in many fields, not limited to cancer but also including development and other non-cancer diseases.

      This review was jointly completed by an AURKA biologist working on mouse models of human disease, with 24 years' experience, and a bioinformatician.


      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript "Post-transcriptional control drives Aurora kinase A expression in human cancers", authors Cacioppo, Lindon and colleagues analyze publicly available data on transcript and protein levels for many cancer types to determine correlations between transcript and protein levels for Aurora A and the microRNA hsa-let-7a. This study builds on a recent publication from their lab where they show that different polyadenylation isoforms of the Aurora A transcript in triple-negative breast cancer correlate with patient survival and affect protein abundance. In this study, they aim to extend this analysis to 18 different cancer types to determine if post-transcriptional regulation potentially plays a role in Aurora A protein abundance. The authors find that for certain cancer types, Aurora A protein abundance does not correlate with mRNA abundance, suggesting that post-transcriptional regulation may be responsible for differences in protein expression in these cancer types. Furthermore, they find negative correlations between expression of hsa-let-7a and mRNA and protein abundance in certain cancer types, implicating this microRNA as a potential regulator of Aurora A mRNA stability.

      Major comments:

      1. The biggest issue that I have with this analysis relates to the assumption that Aurora A levels will be meaningfully different between individual tumors in all cancer types. For some cancers, the lack of a correlation between mRNA and protein levels for Aurora A could simply be because Aurora A overexpression is not a feature of that cancer type. Looking at the data, the cancer types where they see little-to-no correlation are the cancer types where none of the tumors have high levels of Aurora A mRNA or protein. Therefore, the lack of correlation is likely because differences in protein levels result from noise in the measurements rather than posttranscriptional regulation. Since the lack of correlation between protein and mRNA in these cancer types is the main evidence for the primary conclusion in the paper that "AURKA mRNA and protein expression are often discordant in cancer as a result of dynamic post-transcriptional regulation", I don't think that this conclusion is supported by the data. If anything, the data seems to show that substantial changes in Aurora A protein levels are almost always accompanied by a corresponding change in mRNA levels.

      To address this issue, the authors could look at the variability in Aurora A protein levels for each cancer type, and then focus their correlation analyses on cancer types where overexpression of Aurora A is a feature.

      Response: We thank the reviewer for this thoughtful comment. We decided not to consider data on AURKA protein levels between healthy and tumour samples because of the lack of proteomic datasets of matching normal tissues for all cancers (except BRCA) in the TCGA database. For this reason, it cannot be excluded that the tumours where we see little-to-no protein-mRNA correlation in fact have high levels of AURKA protein. Indeed, the literature reports extensive empirical evidence that AURKA protein is overexpressed in the cancer tissues where we see little-to-no protein-mRNA correlation (Thyroid cancer: Zhao et al, Cell Biosci, 2022; Jingtai et al, Cell Death Dis, 2023. Prostate cancer: Das et al, Pathol, 2010; Chun Yu Lee et al, Cancer Res, 2006. Kidney cancers: Wen et al, Heliyon, 2024; Li et al, Cell Death Dis, 2022. No evidence available for PCPG). Therefore, we believe it is reasonable to propose that in these cancers, which according to our analysis of TCGA data show only a minor or no increase in AURKA mRNA expression compared to the normal tissue, the lack of correlation is due to post-transcriptional regulation.

      2. The statistical significance of the analyses is often unclear. For the correlations between Aurora A protein levels and hsa-let-7a, authors mention that two cancers have a correlation with "statistical significance", but I cannot find any indication of how that was determined, and it is not shown in the corresponding figure (2C). The only time significance is indicated for a correlation is in Figure 4A. Is this the only correlation in the whole manuscript with a p-value less than .05?

      Response: The results of the statistical analyses are included in the corresponding supplementary data (Sup. Fig 1, Sup. Fig. 2A-B). We plan to add them to the Figures 1B, 2B and 2C as requested by another reviewer.

      3. The SLR for the Aurora A transcripts is only shown in terms of a ratio between cancer and normal tissue. Without the numbers in the absence of normalization, it is difficult to determine how meaningful this is. Is a two-fold change going from .3 to .6 or .001 to .002?

      Response: We plan to add a supplementary table containing the SLR values for matched normal and cancer samples in the absence of normalization.

      4. Figure 5B is nearly impossible to interpret due to the extreme differences in overall transcript levels between the cancer types. The differences in scaling of the y-axis between the plots makes this even more challenging. The authors state that "It is evident that each isoform has an individual profile of expression across cancers", but this could only be determined from relative expression levels between the different isoforms instead of absolute levels.

      Response: We retrieved this plot from the GEPIA2 platform, which does not allow the y-axis to be edited. We plan to change the text to "It is likely that each isoform has an individual profile of expression across cancers; however, a measure of the relative expression levels between the different isoforms would be required".

      Minor comments:

      1. In supplementary figure 3, SLR is plotted on a log scale in A and a linear scale in B.

      Response: We plan to convert the SLR scale in Sup. Fig. 3B to a log scale.

      2. Figure 4D is a correlation of correlations. I don't see how to interpret this in a meaningful way.

      Response: Figure 4D is not intended for quantitative analysis of a correlation of correlations (in fact, no quantitative coefficients were calculated). Instead, it visualizes how the link between AURKA SLR and AURKA protein levels, and the link between AURKA SLR and hsa-let-7a levels, can be associated differently in different cancers.

      Significance

      Aurora A is overexpressed in a wide variety of cancer types. This overexpression is commonly believed to result primarily from increased mRNA abundance. The identification of additional mechanisms regulating Aurora A protein levels would therefore be of interest to the field, as these regulatory mechanisms could be contributing to Aurora A's role in cancer progression.

      To some degree, the significance of the findings presented here depends on whether they convincingly demonstrate substantial post-transcriptional regulation. My interpretation of the data presented in this manuscript is that it largely supports Aurora A protein levels being extremely well correlated with mRNA levels, which is in line with previous findings.

      Reviewer #3

      Evidence, reproducibility and clarity

      Aurora A misregulation at both mRNA and protein levels has been known since the 1990s to be causally associated in vivo, and strongly associated in vitro, with tumourigenesis. The study builds the case that dysregulation of Aurora A mRNA and protein levels (mostly previously established) is more prevalent in cancer cells than in 'normal' cells, using data from TCGA, and extends this to a mechanistic explanation. It evaluates miRNA and the short/long ratio (SLR) of the two AURKA mRNA isoforms across cancer types compared to healthy controls. The work concludes that an interplay between APA (alternative polyadenylation) and regulation of AURKA mRNA by hsa-let-7a miRNA (which has known tumor suppressor properties) contributes to alternative splicing, revealing a new factor explaining changes in AURKA expression in many (if not all) cancers.

      Minor points:

      1) To strengthen the study, some analysis of AURKB mRNA would be useful in the same datasets, because this is also an M-phase kinase.

      Response: We carried out a specific study of AURKA (and to some extent also of the cell cycle regulator CCND1) using time-limited access to private TCGA datasets. Although we agree that investigation of AURKB would potentially enable us to strengthen some conclusions, this would be a new project that we do not currently have resources for.

      2) What happens to TPX2 or CEP192 mRNA (splicing or levels) in the same samples? For TPX2 in particular, this is described in the literature to help form the oncogenic holoenzyme, as well as dictating AURKA protein stability.

      Response: Again, we like this suggestion but are not in a position to carry out analyses of TPX2 and CEP192 within the scope of this study.

      3) Does an alternative AURKA splicing change G1/S to G2/M-phase roles of AURKA? I understand that mRNA is repressed by hsa-let-7a in G1 and S phases but not in G2, so how does non M-phase AURKA protein get made? This may be beyond the scope of the study at this point.

      Response: Whether alternative AURKA transcripts change non-mitotic roles of AURKA is an open and intriguing question. In acknowledgement of this point raised by the reviewer, we plan to add a discussion on this in the main text: "Although there is no evidence to date that different AURKA transcripts might influence AURKA activity, instances of isoform-dependent protein localization and function are increasingly reported (Mitschka and Mayr, Nat Rev Mol Cell Biol, 2022). In a previous study, we have detected higher nuclear localization of a reporter protein under the regulation of AURKA short 3'UTR (Cacioppo et al., eLife, 2023). Therefore, there is a possibility that AURKA mRNA isoforms are targeted to different subcellular localizations to support localized translation - or that AURKA protein is co-translationally targeted to different compartments - and AURKA may be preferentially localized in the nucleus when coded by the short 3'UTR mRNA".

      AURKA protein levels are maintained very low in G1 to S phase compared to G2 and M phases. At the level of translation, this is likely ensured by the absence of factors/mechanisms that activate AURKA translation (e.g., hnRNP Q1) and the presence of factors/mechanisms that repress its translation (e.g., hsa-let-7a), the combination of which results in basal translation of AURKA in G1/S until full translational activation in G2 (where a switch likely occurs whereby activating factors operate while repressing factors are disabled). However, the combination and synergy of these factors/mechanisms are likely cell type- and context-dependent.

      Significance

      I think the study is strong overall, and the authors are humble enough to describe the work as an exploratory analysis. Although this is not directly in my area of expertise (since it relies on data assembly and statistical analysis), the authors are the right team to ask the questions and interrogate the data. The study builds on a huge amount of literature and on a recent study from this team that showed alternative translation is relevant to activation of AURKA and linked let-7a to this process. Overall, the study provides a very useful resource for other researchers, assembling a large amount of data around AURKA mRNA variants and let-7a miRNA, and comes to the conclusions that

      1) hsa-let-7a potentially negatively controls the rate of degradation or translation of AURKA mRNA in cancer cells.

      2) Splicing-related architecture of the 5'UTR of AURKA mRNA likely plays a role in determining the context-dependent expression profile across cancers.

      Overall, with some extra information around the key regulators of AURKA (TPX2 mRNA?) the work is likely to be cited and spur on future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, and the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid with well-designed tests to optimize the protocol under different conditions.

      Public Reviews:

      We thank both reviewers for suggestions, feedback and improvements. We address these pointwise below.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      Strengths:

      The two steps, the oxidation and the splint-assisted ligation, are yield-diminishing steps. Thus the protocol of Davidsen and Sullivan is an important improvement over current protocols for quantifying aminoacyl-tRNAs.

      Weaknesses:

      The oxidation and the selection of aminoacyl-tRNA is the first step in all protocols. After that, the protocols differ in whether they use blunt ligation, hairpin ligation (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq) or splint ligation, and in which detection method they apply (i-tRAP, tRNA microarrays). How do the results correlate with those of these alternative approaches (e.g. i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264) etc.)? How do they correlate with other approaches with which this improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      We appreciate the fair assessment and fully agree that our work would benefit from a large comparison between all known tRNA-seq methods. We did directly compare many elements of our method to those of other methods (e.g. ligation efficiency and barcode bias); however, as noted by the reviewer we did not perform a direct end-to-end comparison with all other methods. An ideal comparison would require running several different sample conditions and technical replicates through our protocol and repeating the process across a half dozen or so other methods as they are described. Unfortunately, this approach is unlikely to be feasible since each method uses different oligos, reagents and kits, and all would have to be acquired at substantial cost. Some methods also rely on other detection methods such as microarrays, qPCR, or Illumina sequencing, which would also make this goal all the more onerous. There are also different pipelines for data processing that, in some instances, make the final results hard to compare. In short, this would be a monumental and expensive task to do comprehensively. We also worry that, even if these experiments were conducted such that some variables were concluded to be superior, they could still be challengeable based on perceived or actual protocol differences from the prior art. In summary, we think that an overall comparison with each method would be ideal, but practical concerns limit us to optimizing and comparing the variables that we found to be most prone to introducing bias in the results.

      For methods that measure tRNA expression levels (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq etc.) there are some fundamental problems regarding absolute quantification using NGS that preclude simple comparisons. These problems are well known in the field of microRNA (Fuchs et al. (2012) [PMID: 25942392]) and arise due to several factors introduced during processing steps such as purification, ligation, reverse transcription and amplification. Without a “true” quantitation benchmark, it would be difficult to make quantitative claims from each. Therefore, in our own work we benchmark tRNA expression levels for sample-to-sample reproducibility (i.e. precision) as further explained in the response to reviewer #2.

      For comparison to methods that measure tRNA charge we did have an opportunity to compare our results with those of another study. To this end, we have added a figure comparing the baseline charge found using our method and the one used in Evans et al. (Revised manuscript Figure 2—figure supplement 9). This comparison finds broadly similar results for tRNA charge, including similar trends for a subset of Glu, Ser and Pro codons that are notable for their lowered basal tRNA charge.

      Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on Github. Many of the points highlighted in this protocol are not new but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, there are a few points that need to be clarified.

      We appreciate the acknowledgement of the strength of our aminoacylation controls and agree that our method relies on many aspects of the mentioned prior work.

      (1) One point that remains unsatisfactory is a better benchmarking against the state of the art. It is currently impossible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those used in the mim-tRNAseq study and not with H1299 cells.

      We fully agree that more rigorous benchmarking would be desirable. As also noted in the response to reviewer #1, a full end-to-end comparison of methods would be ideal but would be onerous and expensive in practice, so we focused on optimizing the steps we found to be most prone to introducing bias in the data.

      We agree that Behrens et al. (2021) has substantial methodological overlap with our work and was instrumental in our efforts; however, the focus of their manuscript was largely on quantification of tRNA abundance and modifications, rather than tRNA charge. In fact, tRNA charge was only determined for yeast in that study. Quantifying the abundance of short RNAs using NGS is very difficult (Fuchs et al. (2012) [PMID: 25942392]) and will likely require the use of a mixture of tRNAs as spike-in references for normalization (Bissels et al. (2009) [PMID: 19861428]). Behrens et al. (2021) did not use a spike-in tRNA reference. Instead, they correlated gene copy number with their measured tRNA abundance. They also compared their results to Northern blotting for two tRNA transcripts and found a directionally similar result; however, this comparison does not support quantitative claims about measurement accuracy. Until a good method of normalizing tRNA quantification is found, we believe that sample-to-sample reproducibility (i.e. precision) is the most useful objective to optimize because this will allow detection of differential expression. Towards that end, we quantified the precision of our method (Figure 4 and its two supplementary figures) with associated statistics, which can be used to estimate the number of samples required to detect significance during differential expression analysis. For tRNA charge, quantification is easier, which is why we present statistics on both accuracy and precision. In this case we can better compare results across methods, and so we have added a comparison of our results to the charge quantification from Evans et al. (2017) (Figure 2—figure supplement 9).

      (2) While the protocol aims to implement an improved method for quantification of tRNA aminoacylation, it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods. Are there any possible adaptations that would allow the analysis of tRNA fragments?

      The first part of this comment, regarding comparison of methods, is addressed in the response to the prior comment and in the response to reviewer #1. For tRNA modifications, the issue is similar to abundance quantification. Proper quantification likely requires a “true” reference of modified tRNA, together with simultaneous testing of each method.

      Regarding tRNA fragments, our method is not suitable for this use case. Our adapter ligation step depends on an intact tRNA structure with either a CCA or a CC overhang on the 3’-end. As a result, we almost exclusively get reads with CCA/CC ends and no reads from fragments. This specificity is good for increasing charge quantification accuracy but limits the method's versatility. For a more versatile method we recommend Watkins et al. (2022) [PMID: 35513407].

      (3) Like Behrens et al. (2021), Davidsen and Sullivan use TGIRT-III RT for their analyses. The enzyme is not currently available in a form suitable for tRNA-seq. It would be very helpful to test different new RT enzymes that are commercially available. The example of Maxima RT - Figure 2 Supp 6 - shows significantly lower performance than the presented TGIRT-III RT data. In lines 296-298, the authors mention improvements to the protocol by using ornithine. Why are these improvements not included?

      We share the concern that the TGIRT-III enzyme is no longer commercially available. It became unavailable while we were preparing this manuscript, which is why almost all our figures are made using this enzyme. Others have discovered this too, and Lucas et al. (2023) [PMID: 37024678] tested several RT polymerases using TapeStation as a readout for readthrough. As they reported that Maxima has good performance, we decided to test it on a full run with replicates. The results are outlined in Figure 2—figure supplement 6, and for resubmission we have added a table to the appendix that compares the alignment statistics. Unfortunately, the readthrough of the Maxima polymerase on cytoplasmic tRNAs is not as high as for TGIRT-III. Interestingly, however, it seems to perform better on mitochondrial tRNAs (Figure 2—figure supplement 6). In the initial paper submission we failed to evaluate whether this readthrough difference affected charge measurements. We have now fixed this by adding Figure 2—figure supplement 7, which shows that there are no differences in charge measurements between TGIRT-III and Maxima. Not surprisingly, there are substantial differences between polymerases when looking at relative tRNA abundance. This affirms the discussion above about the difficulty of tRNA abundance quantification; however, the high sample-to-sample reproducibility remains intact with either polymerase. An exhaustive search for better polymerases is warranted but falls outside the scope of our work.

      Regarding the improvement we suggested, using ornithine instead of lysine as a cleavage catalyst, we only learned about this possibility later. We therefore just want to make readers aware that other options exist, and we have revised the paragraph to make this clearer.

      (4) A technical concern: The samples are purified multiple times using a specific RNA purification kit. Did the authors test different methods to purify the RNA and does this influence the result of the method?

      In the past, we have relied exclusively on alcohol precipitation but during the development of this protocol we found it easier and more reproducible to use column-based purification when possible. However, as we have not made a direct comparison this remains anecdotal evidence. Nonetheless, to minimize any possible bias of column-based purification you will notice that we use columns with binding capacity 5x higher than the highest amount of RNA/DNA added to the column.

      (5) The study would benefit from an explicit step-by-step protocol, including the choice of adapters that are shown to work best in the protocol.

      This is a great point! We have included tables with all the oligos used (Supplementary file 1), a detailed step-by-step protocol with pictures of anticipated gel results (Supplementary file 2) and an overview of the RNA/DNA manipulations to make it clear where adapter sequences are located (Supplementary file 3). For the data processing we provide a comprehensive example in the Github repository. All this was included in our first submission of this manuscript (as well as on bioRxiv), but we suspect this was not readily accessible to the reviewers. We will make sure that these documents are going to be available through eLife and have emphasized their existence in the main text of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To substantiate this improvement, a comparison to the most common methods should be made. For example, how do the results of the improved protocol compare with those of i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264), or the approaches with which the improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      Once again, we thank the reviewer for the good recommendations. The points about direct comparisons were discussed above.

      Reviewer #2 (Recommendations For The Authors):

      These are all great points; we address them below.

      Minor points:

      - Please use chemical conventions, e.g. for mcm5s2U and NaIO4 with superscript or subscript.

      Fixed.

      - Figure 2F: Glu GAA is only 82% charged; can this be due to mcm5s2U (Figure 3 supp 2) leading to a misalignment? What happens to Ser-NNN? Why is mitochondrial tRNA so much less charged?

      Regarding the Glu-GAA charge at baseline, we do not think this is an artifact of the mcm5s2U modification as it would then also be expected for Gln-CAA and Lys-AAA. The same occurs in the charge data in Evans et al. (2017) and they use a very different alignment strategy. Lastly, the charge titration and half-life experiments show no evidence of inaccuracy/bias for Glu-GAA.

      But the question remains – why is the charge of Glu-GAA so low? At this point our best guess is speculative. It may have something to do with the strong enrichment of Glu-GAA codons in the A site found by ribosome profiling on mouse embryonic stem cells (Ingolia et al. (2011) [PMID: 22056041]).

      - Spell out "clvg" or "dphs" in the figure legend of Figure 2 and others. Similar for other abbreviations in figures. They are not always explained in the legends.

      Fixed.

      - Figure 3 supp 2: Please use U instead of T in the anticodons. The labels are a bit confusing. Please clearly align to the tick (also for Figure 3C).

      Fixed.

      - Line 220-223. Which RT enzyme was used for Figure 3 supp 2? Does it make a difference?

      TGIRT-III was used. Only Figure 2—figure supplement 6 and Figure 2—figure supplement 7 (added for resubmission) show data with the Maxima polymerase. To address the second part of the question we have added a comparison between TGIRT-III and Maxima for mcm5s2U modification detection (Figure 3—figure supplement 3). Interestingly, there is a polymerase specific signature for mcm5s2U modifications; however, more work would be required to determine which polymerase is best suited for detection of this and other modifications.

      - Figure 4 supp 1 and Figure 4 supp 2 change order.

      Fixed.

      Typos:

      - Figure 1 and Figure 1-figure supplement 1: In the periodate the "-" is in a small box (at least in my PDF viewer). Can this box be removed?

      - Line 175: duplicated verb.

      - Line 348: "moved".

      Thanks for catching these. They have now been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Measurement of secreted amylase could be seen as direct evidence of sweating, however, how to determine the causal relationship between climbing behavior and sweating? Friction force may also be reduced when there is too much fingertip moisture.

      As the reviewer notes, measurement of secreted amylase can provide direct evidence of sweating, and we performed an iodine and starch reaction. Upon observing the involvement of TRPV4 in mouse foot pad perspiration, we then considered which type of behavioral analysis would be suitable to evaluate this perspiration. We agree with the reviewer’s point that friction force in the climbing test may be reduced by excessive sweating. However, we did not observe severe sweating in the absence of acetylcholine treatment. Accordingly, we interpreted the increased climbing test failure rate of TRPV4KO mice as likely reflecting reduced friction force associated with the lack of TRPV4 activity.

      (2) For the human skin immunostaining, did the author use the same TRPV4 antibody as used in the mouse staining? Did they validate the specificity of the antibody for the human TRPV4 channel? 

      We used different antibodies for human and mouse samples. Since commercially available anti-TRPV4 antibodies do not work well with mouse samples, we generated our own anti-TRPV4 antibody and validated its specificity.

      (3) In lines 116-117, the authors tried to determine "the functional interaction of TRPV4 and ANO1 is involved in temperature-dependent sweating", however, they only used the TRPV4 ko mice and did not show any evidence supporting the relationship between TRPV4 and ANO1. 

      As the reviewer pointed out, the data presented in the original submission do not allow us to conclude that an interaction between TRPV4 and ANO1 is involved in perspiration. However, we think that the data for TRPV4KO mice presented in Figure 3 of the original version do indicate that TRPV4 is involved in perspiration. Menthol and its related compounds inhibit the function of both TRPV4 and ANO1 (see our publication in Scientific Reports 7: 43132, 2017). These compounds blocked perspiration in both wild-type and TRPV4KO mice (original Figure 3C, D), which indicates that TRPV4 or ANO1 is involved in perspiration. In the revised version, we present results for additional iodine and starch reaction experiments using Ani9, a potent and specific ANO1 inhibitor. Ani9 drastically inhibited perspiration from mouse foot pads at both 25 °C and 35 °C. Based on these collective results, we concluded that both TRPV4 and ANO1, likely acting as a complex, are involved in perspiration. We present the new data with Ani9 in the revised Figure 3E, F.

      (4) Figure 3-4 is quite confusing. At 25˚C, no sweating difference was observed between TRPV4 and wt mice (Fig 3A-3D), suggesting both Ach-induced sweating and basal sweating are TRPV4-independent at 25˚C, however, the climbing test was done at 26-27 ˚C and the data showed a climbing deficit in TRPV4 ko mice. How to interpret the data is unclear. 

      Thank you for raising this point. In the iodine and starch reaction experiment, we observed no significant reduction in perspiration in the absence of acetylcholine at 25 °C, which is the same condition as in the climbing test, although perspiration tended to be lower in TRPV4KO mice. In a trial using additional mice, we detected significantly less perspiration in TRPV4KO mice under control conditions without acetylcholine at 25 °C, which is consistent with the results of the climbing test. We have added these new data to the revised Figure 3A, B.

      (5) Were there any gender differences associated with sweating in mice? In Figure 3, the mouse number for behavior tests should be at least 5. 

      The TRPV4KO mice reproduced poorly, and we were unable to obtain sufficient numbers of male and female mice to determine whether there were gender differences in sweating. However, as the reviewer suggested and as mentioned above, we increased the number of experiments to obtain the results shown in the revised Figure 3. We did not observe a significant difference in sweating with the larger sample size, which supports our conclusions.

      (6) 8- to 21-week-old mice were used in the immunostaining, the time span is too long. 

      Given the difficulty in obtaining sufficient numbers of TRPV4KO mice, we used a somewhat wider age distribution to obtain samples for immunostaining. However, we did not observe age-dependent differences in immunostaining. We reference this point in the revised manuscript.

      (7) The authors used homozygous TRPV4 ko mice for all experiments. What are control mice? Are they littermates of the TRPV4 ko mice? 

      We did not use littermates for our in vivo experiments because the TRPV4KO mice reproduced poorly and the litter sizes were small. However, we did backcross the KO mice to the commercially available wild-type mice more than ten times. As such, we expect that the wild-type and TRPV4KO mice will have similar genetic backgrounds. In addition, we have published multiple studies that have successfully used this method, which we think supports the reliability of our results for experiments involving mice.

      Reviewer #2 (Public Review):

      (1) The coexpression data needs additional controls. In the TRPV4 KO mice, there appears to be staining with the TRPV4 Ab in TRPV4 KO mice below the epidermis. This pattern appears similar to that of the location of the secretory coils of the sweat glands (Fig 1A). Is the co-staining the authors note later in Figure 1 also seen in TRPV4 KOs? This control should be shown, since the KO staining is not convincing that the Ab doesn't have off-target binding. 

      We thank the reviewer for raising these concerns about immunostaining. As the reviewer notes, the signals appeared weak in the low-power image, and punctate signals were present in the basal region of glandular cells. We did not identify immunohistochemical conditions that produced no signal at all. However, tissue sections from WT mice stained with anti-TRPV4 antibody showed conspicuous apical signals in the glandular cells facing the lumen. Meanwhile, TRPV4KO tissues showed no signals at the apical region of the glandular cells, where the TRPV4-ANO1 interaction is expected to occur. We also confirmed by immunoblotting that no trace of TRPV4 signal was present in TRPV4KO tissues.

      (2) Are there any other markers besides CGRP for dark cells in mice to support the conclusion that mouse secretory cells have clear cell and dark cell properties? 

      We did not stain with other dark cell markers. Based on previous studies describing the differences between clear and dark cells in mouse eccrine glands, we think that dark and clear cells cannot be clearly discriminated, as we described in lines 93-96 of the Results. We identified secretory cells using CK8 and dark cells with CGRP, a marker of dark cells in human eccrine glands (Zancanaro et al. 1999 J Anat). Our result showed that CGRP immunostaining could not discriminate between clear and dark cells, which is consistent with a previous report showing that mouse secretory cells were assumed to be undifferentiated and primitive based on electron microscopic observation (Kurosumi et al. 1970 Arch Histol Jap).

      (3) The authors utilize menthol (as a cooling stimulus) in several experiments. In the discussion, they interpret the effect of menthol as potentially disrupting TRPV4-ANO1 interactions independent of TRPM8. Yet, the role of TRPM8, such as in TRPM8 KO mice, is not evaluated in this study.

      We performed the iodine and starch reaction experiments with TRPM8KO mice. Sweat spots in TRPM8KO mice did not differ from those in WT mice (p=0.63, t-test), and menthol treatment following acetylcholine stimulation significantly reduced sweating, similar to what was seen in WT mice. These results rule out the involvement of TRPM8 in the menthol-induced reduction in sweating. We have included these data in the revised Figure 3D.

      (4) Along those lines, the authors suggest that menthol inhibits eccrine function, which might lead to a cooling sensation. But isn't the cooling sensation of sweating from evaporative cooling? In which case, inhibiting eccrine function may actually impair cooling sensations.

      Menthol acts non-specifically: it activates TRPM8, TRPV3 and TRPA1, and inhibits TRPV1, TRPV4 and ANO1. We did not carry out a climbing test with menthol, in part because menthol-dependent TRPA1 activation decreased the propensity of the mice to climb. As the reviewer notes, TRPM8 activation following topical application of menthol may elicit a cooling sensation via sensory neurons beneath the skin. However, the comfortable cooling sensation could also be caused in part by decreased sweating. The relationship between a comfortable cooling sensation and reduced perspiration following menthol application may be difficult to determine, and we have mentioned this in the updated Discussion.

      (5) The climbing assay is interesting and compelling. The authors note performing this under certain temperature and humidity conditions. Presumably, there is an optimal level of skin moisture, where skin that is too dry has less traction, but skin that is too wet may also have less traction. It would bolster this section of the study to perform this assay under hot conditions (perhaps TRPV4 KO mice, with impaired perspiration, would outperform WT mice with too much sweating?), or with pharmacologic intervention using TRPV4 agonists or antagonists to more rigorously evaluate whether this model correlates to TRPV4 function in the setting of different levels of perspiration.

      We thank the reviewer for this suggestion. After detecting the involvement of the TRPV4/ANO1 interaction in perspiration, we considered different behavioral analyses that could demonstrate whether this interaction is involved in perspiration. As the reviewer suggests, there should be an optimal level of sweating. We therefore set the room temperature at 26-27 ˚C and the humidity at 35-50%. To our knowledge, this is the first demonstration of temperature-dependent sweating of mouse foot pads. In humans, palm sweating is often referred to as psychogenic sweating and is known to be regulated by sympathetic nerve activity. Here we tested whether foot pad sweating might be related to friction force: sufficient sweating could increase the friction force and, in turn, the success rate in the climbing test. The test used a vinyl-covered slippery slope, and the surface material and slope angle were chosen after several trials to find optimal conditions. As the reviewer suggests, the success rates could be affected by multiple factors, and hot temperatures likely induce more sweating, which could increase the success rates in the climbing test. Examining these temperature-dependent effects will require additional experiments that are beyond the scope of this study. Generally, sweating is regulated by sympathetic nerve activity that occurs in response to increased excitation of brain neurons. Here, however, we raise for the first time the possibility that sweating might also be regulated by local temperature sensation mediated by TRPV4, which may fine-tune perspiration activity. We have updated the Discussion to reference this possibility.

      (6) Other studies (PMID 33085914, PMID 31216445) have examined the role of TRPV4 in regulating perspiration. The presence of TRPV4 in eccrine glands is not a novel finding. Moreover, these studies found that TRPV4 was not critical for regulating sweating in human subjects. These prior findings contradict the mouse data and the correlation with human anhidrotic skin in the present study. Neither of these studies is cited or discussed by the authors, but they should be.

      We thank the reviewer for referencing these other studies concerning the possible involvement of TRPV4 in human perspiration. These studies focused on the vasodilating effects of TRPV4 and concluded that TRPV4 is not involved in sweating in humans, in contrast to our data for mice and humans. Multiple factors could explain the apparent discrepancy between their results and ours. For example, the subjects differed: we assessed patients with acquired idiopathic generalized anhidrosis (AIGA), whereas the previous studies involved healthy volunteers. We have updated the Discussion to note the difference between our results and those of the previous studies.

      Reviewer #3 (Public Review):

      (1) Figure 2: The calcium imaging-based approach shows average traces from 6 cells per genotype, but it was unclear if all acinar cells tested with this technique demonstrated TRPV4-mediated calcium influx, or if only a subset was presented.

      “n = 6” refers not to the number of cells but to 6 independent experiments, each with more than 20 ROIs of sweat glands. We have clarified this point in the updated figure legend.

      (2) Figure 4: The climbing behavioral test shows a significant reduction in climbing success rate in TRPV4-deficient mice. The authors ascribe this to a lack of hind paw 'traction' due to deficiencies in hind paw perspiration, but important controls and evidence that could rule out other potential confounds were not provided or cited. 

      As noted in our response to Comment 5 made by Reviewer #2, we spent considerable time identifying conditions under which differences in climbing success rates could be clearly detected. We are confident that TRPV4KO mice had significantly lower success rates than WT mice, but various factors could affect the experimental outcomes. We reference these factors in the updated Discussion.

      (3) In general, the results support the authors' claims that TRPV4 activity is a necessary component of sweat gland secretion, which may have important implications for controlling perspiration as well as secretion from other glands where TRPV4 may be expressed. 

      As described above, the results we obtained in the climbing test can be affected by various factors. However, based on the consistency of the results obtained for the climbing test and the iodine and starch reaction assay, we think that our interpretation is correct. In terms of the involvement of TRPV4/ANO1 interactions in fluid secretion, we previously reported that the TRPV4/ANO1 complex is involved in cerebrospinal fluid secretion in the mouse choroid plexus (FASEB J. 2014) and in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 2018). Together, these findings suggest that this mechanism is common to water efflux from exocrine glands.

      Reviewer #1 (Recommendations For The Authors):

      (1) An exocrine gland-specific trpv4 knockout mouse should be used. Because TRPV4 is also expressed in muscle, global TRPV4 knockout may affect TRPV4-dependent muscle strength and reduce climbing ability in mice.

      As the reviewer suggests, mice with exocrine gland-specific TRPV4 knockout would be preferable to mice with global TRPV4 knockout, given that TRPV4 is expressed in multiple tissues. We agree with this suggestion, but we do not currently have such mice. However, as mentioned above, we have reported the involvement of the TRPV4/ANO1 interaction in cerebrospinal fluid secretion from the choroid plexus in mice (FASEB J. 28: 2238-2248, 2014), as well as in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 32: 1841-1854, 2018), suggesting that the TRPV4/ANO1 interaction could be widely involved in exocrine gland functions that involve water movement. We have updated the Discussion to reference this point.

      (2) The authors showed calcium imaging data indicating that menthol inhibits TRPV4-dependent calcium influx. However, it is well known that menthol induces the sensation of cooling by activating TRPM8. More evidence, including patch-clamp recordings, is needed to verify the inhibitory effects of menthol on TRPV4 and ANO1. Moreover, Fig 3E-3F only suggest that a menthol-induced cooling sensation may affect sweating, not that menthol inhibits TRPV4 and ANO1 channels.

      We agree that more evidence including patch-clamp recordings can verify the inhibitory effects of menthol on TRPV4 and ANO1. We did not include such experiments here since we previously showed that menthol and related agents indeed inhibit TRPV4- and ANO1-mediated currents (Sci. Rep. 7: 43132, 2017). We now cite this paper in the revised version.

      (3) Apart from the climbing test, are there any better models to assess sweating-related behaviors?

      After we detected the involvement of TRPV4/ANO1 interactions in perspiration, we considered different types of behavioral analyses that could be used to demonstrate TRPV4/ANO1-dependent perspiration. We think that the climbing experiment is the best test, particularly since foot pads are one of the few regions of the mouse body that are not covered by fur and are thus amenable to evaluation of perspiration using an iodine and starch test.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was confused by a section in the introduction on lines 59-60: How does Cl- efflux lead to the formation of a physical complex in cells with high intracellular Cl-? What is the physical complex? This seems like several disparate concepts combined together, which need to be clarified.

      We apologize for the incomplete descriptions of several of our previous works. We have amended the Introduction section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) TRPV4 is expressed by multiple other cell types in the skin (keratinocytes, macrophages etc.) which may have an impact on peripheral sensory function. Is there evidence that TRPV4-deficient animals have relatively normal sensory acuity and/or proprioception? Such evidence would lend more credibility to the reported findings in the climbing test. 

      As the reviewer points out, TRPV4 is expressed by multiple other cell types in the skin. To date, we have found no differences in sensory functions between TRPV4KO and WT mice. Whether TRPV4 is involved in proprioception is unclear, based on both our own observations and the literature, although TRPV4 is clearly activated by mechanical stimuli. We previously compared the mechanical sensitivity of TRPV4 and Piezo1 in bladder epithelial cells and found that Piezo1 is much more sensitive than TRPV4 (J. Biol. Chem. 289: 16565-16575, 2014), which is consistent with the involvement of Piezo1, rather than TRPV4, in proprioception. Although TRPV4 is reported to be expressed in sensory neurons, we did not detect TRPV4-mediated responses in isolated rat and mouse DRG neurons, suggesting that TRPV4-positive sensory neurons are relatively rare.

      (2) The methods section refers to loading entire sweat glands with Fura-2 dye for calcium imaging, but the figure legend refers to sweat gland acinar cells. Resolving this ambiguity would help readers to interpret the data. 

      We apologize for this error and have made an appropriate correction in the revised manuscript.

      (3) Alternatively, could acute intraplantar injection of a TRPV4 antagonist (e.g. GSK205) in wild-type mice phenocopy the TRPV4-knockout mouse deficits, or could normal climbing behavior be restored in the TRPV4 knockout by adding artificial perspiration to their hindpaws?

      We thank the reviewer for raising this interesting possibility and suggesting use of TRPV4 agonists or antagonists in the climbing tests. We agree that results of such an experiment would support the involvement of TRPV4 in sweating. We tried to do such experiments using injection of TRPV4 regulators into mouse hindpaws. However, the injections themselves appeared to impact climbing ability, perhaps in part due to painful sensations associated with the injection. Similarly, menthol injection appeared to reduce climbing activity, likely through pain sensations associated with TRPA1 activation. As such, we did not pursue these experiments.

    1. Author response:

      Reviewer 1:

      A limit of the paper is that the biological mechanisms by which intracellular mechanics are modulated (e.g. among cell types) remain unexplored and are only briefly discussed. Yet this limit is greatly offset by the rigor of the approach.

      We thank the reviewer for the valuable feedback. The question of which biological mechanisms are responsible for the different mechanical properties is indeed highly important and interesting. Like the reviewer, we consider it so important that it requires a dedicated research effort, which is far beyond the scope of this article. By introducing the concept of the mechanical fingerprint, we provide in this work a framework to systematically investigate both the biological mechanisms and the functional relevance of intracellular mechanical properties in future studies. In the revised manuscript, we will elaborate on this point in the Discussion.

      Reviewer 2:

      The most difficult part of the method is the inhibition of actin polymerization with cytochalasin B (CB). The data show that viscoelastic parameters as well as active energy parameters are unaffected by cytochalasin B. It is reasonable to expect that elasticity will decrease and fluidity will increase upon application of such a drug. The stiffness-reducing effect was observed only when CB was used with nocodazole, most likely because of phagocytosis of the bead, which is governed by microtubules. The use of other actin-depolymerizing drugs such as latrunculin A would be needed to test actin’s role in mechanical fingerprints. If actin’s role is only explained by accompanying microtubule inhibition, it is not a convenient system to directly test the mechano-adaptation process.

      We thank the reviewer for the time and the instructive feedback. Our finding that actin depolymerization has no effect on intracellular mechanics may appear surprising, as many rheological studies of the cell cortex highlight the importance of actin for the mechanical properties of the whole cell. However, as the actin network is reported to be very sparse away from the cortex, it is plausible that the mechanical properties of the cytoplasm are dominated by other structures. Indeed, our findings are consistent with other studies that see no strong effect of actin depolymerization on interphase intracellular mechanics (e.g. https://doi.org/10.1016/j.bpj.2023.04.011 or https://doi.org/10.1038/s41567-021-01368-z). Still, we fully agree with the reviewer that this is an important point. In a revised version, we aim to investigate the effect of other actin-depolymerizing drugs and will try to perform immunostaining to visualize and further illuminate the potential compensation mechanism between actin and microtubules (MT).

      Depolymerization of MT with nocodazole did not reduce the solid-like property A. Adding discussion and comparison with other papers in the literature using nocodazole will be helpful in understanding why.

      Again, we agree with the reviewer and propose to further study this point by performing additional immunostainings and by elaborating on the discussion, also including the results of other studies.

      Reviewer 3:

      The importance of the mechanical fingerprint is diluted due to some missing controls needed for biological relevance.

      We thank the reviewer for their valuable time and feedback. This comment is in line with the point already raised by reviewer 1 and highlights the important question of how intracellular mechanical properties relate to actual cell function. We fully agree with the reviewers that at this point we can only report on differences and cannot claim a biological function that depends on the fingerprint. Although we think the alignment between function and the mechanical fingerprints motivates the hypothesis that the biological system tunes its mechanical properties for a specific function, we do not want to make any claim in this direction at the current stage of our research. Answering these intriguing questions will require carefully designed control experiments, as pointed out by the reviewer. However, this direction is beyond the scope of this manuscript. Here, we establish the tools we will use in future studies to address these highly relevant questions. We therefore propose to discuss these important future directions in a revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits of genetic mutants with those obtained from a panel of small molecules, to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in nighttime sleep loss, consistent with the well-supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset AD-associated gene that the authors find is principally responsible for the generation of Aβ40 and Aβ42 in zebrafish, also shows a slight increase in activity at night and slight decreases in nighttime sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are nighttime sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not on their likelihood of modifying sleep behaviour (which we could have assessed, for example, by searching for overlaps with GWAS of sleep phenotypes). We therefore find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
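      As a rough illustration of the enrichment test proposed above, the sketch below computes a one-sided binomial p-value: the probability that k or more of n tested risk genes cause a night-time sleep phenotype if each gene did so independently with a background probability p0. The background rate used here (20%) is a hypothetical placeholder, not a measured value; estimating it is precisely what testing a random set of brain-expressed genes would provide. The function name `binomial_enrichment_p` is likewise illustrative.

```python
from math import comb


def binomial_enrichment_p(k: int, n: int, p0: float) -> float:
    """One-sided binomial test: probability of observing k or more hits
    among n genes if each gene independently has probability p0 of
    causing a night-time sleep phenotype when knocked out."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical background rate (NOT measured): 20% of random brain-expressed genes.
p_background = 0.2
# 4 of the 4 late-onset risk genes tested showed night-time sleep loss.
p_value = binomial_enrichment_p(k=4, n=4, p0=p_background)
print(f"P(>=4 of 4 | p0={p_background}) = {p_value:.4f}")
```

      With only four genes, even a background rate of 50% gives p = 0.5^4 = 0.0625, so both a measured background rate and additional risk genes would be needed for a conclusive test. A Fisher's exact test comparing the risk genes with the random set would be an alternative that also accounts for uncertainty in the background estimate.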

      For those mutants that cause nighttime sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–suppl. 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts:

      – Four indications are shared by 4/7 knockouts: “mydriasis” (dilated pupils; significant for psen1, apoea/apoeb, cd2ap, clu); “fragile X syndrome” (psen1, apoea/apoeb, cd2ap, sorl1); “insomnia” (psen2, apoea/apoeb, cd2ap, sorl1); “malignant essential hypertension” (appa/appb, psen1, apoea/apoeb, cd2ap).

      – Two targets are shared by 5/7 knockouts: “glycogen synthase kinase-3 alpha” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “neuronal acetylcholine receptor beta-2” (appa/appb, psen1, apoea/apoeb, cd2ap, clu).

      – Two KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “nitrogen metabolism” (appa/appb, psen1, psen2, cd2ap, clu).

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. All three are also significant for psen2 knockouts, but for no other knockout. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      While perhaps not surprising, we find it reassuring that insomnia appears among the indications shared by the largest number of knockouts. apoea/apoeb, cd2ap and sorl1 also happen to be the knockouts with the largest loss in night-time sleep.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on Alzheimer’s disease. For example, the first four drugs ever approved by the FDA to treat Alzheimer’s disease were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in Alzheimer’s disease.

      There is also literature on the involvement of glycogen synthase kinase-3 in AD (Lauretti et al., 2020). We plan to explore these predictions further in a future study.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not include stimuli other than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seem particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR will seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      5-HT receptor type 4a is another candidate as it was shown to interact with sorting nexin 27, a subunit of retromer (Joubert et al., 2004). We see that antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested, and in our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that nighttime sleep is disrupted in mutants for multiple late-onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes.

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 knockout were more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, not every compound that normalises the behavioural fingerprint will normalise the underlying mechanism, but the converse seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The idea is to first discover compounds that normalise the behavioural phenotype and only subsequently test whether they also normalise the molecular mechanism, akin to testing whether a drug resolves the symptoms before testing whether it actually modifies the disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in-between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter in day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, with tolerable secondary effects, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that first testing ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature. First, Hoffman et al. (2016) found that drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint are enriched with NMDA and GABA receptor antagonists. In experiments analogous to our citalopram treatment (Fig. 5c,d), cntnap2a/cntnap2b knockout larvae were found to be overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae also had fewer GABAergic neurons in the forebrain. Second, Ashlin et al. (2018) found that the fingerprint of pitpnc1a knockout larvae clustered with anti-inflammatory compounds. Flumethasone, an anti-inflammatory corticosteroid, caused a smaller increase in activity when added to knockout larvae than to wild-type larvae. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset (Rihel et al., 2010) to make these predictions, which shows that approaches like ZOLTAR can point to causal processes.

      Related to your next point, we may reduce the discussion on sorl1 and serotonin and add some of the present arguments instead, depending on the results of testing a second SSRI (see next point).

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose. However, the result indeed does not survive removing the top 5 (p = 0.87) or top 3 (p = 0.18) sorl1 larvae.
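      The robustness check above, re-testing after removing the most extreme knockout larvae, can be sketched as follows. This is a hypothetical illustration: the statistical test used in the manuscript is not stated here, so the sketch assumes a two-sided permutation test on the difference in group means, using only the Python standard library.

```python
import random
from statistics import mean


def drop_top_k(values, k):
    """Return the values (sorted ascending) with the k largest entries removed."""
    ordered = sorted(values)
    return ordered[:max(len(ordered) - k, 0)]


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def leave_top_k_out(knockout, control, ks=(0, 3, 5), **kwargs):
    """p-value after removing the k largest knockout values, for each k in ks."""
    return {k: permutation_p(drop_top_k(knockout, k), control, **kwargs) for k in ks}
```

      If a group difference disappears once a handful of animals are removed, the effect is carried by those animals rather than by the group as a whole, which is the concern raised about the 10 µM citalopram result.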

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relies on this result being robust. As you and Reviewer #3 suggest, we plan to repeat this experiment with a different selective serotonin reuptake inhibitor (SSRI). If the other SSRI also shows a differential effect, this should strengthen the claim that ZOLTAR correctly predicted serotonin signalling as being affected by the loss of Sorl1, even if we did not discover the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we will adapt the wording to make it clear this is one possible alternative hypothesis which could be tested in the future, rather than the only alternative.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, this is an interesting question, which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-crosses of heterozygous knockout animals, as they increase the sensitivity of the assay without causing substantial variability. We plan to report this analysis in more detail in a separate paper, as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and Psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”, we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), and mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick to dismiss APP knockout as potentially relevant to the understanding of Alzheimer’s disease. Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too deep into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs both in AD and in our appa/appb knockout larvae, but the ideal approach would be to study zebrafish larvae with an in-frame deletion of the Aβ sequence within appa/appb.

      We will adapt the language to address your point. We would not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed.

      A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not reveal anything intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–suppl. 1a), but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue, though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration, or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and to assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from the literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here too, more parameters would almost certainly translate into better predictions, regardless of their intuitiveness.

      It is true however that we do not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We will improve this in our revised version.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the possibility of importing a psen2 knockout line from abroad, but shipping live animals is becoming increasingly cost- and time-prohibitive. However, we observed the same pigmentation phenotype in psen2 knockouts as reported by Jiang et al., 2018, which at least partially confirms that they phenocopy a stable loss-of-function mutant.

      b. psen2KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–suppl. 2). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing that pigmentation reliably reports Psen2 function rather than merely the presence of a psen2 mutation. This mechanism is not a concern here, as the alternative start codons are upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exons 2 and 3, whereas we targeted exons 3, 4, and 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer #2, we are planning to assemble a paper that expressly compares F0 knockouts with stable mutants to allay some of these concerns. We would also suggest that our current manuscript, which combines CRISPR-F0 rapid screening with in silico pharmacological predictions, ultimately represents a first step in characterizing the functions of genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is great clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments: even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e. from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in the literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g. pooling multiple clutches from heterozygous in-crosses, or pooling wild-type clutches before injecting them). The first design causes false-positive findings, as the clutch-to-clutch variability we and others (Joo et al., 2021) observe gets interpreted as a behavioural phenotype. The second design should not cause false positives but decreases the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging, as it requires obtaining large clutches from unique pairs of parents, but it clearly separates the two sources of variability (clutch-to-clutch and animal-to-animal). As for every experiment, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work focuses heavily on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four clutches, as sorl1 knockouts were also tested in the citalopram experiment (Fig. 5) and psen2 knockouts were also tested in the small molecule rescue experiment (Fig. 6 and Fig. 6–suppl. 1). In the citalopram experiment, one H2O-treated sorl1 knockout clutch (n = 10) replicates the baseline recordings in Fig. 4–suppl. 5 fairly well; the other does not, but it had an especially small sample size (n = 6).
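      The within-clutch comparison can be sketched as follows. Each mutant larva is expressed as a Z-score against control siblings from its own clutch only, so a clutch-level offset cannot masquerade as a phenotype. This is a minimal sketch under assumed record fields (clutch_id, genotype, value), not the FramebyFrame implementation.

```python
from collections import defaultdict
from statistics import mean, stdev


def within_clutch_z(records, control_label="control"):
    """Z-score every non-control larva against control siblings of the same clutch.

    records: iterable of (clutch_id, genotype, value) tuples.
    Each clutch needs at least two control larvae (stdev raises StatisticsError otherwise).
    """
    records = list(records)
    controls = defaultdict(list)
    for clutch, genotype, value in records:
        if genotype == control_label:
            controls[clutch].append(value)
    z = defaultdict(list)
    for clutch, genotype, value in records:
        if genotype != control_label:
            ref = controls[clutch]
            z[clutch].append((value - mean(ref)) / stdev(ref))
    return dict(z)


# Hypothetical clutches with very different baselines but the same within-clutch effect.
records = [
    ("A", "control", 10.0), ("A", "control", 12.0), ("A", "control", 14.0), ("A", "mutant", 16.0),
    ("B", "control", 100.0), ("B", "control", 102.0), ("B", "control", 104.0), ("B", "mutant", 106.0),
]
print(within_clutch_z(records))  # {'A': [2.0], 'B': [2.0]}
```

      Comparing the mutant from clutch A with the controls from clutch B would instead report a large, spurious difference, which is the false positive produced by the non-sibling design.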

      We also plan to test another SSRI on sorl1 knockouts, so this point will be addressed.

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early- life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim, at least not from expression. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,”

      which does not mention future risk of Alzheimer’s disease. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seems reasonable. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument relies primarily on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day-old zebrafish larvae, and on previous work showing brain structural differences in infants and children at high genetic risk of Alzheimer’s disease (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2, as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (neither the ZOLTAR predictions nor the small molecule rescue depend on whether the parameter is called sleep or inactivity), but our choice of labelling may be misleading. We will try to test this claim by plotting the distribution of inactive period durations. If psen2 knockout larvae indeed sleep more during the day than controls, we would predict inactive periods longer than 1 minute to increase disproportionately compared to shorter inactive periods.
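      The planned analysis can be sketched as follows, assuming the activity trace has already been binned into a binary moving/not-moving series (the bin width is an assumption) and using the 1-minute threshold mentioned above. Function names are hypothetical; this is not the FramebyFrame package's API.

```python
from itertools import groupby


def inactive_durations(active, bin_s=1.0):
    """Durations (s) of each uninterrupted run of inactivity in a binary activity trace.

    active: sequence of 0/1 (or False/True), one value per time bin of bin_s seconds.
    Runs touching the start or end of the recording are included but may be truncated.
    """
    return [sum(1 for _ in run) * bin_s
            for is_active, run in groupby(active) if not is_active]


def count_short_long(durations, threshold_s=60.0):
    """Return (number of inactive periods < threshold_s, number >= threshold_s)."""
    n_long = sum(1 for d in durations if d >= threshold_s)
    return len(durations) - n_long, n_long


def fold_changes(ko_counts, ctrl_counts):
    """Knockout / control ratio for each duration class (short, long)."""
    return tuple(k / c if c else float("inf") for k, c in zip(ko_counts, ctrl_counts))
```

      Under the prediction above, the fold change for the long class should exceed that for the short class; for hypothetical counts, fold_changes((12, 6), (10, 2)) gives (1.2, 3.0).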

      To address the question “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we can look at the behaviour at the light transitions of psen2 knockout larvae that were awake (i.e. moved in the preceding minute) or ‘asleep’ (i.e. did not move in the preceding minute), and count the proportion of psen2 knockout or control larvae that displayed a startle response. If most psen2 knockouts react to the light transition, this should at least exclude the concern that they are very unhealthy, as the reviewer suggests. This criticism seems challenging to address definitively by experiment, though. A possible approach would be a closed-loop system that, after one minute of inactivity, triggers a stimulus sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Note that calibrating this stimulus is not straightforward either. We do not plan to test this, but our analysis of the light transitions may provide a decent proxy.
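      The light-transition analysis can be sketched as a simple grouped proportion. This is a hypothetical sketch: the record fields are assumptions, and how a startle is detected (for example, movement within a short window after the transition) is left outside the code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LightTransitionResponse:
    genotype: str
    moved_in_prior_minute: bool  # True = "awake"; False = "asleep" (no movement for >= 1 min)
    startled: bool               # True if the larva responded to the light transition


def startle_proportions(responses):
    """Fraction of larvae that startled, per (genotype, state) group."""
    totals = defaultdict(int)
    startles = defaultdict(int)
    for r in responses:
        key = (r.genotype, "awake" if r.moved_in_prior_minute else "asleep")
        totals[key] += 1
        startles[key] += int(r.startled)
    return {key: startles[key] / totals[key] for key in totals}
```

      A high startle proportion among awake psen2 knockouts would argue against a general loss of responsiveness, while the asleep group gives a first look at arousal threshold.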

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just 1 drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think not, assuming we understand your question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–suppl. 5). For the sake of argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts while serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, treatment should then cause the same increase in sleep bout length in both knockouts and controls, as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of Z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.
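      The additivity argument can be written as a difference-in-differences contrast. The sketch below is illustrative only: it computes a descriptive contrast rather than the statistical test used in the manuscript, and any group values passed to it are hypothetical.

```python
from statistics import mean, stdev


def to_z(values, reference):
    """Z-scores of values relative to the mean and SD of a reference group."""
    mu, sd = mean(reference), stdev(reference)
    return [(v - mu) / sd for v in values]


def interaction_contrast(ko_water, ko_drug, ctrl_water, ctrl_drug):
    """(Drug effect in knockouts) - (drug effect in controls), in Z-score units.

    All groups are Z-scored against water-treated controls. A value near 0 means the
    drug shifts both genotypes by the same amount (additive); a non-zero value means
    the drug effect depends on genotype (non-additive).
    """
    def z_mean(group):
        return mean(to_z(group, ctrl_water))

    ko_effect = z_mean(ko_drug) - z_mean(ko_water)
    ctrl_effect = z_mean(ctrl_drug) - z_mean(ctrl_water)
    return ko_effect - ctrl_effect
```

      In the scenario where only an unrelated pathway is disrupted, the drug would add the same number of Z-scores to both genotypes and the contrast would be close to zero; a larger drug effect in knockouts gives a positive contrast.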

      Furthermore, we agree with you and Reviewer #2 that the conclusions are overly confident. We will repeat this experiment with another SSRI as you suggest. Your suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well, however we do not plan to pursue them at this stage.

      References:

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PLOS ONE 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Lauretti E, Dincer O, Praticò D. 2020. Glycogen synthase kinase-3 signaling in Alzheimer’s disease. Biochim Biophys Acta Mol Cell Res 1867:118664. doi:10.1016/j.bbamcr.2020.118664

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a re-examination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      A. General Statements

      We thank the reviewers for their constructive feedback. We have made significant revisions to the mathematical modelling section of the manuscript to address your concerns. Therefore, some of the specific issues and concerns raised in previous reviews no longer apply. Where that is the case, please see the relevant context in the revision as indicated in the point-by-point description section below. We summarize the key points in the revised manuscript as follows.

      1. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. This study not only reveals the role of the MinD concentration gradient in modulating bacterial cell division site placement but also showcases an example of cellular components organized as a concentration gradient in fundamental cellular processes, a concept crucial in cell biology. This work advances the quantitative understanding of MinD oscillations in the cellular environment and has implications for further studies of bacterial cell division regulation.

      2. The reviewer requested clarification on the differences between our study and previous studies involving experimental measurements and mathematical modelling of Min oscillations in cells. We would like to emphasize that although the previous works also aimed to measure the spatiotemporal distribution of oscillating MinD concentration gradients as a function of cell length, they conceived the problem differently and therefore used different experimental designs and execution methods; this is why our key conclusions differ from theirs. The same is true for the mathematical modelling. Although similar observations can be found in some respects, the results are not directly comparable because the simulations use different mathematics and assumptions. For example, our model was built to investigate the MinD concentration gradient during cell elongation, not to evaluate the impact of cell shape and confinement or the nucleation effect of MinD. Thus, our model cannot be generalized to other shapes, such as those examined by Wu et al. (2015). We would therefore like to draw attention to the experimental rigor and to the specific points and views that contribute to our understanding of the Min system. We now provide a comprehensive comparison between these studies and ours in the Supplemental Information.

      3. We have re-run the simulation to refine and improve the modelling procedures and results; the corresponding text and illustrations are provided in the Results section of the main text (Lines 265-279, 614-653) and Fig. S6. In brief, we fixed the diffusion coefficients D_D and D_E to values from Meacci et al. (2006), the dissociation rate constant k_de to the value used in a previous simulation (Wu et al., 2015), and the MinD and MinE concentrations to those measured experimentally in this study. The diffusion coefficients D_d and D_de were set to assumed values based on the diffusion of bacterial membrane proteins (Schavemaker et al., 2018). This allowed us to probe the general behaviour of the system. As a result, we obtained a few parameter sets, including #2728, that reproduce features of the oscillation period, λ_N, and I_Ratio that closely mimic MinD oscillation in the cellular context (Figs. 4C, S7-9). We further tested the impact of the kinetic constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP), which represent different molecular interactions, on the oscillation period, λ_N, and I_Ratio (Fig 4D-H). These findings provide a solid theoretical view of how oscillation features may be controlled by different molecular interactions. Furthermore, the modelling results help us understand possible mechanisms of oscillation cycle maintenance and of the length-dependent variation in the concentration gradient.

      4. Regarding the inclusion or removal of results from additional culture conditions, we decided to keep only one additional condition, as in the previous version, for the following reasons. To draw convincing conclusions, we consider it more important to characterize all aspects under the same growth condition and to avoid manipulation. Therefore, the main conclusions are drawn from our experiments characterizing several aspects of MinD oscillations in cells growing with 0.4% glucose. In support of these observations, we kept only one other condition, 0.1% glucose. Further analysis of cells growing under other conditions would not change the main conclusions but would make it more difficult to determine how the MinD concentration changes with cell growth.

      5. Studying the variable concentration gradient underlying the dynamic oscillations of the Min system may be of broad interest to cell biologists, since concentration gradients play a fundamental role in various cellular processes and are a crucial concept in cell biology. Examples of related processes include passive and active transport, osmosis, cell signalling, and maintenance of cellular homeostasis. These processes allow cells to respond to their environment, regulate their internal conditions, and perform important functions required for survival and normal function. In addition, variable concentration gradients, characterized by the numerical descriptor λ_N and reproduced in a simple mathematical model, exemplify nonlinear dynamical behaviour in physical biology. Therefore, the audience of this work can include the broader community of cell biologists and physical biologists rather than only the specialized audience interested in the Min system. We would also reiterate the importance of specialized research, which often provides the basis for broader application and understanding.

      B. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Parada et al. studied both experimentally and theoretically the MinD concentration distribution of Min waves during cell growth. The main findings were that (i) the MinD gradient is steeper in longer cells and, accordingly, the MinD concentration at the middle of the cell is lower, (ii) the period of the oscillation is independent of the cell length, and (iii) these features are shared even under glucose starvation, except that the MinD gradient is steeper. (iv) These results are supplemented by analyses of the reaction-diffusion equations, in which parameters that can reproduce the MinD concentration distribution are identified. I think the results are interesting; basically, as the cell grows, the contrast of the wave becomes clearer, such that the MinD concentration at the cell centre decreases. The results may clarify the mechanism of FtsZ accumulation at the cell centre more quantitatively. The experiments were performed by measuring the fluorescence intensity of MinD during cell growth and analysing the intensity distribution along the long axis of the cell. The theoretical results were based on analyses of the reaction-diffusion model. Both approaches are well established, and the results are sound. Nevertheless, I do not think the novelty of this work is well highlighted in the current manuscript; I think most of the results, except (iii) and (iv), have already been shown explicitly or implicitly in previous studies. Min oscillations in a growing cell have been analysed both theoretically and experimentally in (Meacci 2005) and [1] (Fischer-Friedrich et al., 2010). The concentration distribution and period of the oscillation were measured. The complete results were presented in [2] (Meacci et al., 2006), and I am not aware of those results appearing in scientific journals (the thesis is available online). Nevertheless, I think it is fair to cite those studies and compare the current results with them.
In fact, [2] showed that the concentration of MinD near the cell centre decreases as the cell grows, that the total MinD concentration is approximately constant during growth (therefore, the number of molecules increases), and that the variance of the period becomes smaller as the cell grows. I do not think those previous studies spoil this work, and this work deserves publication somewhere. Still, the authors should highlight the novelty of this study more clearly.

      ANS: We thank the reviewer for recognizing the soundness of our experimental and theoretical approaches and results. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. This study not only reveals the role of the MinD concentration gradient in modulating bacterial cell division site placement but also showcases an example of cellular components organized as a concentration gradient in fundamental cellular processes, a concept crucial in cell biology. We believe that well-established techniques and methods underpin a broad range of work and give us confidence in refining them to test hypotheses and obtain results. We also thank the reviewer for pointing out that Meacci's PhD thesis entitled "Physical aspects of Min oscillations in Escherichia coli" (Meacci & Kruse, 2005) is available online. This thesis, along with two publications (Meacci & Kruse, 2005; Meacci et al., 2006), explored Min oscillations in growing cells using mathematical models. These two published works were already cited in the previous version of the manuscript because we agree that they provide valuable context. As recommended, we re-examined these works, together with Fischer-Friedrich et al. (2010), and compared their wet experiments and mathematical models with ours; the comparison is detailed in the Supplemental Information (Lines 26-147).
Here, we emphasize that although both the published works and ours aimed to measure the spatiotemporal distribution of oscillating MinD concentration gradients as a function of cell length, we conceived the problem differently and therefore used different experimental designs and analysis approaches, which led to the key conclusions that differentiate our work from theirs.

      Major comments: (i) In (Meacci 2005) and [1,2], it was claimed that the standard deviation of the period is comparable to the mean period, particularly for shorter cells. Therefore, they did not claim that the period is independent of the cell length. As far as I understand, the variance arises from the variance of the total protein concentration across the ensemble of cells. I am wondering how the authors are able to conclude that the period is constant across different cell lengths. I also point out that in the theoretical part of (Meacci 2005), the period in fact increases as the cell grows and suddenly decreases at the length at which cell division occurs.

      ANS: In our experiments, we found that the oscillation periods ranged from 36.8 to 65.6 sec, as measured from a population of cells (length of 1.9-4.5 µm; main text, Fig. 1E). Moreover, the standard deviations of the period ranged from 5.4% to 34.8% of the period, with larger standard deviations more common in shorter cells (Fig. 1D), indicating that regular interpolar oscillations are more likely to occur in longer cells. This observation echoes the study by Fischer-Friedrich et al. (2010), who reported stochastic switching of MinD between the two cell poles in cells shorter than 2.5 μm. In cells of 2.5-3 μm, MinD starts to oscillate regularly from pole to pole with an oscillation period of 80 sec. Above 3.5 μm, MinD invariably undergoes regular oscillation, with a period that is initially 87 sec and then decreases to 70 sec. In their study, they focused on the length-dependent switch from stochastic to regular oscillation and speculated that the amount of MinE bound to the membrane critically influences this shift to regular interpolar oscillation. In addition, their observation of a longer period in the initial phase and a shorter period after the cells grew beyond 3.5 μm is broadly consistent with our simulation results, as shown in Fig. 4C-H, left. In Meacci's work (Thesis: Figure 2.14; Meacci and Kruse (2005): Figure 5(b)), the oscillation periods measured for cells with lengths similar to those in our measurements (black dots in Meacci's chart) ranged from 40 to 120 sec. Our measured oscillation periods show much smaller fluctuations than those in Meacci's study and are more comparable to Fischer-Friedrich's measurements. Such differences may arise from the different bacterial strains and culture conditions used, which can significantly affect the amount and quality of protein expressed in individual studies.
In short, the three works differ in experimental design and execution. Although similar observations can be found in some aspects, they are not directly comparable. We would therefore like to draw attention to the experimental rigor and the specific points and views that contribute to our understanding of the Min system. We have changed the wording from 'constant period' to 'fairly stable period' throughout the manuscript. This description is based on our experimental measurements (Fig. 1D, E) and is also supported by our mathematical modelling (Fig. 4C-H, left). Regarding the statement derived from the theoretical model of Meacci & Kruse (2005), that "the period is increasing as the cell grows and suddenly decreases at the length in which cell division occurs": first, our simulation results revealed a mild increase in the oscillation period during cell elongation (Fig. 4C), and this increase is adjustable by varying the reaction rate constants in the simulation (Fig. 4D-H). Second, although we did not simulate dividing cells, our experimental measurements clearly showed that the period increased in newborn cells (Fig. S4). As mentioned above, although similar observations can be found in different studies, they are not directly comparable because the experiments were performed differently and for different purposes. We have added a comparison of the different models to the Supplemental Information (Lines 26-147).

      (ii) I do not think the reaction-diffusion model was well described. The authors mentioned that they studied a one-dimensional model and used the delta function to describe the membrane reaction. Did the authors study a 1D cytosol and a 0D membrane? If so, why does the surface diffusion term appear in (4) and (5)? I believe the authors simply assumed that both the membrane and the cytosol are 1D (with larger diffusion constants for cytosolic Min concentrations). In that case, the delta functions in (1)-(5) are not necessary. In (Wu 2015), the delta function was used to treat a 2D membrane embedded in 3D space.

      Besides that, there is no description of the initial conditions for the concentration fields used to solve the reaction-diffusion equations. I think the description of the no-flux boundary condition would be better placed in the Methods rather than in the supplementary materials.

      ANS: Thank you for your suggestions to improve the description of the numerical model. As summarized below, we have rewritten the section 'Simulating the dynamic MinD concentration gradient in growing cells' in the manuscript (Lines 237-279). We have specified the dimensionality of the rate and diffusion constants of each molecule, where applicable, in our 1D model (Lines 237-264). Their dimensionality can also be inferred from their units, as listed in Tables 2 and S4. We have specified the initial 'no-flux' boundary conditions in Lines 267, 630, and 647. We agree that the delta function is not necessary and have removed it from the equations.
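      To illustrate what a no-flux boundary condition means in a 1D discretization, here is a minimal sketch assuming a simple explicit finite-difference scheme for pure diffusion of a single species. It is not the authors' solver, and the function names `laplacian_no_flux` and `diffuse`, the grid, and all parameter values are hypothetical. Copying the boundary values into ghost cells makes the concentration difference, and hence the diffusive flux, zero at both cell poles, so the total amount of protein on the grid is conserved.

```python
import numpy as np


def laplacian_no_flux(c, dx):
    """Second-difference Laplacian on a 1D cell-centred grid with no-flux ends.

    Ghost cells copy the boundary values, so the concentration difference
    (and hence the diffusive flux) across x = 0 and x = L is zero.
    """
    padded = np.concatenate(([c[0]], c, [c[-1]]))
    return (padded[:-2] - 2.0 * c + padded[2:]) / dx**2


def diffuse(c, D, dx, dt, steps):
    """Explicit Euler integration of dc/dt = D * d2c/dx2 with no-flux boundaries."""
    if D * dt / dx**2 > 0.5:
        # The explicit scheme is only stable for D*dt/dx^2 <= 1/2
        raise ValueError("time step too large for the explicit scheme")
    c = np.asarray(c, dtype=float).copy()
    for _ in range(steps):
        c = c + dt * D * laplacian_no_flux(c, dx)
    return c


# Hypothetical example: protein initially confined to one end of a 5 um "cell"
c0 = np.zeros(50)
c0[:10] = 1.0
c_final = diffuse(c0, D=1.0, dx=0.1, dt=0.004, steps=2000)
print(c0.sum(), c_final.sum())  # totals agree: nothing leaks out of the ends
```

      In the full Min model each species would carry its own diffusion coefficient and reaction terms; the sketch isolates only the boundary treatment.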

      (iii) As in the previous comment, the current model does not take into account the geometry of the system, namely that the cytosol is 3D and the membrane is 2D. Recent theoretical studies can handle this effect as well as the effect of confinement. I would appreciate it if the authors would comment on whether these issues are relevant to the conclusions of this work.

      ANS: Thank you for pointing out this interesting aspect of cell geometry, as investigated by Wu et al. (2015). Our model is built to adequately describe changes in the MinD concentration gradient during cell elongation under the assumption that a 1D description is sufficient. Thus, our model cannot be generalized to other shapes, such as those examined by Wu et al. (2015). This point is now discussed in the Supplemental Information, lines 120-123.

      (iv) I would appreciate it if the authors would describe the screening process more clearly. I understood that the first screening requires a finite imaginary part and a positive real part in the eigenvalues of the first spatially inhomogeneous mode. However, I did not understand the other processes clearly. The second screening is based on λ_N and I_Ratio, but its criteria are not clear. Both quantities fluctuate in the experimental results, and I am not sure how the authors define whether numerical results match them. The third process is based on a fitting error using a fitting function of a linear increase plus a constant. I am not sure why, for example, the bottom right example in Fig. S6 needs to be excluded: it shows no oscillation until a cell length of 3 µm, but then the gradient increases linearly. Please clarify how the criteria are justified. The same argument applies to the fourth screening process. It is not clear why the slope should be smaller than 2.

      ANS: Thank you for your suggestions to improve the description of the screening process. We have re-run the simulation to refine and improve the screening process, and the corresponding text and illustration are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6.
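      The first screening criterion described in comment (iv), a positive real part together with a non-zero imaginary part among the eigenvalues of the first spatially inhomogeneous mode, is a standard linear stability test. The sketch below assumes a generic n-species 1D reaction-diffusion system linearized about a homogeneous steady state, with no-flux ends so that the first non-uniform mode is cos(πx/L). The function name, the Jacobian `J`, and the diffusion vector `D` are hypothetical placeholders, not the authors' model or parameters.

```python
import numpy as np


def first_mode_oscillatory_unstable(J, D, L, tol=1e-12):
    """Check whether the first non-uniform mode grows while oscillating.

    J   : Jacobian of the reaction terms at the homogeneous steady state (n x n)
    D   : diffusion coefficients of the n species
    L   : cell length; with no-flux ends the first non-uniform mode is cos(pi*x/L)
    """
    q = np.pi / L  # wavenumber of the first non-uniform mode
    M = np.asarray(J, dtype=float) - q**2 * np.diag(np.asarray(D, dtype=float))
    eigvals = np.linalg.eigvals(M)
    # Growing (Re > 0) and oscillating (Im != 0) at the same time
    return bool(np.any((eigvals.real > 0) & (np.abs(eigvals.imag) > tol)))


# Toy 2-species Jacobian (hypothetical numbers): eigenvalues 0.1 +/- 1i without diffusion
J_toy = [[0.1, -1.0], [1.0, 0.1]]
print(first_mode_oscillatory_unstable(J_toy, D=[0.0, 0.0], L=1.0))  # True
print(first_mode_oscillatory_unstable(J_toy, D=[1.0, 1.0], L=1.0))  # False
```

      With strong diffusion the term q²D pushes the real parts negative, which is why the same reaction Jacobian can pass or fail the test depending on cell length and diffusion coefficients.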

      (v) The authors claimed that the steeper gradient of MinD under glucose starvation results in cell division for shorter cells. I do not think the claim is convincing. It is necessary to measure the correlation between the length at the cell division and the gradient. It would also be nicer to show the correlation under other parameters. I think those studies truly support the authors' claim and the novelty of this work.

      ANS: Thank you for the comments. We would like to draw your attention to the right side of the graphs shown in Fig. 3B, E, where measurements were obtained from cells prior to division. Our claim that "the steeper gradient of MinD under glucose starvation results in cell division for shorter cells" is also supported by the range of the wave slope (λ_N): 1.49-2.66 in 0.4% glucose (cell length range: 1.7-4.5 µm) and 1.34-3.54 under glucose starvation (cell length range: 2.1-3.8 µm). Therefore, under glucose starvation, λ_N increases more steeply with increasing length, which leads us to speculate that a steeper concentration gradient contributes to the division of stressed, shorter cells. In the revised manuscript, the statement is kept in the Results section (Lines 217-218) but removed from the abstract. Regarding the correlation between the concentration gradient and cell length at division under different conditions, we consider it more important to characterize all aspects under the same growth condition and avoid manipulation. In this study, the main conclusions are drawn from our experiments characterizing several aspects of MinD oscillations in cells growing with 0.4% glucose. In support of these observations, we kept only one other condition, 0.1% glucose. Further analysis of cells growing under other conditions would not change the main conclusions but would make it more difficult to determine how the MinD concentration changes with cell growth.
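      The λ_N ranges quoted in this answer can be turned into a rough average slope. As a crude sketch, assuming that the minimum and maximum of λ_N occur at the shortest and longest cells in each condition (only the ranges are reported) and treating λ_N as dimensionless, with L denoting cell length:

$$
\left.\frac{\Delta \lambda_N}{\Delta L}\right|_{0.4\%\ \text{glucose}} \approx \frac{2.66 - 1.49}{4.5 - 1.7}\ \mu\text{m}^{-1} = \frac{1.17}{2.8}\ \mu\text{m}^{-1} \approx 0.42\ \mu\text{m}^{-1}
$$

$$
\left.\frac{\Delta \lambda_N}{\Delta L}\right|_{\text{starvation}} \approx \frac{3.54 - 1.34}{3.8 - 2.1}\ \mu\text{m}^{-1} = \frac{2.20}{1.7}\ \mu\text{m}^{-1} \approx 1.29\ \mu\text{m}^{-1}
$$

      On this endpoint estimate, λ_N rises roughly three times faster per µm under glucose starvation (1.29/0.42 ≈ 3.1), in line with the statement that λ_N increases more steeply with length in that condition; a fit to the per-cell data would give a more reliable comparison.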

      (vi) The conclusion at Line 346, "This plasticity arises from spatial differences in molecular interactions between MinD and MinE, as demonstrated...", looks unclear to me. My understanding is that (i) by screening randomly sampled parameters in the reaction-diffusion model, the authors found parameters that "match" the experimental results, and (ii) the parameters remaining after screening are correlated with each other (k_dD-k_dE and k_D-k_ATP->ADP). The logic relies heavily on the reaction-diffusion model being quantitatively correct. First, I think it is better to explain the logic more explicitly, that is, that the claim about the molecular interactions is not based on experimental facts. Second, I personally think the reaction-diffusion model used in this work does not quantitatively reproduce the experimental results, as discussed in (iii) and (iv). Please discuss how the comparison between the model and the experiments is justified.

      ANS: Thank you for your constructive comments. To address these questions, we have re-run the simulation to refine and improve the results, and the corresponding text and illustration are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6. The kinetic parameters used in this study are described in the main text, lines 258-264: 'To randomly search for combinations of the parameter sets k_dD, k_dE, k_D, and k_(ADP→ATP), the following parameters were fixed in the simulation: the diffusion coefficients D_d and D_de were assumed values based on bacterial membrane proteins (Schavemaker et al., 2018), the diffusion coefficients D_D and D_E were from Meacci et al. (2006) (Meacci et al., 2006), and the dissociation rate constant k_de were from a previous simulation (Wu et al., 2015). This operation allowed us to probe for the general behaviours of the system.' Lines 277-279: 'This screening process reduced the parameter sets to 23, including set #2827, which, judging by the correlation plots for length vs. period, λ_N, and I_Ratio (Figs. S7-S9), showed features similar to those of the experimental data (Figs. 1E, 3B, C).' Based on the parameters of set #2827, we rigorously tested the impact of different kinetic constants that represent different molecular interactions on the oscillation period, λ_N and I_Ratio (Fig 4D-H). The results are described in the section of 'Effect of the kinetic rate constant on the MinD concentration gradient' of the main text, lines 323-349. This effort has provided us with a solid theoretical view of how oscillation features may be controlled by different molecular interactions. In addition, a comparison between our modelling and experimental results is described in the main text, section 'In silico oscillation resembles oscillation in a cellular context', lines 300-321.

      (vii) I did not understand why the authors can claim "... further distinguishing in vivo and in vitro observations." at Line 350. I did not find any results compared with in vitro studies. I would appreciate a demonstration of in vitro results and/or references.

      ANS: To avoid confusion, this sentence has been removed.

      Minor comments: (1) Line 214: It should be "Fange and Elf".

      ANS: Line 238 in the revised manuscript: This has been corrected.

      (2) I think it is better to show sampled points in Fig. 4C and 4D to show how dense the authors sampled in the parameter space.

      ANS: Since we have rewritten this part, the suggested revision is no longer applicable.

      REFERENCES: [1] Fischer-Friedrich, Elisabeth / Meacci, Giovanni / Lutkenhaus, Joe / Chaté, Hugues / Kruse, Karsten, "Intra- and intercellular fluctuations in Min-protein dynamics decrease with cell length", Proceedings of the National Academy of Sciences, 107, 6134-6139 (2010). [2] Meacci, Giovanni, "Physical Aspects of Min Oscillations in Escherichia Coli", PhD thesis (2006) available at

      Reviewer #1 (Significance (Required)):

      General assessment: I think the strength of this study is that it potentially shows the quantitative correlation between the MinD concentration gradient during the oscillation and the cell length at division. However, the current glucose-starvation data are not convincing enough. The model parts are interesting, but their connection to the experiments is not clear in the current manuscript.

      ANS: Thank you for your comment. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. We hypothesized that if the plasticity of the MinD concentration gradient is an intrinsic property of the system, then this property would be robust and show consistent behaviour under different growth conditions. We therefore tested this hypothesis by studying MinD oscillations under a low-glucose condition, and the results strengthened the main conclusion derived from experiments under the regular growth condition containing 0.4% glucose. We believe that further analysis of cells growing under other conditions would not change the main conclusions but might make it more difficult to determine how the MinD concentration changes with cell growth. Therefore, we decided to keep this section concise, containing only one additional condition, even though we have more data than presented here. As mentioned earlier in this response letter, we have re-run the simulation to refine and improve the results; the corresponding text and illustrations are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6. This allowed us to probe the general behaviour of the system. As a result, we obtained a few parameter sets, including #2728, that reproduce features of the oscillation period, λ_N, and I_Ratio that strongly mimic MinD oscillation in the cellular context (Figs. 4C, S7-9). We further tested the impact of the kinetic constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP), which represent different molecular interactions, on the oscillation period, λ_N, and I_Ratio (Figs. 4D-H).
This effort has provided us with a solid theoretical view of how oscillation features may be controlled by different molecular interactions.

      Advance: The advance of this study is to measure the MinD concentration gradient under glucose starvation and to compare the experimental results with the (simplified) model over a wide range of parameters. I do not think the advance in the current manuscript reaches the conceptual level, because the conceptual conclusions are not convincingly supported by the results. In this respect, the advance of this work may be technical.

      ANS: Thank you for this constructive comment; we respond as follows. Combining experimental and theoretical efforts, the revised manuscript provides a conceptual advance in the quantitative understanding of MinD oscillations in the cellular environment and has implications for further studies of bacterial cell division regulation. Specifically, we would like to emphasize that this work revealed the inherent plasticity and adaptability of the MinD concentration gradient, which contributes to division site selection. The mathematical modelling provided us with a solid theoretical view of how oscillation features may be controlled by different molecular interactions.

      Audience: As a theoretician working on biophysics, including the model of the Min system, I think a specialised audience would be interested in this study. People who are studying the mechanism of the Min oscillation and resulting cell division, particularly those who are interested in both experiments and models, would be interested in this work. For the broad audience, I do not think the novelty of this study is well described.

      ANS: Thank you for your comment. We would like to point out that studying the variable concentration gradient underlying the dynamic oscillations of the Min system may be of broad interest to cell biologists, since concentration gradients play a fundamental role in various cellular processes and are a crucial concept in cell biology. Examples include passive and active transport, osmosis, cell signalling, and maintenance of cellular homeostasis. These processes allow cells to respond to their environment, regulate their internal conditions, and perform important functions required for survival and normal function. In addition, the variable concentration gradient, characterized by the numerical descriptor λ_N and reproduced in a simple mathematical model, exemplifies nonlinear dynamical behaviour in physical biology. Therefore, the audience of this work may include the broader community of cell biologists and physical biologists rather than only the specialized audience interested in the Min system. We would also reiterate the importance of specialized research, which often provides the basis for broader application and understanding.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This work by Parada et al. showed that in the oscillatory Min system, the MinD gradient was steeper in longer E. coli cells, while the period was stable. This behavior was recapitulated in a mathematical model, which also revealed coordinated reaction rates across a wide range of parameter space.

      ANS: We thank the reviewer for the concise summary of our work.

      Major comments: 1. There were some inconsistencies between the experimental and modeling data. The wave slope (λ_N) plateaued at ~3 µm in the model but not in the experiment (Fig. 3B). The period was much shorter in the model (Fig. S8) than in the experiment (Fig. 1B).

      ANS: Thank you for pointing out this problem. We have re-run the simulation to refine and improve the results; the corresponding text and illustrations are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6. This allowed us to probe the general behaviour of the system. As a result, we obtained a few parameter sets, including #2728, that reproduce features of the oscillation period, λ_N, and I_Ratio that closely mimic MinD oscillation in the cellular context (Figs. 4C, S7-9). The simulated oscillation period was nevertheless shorter than the experimental measurements. Even so, based on the parameters of set #2827, we rigorously tested the impact of different kinetic constants that represent different molecular interactions on the oscillation period, λ_N, and I_Ratio (Main text, lines 323-349; Fig 4D-H). This effort has provided us with a theoretical view of how oscillation features may be controlled by different molecular interactions. We found that the rate constants k_de, representing detachment of the MinDE complex from the membrane, and k_(ADP→ATP), representing recharging of MinD-ADP with ATP, had a stronger effect on the oscillation period. The results suggest that the oscillation cycle time is tunable. Regarding why the wave slope (λ_N) plateaued at ~3 µm in the modelling (Fig. 3B) but not in the experiment (Fig. 1D), we think this reflects the difference between experimentally examining a heterogeneous population of cells and simulating a growing bacterial cell. We derived our conclusions and hypotheses from wet experiments and further strengthened them using mathematical modelling, which provides insights into the kinetic properties of the Min system.

      2. Generally, I found that the data from the starved condition added little to the major message. Unless the model can recapitulate the even steeper gradient in this condition by tuning starvation-related parameters, these data may be removed.

      ANS: We thank the reviewer for this suggestion. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. We hypothesized that if the plasticity of the MinD concentration gradient is an intrinsic property of the system, then this property would be robust and show consistent behaviour under different growth conditions. We therefore tested this hypothesis by studying MinD oscillations under a low-glucose condition, and the results strengthened the main conclusion derived from experiments under the regular growth condition containing 0.4 % glucose. We agree that further analysis of cells growing under other conditions would not change the main conclusions but might make it more difficult to determine how the MinD concentration changes with cell growth. Therefore, we decided to keep this section concise, containing only one additional condition, even though we have more data than presented here.

      3. The authors need to compare what is different/novel between the model in this study and previous models, such as Wu et al. (2015), and highlight the uniqueness of this work.

      ANS: Thank you for this suggestion. We now provide a comprehensive comparison between them in the Supplemental Information (Lines 26-147). We would like to emphasize that although the goal of the previous works was to measure the spatiotemporal distribution of oscillating MinD concentration gradients as a function of cell length, these works conceived the problem differently and therefore used different experimental designs and execution methods, which differentiates our key conclusions from theirs. This is also true for mathematical modelling. Although similar observations can be found in some respects, they are not directly comparable due to the different mathematics and assumptions used in the simulations. Therefore, we would like to draw attention to the experimental rigor and to the specific points and views that contribute to our understanding of Min systems.

      4. The model explored the parameter space of reaction rates and found 60 sets. The values of k_dE, k_D, k_dD and k_(ADP→ATP) ranged over 6 orders of magnitude. This is interesting in itself, but reaction rates are unlikely to vary that much in cells. The relevance should be discussed.

      ANS: Thank you for pointing out this problem. For this revision, we re-ran the simulation to refine and improve the results, allowing us to identify parameter sets that generate features resembling the experimental measurements. Using set #2728 as an example, the variations in the five rate constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP) fall within a small range (Table 2, S4), eliminating the concern that arose from the previous version of the manuscript. We found that this parameter set allows for maximum utilization of MinD and MinE molecules, which are fixed in number according to experimental measurements, to drive membrane-associated oscillations in the simulation.
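      Matching simulated oscillation features to the experimental ones can be illustrated with a simple relative-tolerance filter. This is a hedged sketch: the 20 % tolerance, the target values and the parameter-set IDs are all hypothetical placeholders, and the manuscript's own selection procedure is described in the main text (Lines 237-279, 614-653) and Fig. S6.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Features:
    period_s: float  # oscillation period (s)
    lambda_n: float  # concentration-gradient slope, λ_N
    i_ratio: float   # I_min / I_max


def matches(sim: Features, target: Features, rel_tol: float = 0.2) -> bool:
    """True if every simulated feature lies within rel_tol of its target."""
    pairs = [(sim.period_s, target.period_s),
             (sim.lambda_n, target.lambda_n),
             (sim.i_ratio, target.i_ratio)]
    return all(abs(s - t) <= rel_tol * abs(t) for s, t in pairs)


def screen(results: Dict[str, Features], target: Features,
           rel_tol: float = 0.2) -> List[str]:
    """IDs of parameter sets whose simulated features resemble the target."""
    return [pid for pid, feat in results.items()
            if matches(feat, target, rel_tol)]


# Placeholder numbers only; neither the target nor the IDs come from the study.
target = Features(period_s=46.0, lambda_n=1.0, i_ratio=0.5)
results = {"set_A": Features(44.0, 1.1, 0.52),
           "set_B": Features(20.0, 1.0, 0.50)}
kept = screen(results, target)  # ["set_A"]: set_B's period is far too short
```

Requiring all three features to match at once is stricter than fitting any single feature, which is why only a few parameter sets survive such a screen.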

      Minor comments: 1. The colors in Fig. 1B were conflicting: the legend did not match the diagram. Fig. 1C has no scale for the x axis.

      ANS: We have resolved the colour conflict in Fig. 1B, and a time range has been added to Fig. 1C.

      2. Fig. S6A: How the 638 oscillatory parameter sets were matched with experimental data and screened to 174 sets was not clear. Data of fitting error

      ANS: Thank you for your suggestions to improve the description of the screening process. In this revision, we have re-run the simulation to refine and improve the results, and the corresponding text and illustration are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6. This procedure allowed us to probe the general behaviours of the system. The filter mentioned no longer applies.

      3. Significant digits were not used properly. For example, the period (Table 1) was shown as 46.00 sec, but the imaging interval was 12 sec, so the two decimal digits are meaningless. The same argument applies to the length measurement of 2.84 µm, since the optical resolution of the microscope used should be no better than 200 nm.

      ANS: We have corrected the significant digits throughout the manuscript.

      4. For scatter plots such as Fig. 1D-G, smaller dots would generally make the trend more obvious.

      ANS: We have modified the plots and used smaller dots in Figs. 1D-G, 3B, C, E, F, S3D, and S5B, C.

      5. The molecular mechanism by which the MinD gradient increases with length was not within the scope of the current study, but it would be better to discuss it.

      ANS: We address this comment from a different angle. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. In the revised manuscript, we have re-run the simulation to refine and improve the modelling procedures and results, and the corresponding text and illustration are provided in the Results section of the main text (Lines 265-279, 614-653) and Fig. S6. In brief, we fixed the diffusion coefficients D_D and D_E from Meacci et al. (2006); the dissociation rate constant k_de from a previous simulation (Wu et al., 2015); and the MinD and MinE concentrations at the values measured in this study. Meanwhile, the diffusion coefficients D_d and D_de were assigned assumed values based on bacterial membrane protein diffusion (Schavemaker et al., 2018). This procedure allowed us to probe the general behaviours of the system. As a result, we obtained a few parameter sets, including #2728, that generate an oscillation period, λ_N and I_Ratio closely mimicking MinD oscillation in the cellular context (Figs. 4C, S7-9). We further tested the impact of the kinetic constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP), which represent different molecular interactions, on the oscillation period, λ_N and I_Ratio (Fig. 4D-H). Our findings have provided a solid theoretical view of how oscillation features may be controlled by different molecular interactions. Furthermore, the modelling results help us understand the possible mechanisms associated with oscillation cycle maintenance and length-dependent variable concentration gradients.

      6. Fig. S8: why is there a sudden jump in the period in many of the sets of both groups?

      ANS: This supplemental figure is now Fig. S7. Slower oscillation at the onset of oscillation appears to be a common property of our simulations.

      Reviewer #2 (Significance (Required)):

      The Min system is a well-studied oscillation mechanism that restricts FtsZ to the cell center. Previous work has shown how the system works molecularly, simulated its behavior and reconstituted many different patterns in vitro. The major new information from this work was: 1. the rigorously measured endogenous levels of MinD and MinE; 2. the gradient increases with length; 3. a model that recapitulates this relationship and explores the parameter space of reaction rates. The paper was well presented, the experiments and analysis were rigorous, and the conclusions were not overstated. It should interest specialized cell biologists studying cell size and oscillation patterns.

      ANS: Many thanks to Reviewer 2 for recognizing the contributions of our work to the understanding of the Min system and its role in cell division. We also thank you for identifying specialized cell biologists studying cell size and oscillation patterns as readers of our paper. We would like to emphasize that cellular concentration gradients play a fundamental role in various cellular processes and that the concept of concentration gradients is crucial in cell biology. These concentration gradient-mediated processes allow cells to respond to their environment, regulate their internal conditions and perform important functions required for survival. In addition, the variable concentration gradient, characterized by the numerical descriptor λ_N and reproduced in a simple mathematical model, demonstrates nonlinear dynamical behaviour in physical biology. Therefore, this work may interest a broader readership in cell biology and physical biology rather than only an immediate specialist audience. We also reiterate the importance of specialized research, which often provides the basis for broader application and understanding.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript shows that the concentration of MinD does not change during the division cycle of E. coli. Due to the oscillation pattern, the concentration of MinD decreases at mid-cell, which makes mid-cell favorable for division. The mid-cell decrease in MinD concentration is mainly length dependent. The oscillation pattern is not due to a change in MinD concentration but to plasticity arising from spatial differences in the molecular interactions between MinD and MinE. The manuscript is well written, the experiments are performed carefully, and the results will be of interest to readers from a variety of fields. However, several concerns need explanation.

      ANS: We greatly appreciate the positive feedback from the reviewer, and we address the specific concerns below.

      Major concerns: One of my major concern is these interactions are not shown experimentally but explained using either the previously published literature or mathematical models. Further, the previous literatures are shown on in vitro models which does not mimic the in vivo system fully.

      ANS: We thank the reviewer for the important point that the reaction rates in previous studies and in our model of Min oscillations have not been experimentally tested. We are aware of the lack of experimental measurements, but these reaction rates cannot be measured in batch reactions using classical biochemical methods. Accurate measurement would require advanced techniques with high spatial and temporal resolution, which is beyond the scope of the current study. However, in the revised manuscript, we have re-run the simulation to refine and improve the results, and the corresponding text and illustration are provided in the Results section of the main text (Lines 237-279, 614-653) and Fig. S6. In our simulation, we fixed the diffusion coefficients D_D and D_E from Meacci et al. (2006); the dissociation rate constant k_de from a previous simulation (Wu et al., 2015); and the MinD and MinE concentrations at the values measured in this study. Meanwhile, the diffusion coefficients D_d and D_de were assigned assumed values based on bacterial membrane protein diffusion (Schavemaker et al., 2018). This procedure allowed us to probe the general behaviours of the system. As a result, we obtained a few parameter sets, including #2728, that generate an oscillation period, λ_N and I_Ratio closely mimicking MinD oscillation in the cellular context (Figs. 4C, S7-9). Interestingly, we found that this parameter set allows maximal utilization of the MinD and MinE molecules, whose numbers are fixed by experimental measurements, to drive membrane-associated oscillations in the simulation. We further tested the impact of the kinetic constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP), which represent different molecular interactions, on the oscillation period, λ_N and I_Ratio (Figs. 4D-H). 
Our findings have provided us with a solid theoretical view of how oscillation features may be controlled by different molecular interactions, and help us understand the possible mechanisms associated with oscillation cycle maintenance and length-dependent variable concentration gradients.

      The concentration of MinD does not change with increasing cell length. Is the MinD concentration (or copy number) different in cells growing in low glucose compared with cells growing in high glucose?

      ANS: Thank you for the comments. As shown in Figs. 2B, C, the concentration of MinD changed with cell length, but the number of MinD molecules per unit area did not change significantly with cell length. Although it is unclear how the number of MinD molecules changes when cells are grown under low-glucose conditions, this number does not appear to be essential, for the following reasons. We focused on studying Min oscillations during the normal growth cycle, minimizing experimental manipulations to analyse oscillation dynamics. Measurements of oscillations in cells grown under low-glucose conditions support the primary measurements. We think that further analysis of MinD concentration changes in growing cells under low-glucose conditions will not change the main conclusion of this manuscript: 'plasticity in the MinD concentration gradient is an intrinsic property of the Min system during cell growth'.

      As per the current study a particular I-ratio at the mid-cell is required to initiate the cell division. In the case of cells growing at low glucose, how this required I-ratio is achieved at the mid-cell?

      ANS: Thank you for the excellent question. As described in the main text, lines 199-201, I_Ratio is defined as the ratio of the minimum intensity to the maximum intensity measured from the experimental data, which gradually decreases as the cell length increases (Fig. 3C). Since the minimum and maximum intensities were measured from the concentration gradient, which is characterized by the slope of the concentration gradient (λ_N), there exists a correlation between I_Ratio and λ_N. That is, a larger λ_N will result in a smaller I_Ratio, and vice versa. When comparing measurements made from cells grown with 0.4% and 0.1% glucose (Fig. 3B, C, E, F), the changes in λ_N are more drastic within a shorter length under low-glucose condition, which is accompanied by more drastic changes in I_Ratio. Furthermore, when the I_Ratio value was approximately 0.5, the corresponding cell length was significantly shorter under low-glucose condition. Therefore, we speculate that there may be an effective I_Ratio that is low enough for stable FtsZ ring formation. This effective I_Ratio can occur at any cell length, allowing us to see that bacteria divide at shorter cell lengths under low-glucose conditions. This property necessitates a faster reduction in the concentration gradient to reach the effective I_Ratio for cells dividing at shorter lengths. As a result, by adjusting λ_N as a function of length, the steepness of the I_Ratio reduction can be altered. Please see the main text, lines 389-406.
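      Because I_Ratio is defined as I_min/I_max, its dependence on gradient steepness can be shown in a few lines. This is a minimal sketch, assuming a one-dimensional, background-subtracted, time-averaged intensity profile along the cell's long axis; the profiles below are invented, and the manuscript's own image processing (Fig. S2) may differ.

```python
from typing import Sequence


def i_ratio(profile: Sequence[float]) -> float:
    """I_Ratio = I_min / I_max of a 1D intensity profile along the long axis.

    Assumes `profile` is already background-subtracted and time-averaged.
    """
    if not profile:
        raise ValueError("empty profile")
    i_max = max(profile)
    if i_max <= 0:
        raise ValueError("profile must contain a positive maximum")
    return min(profile) / i_max


# Invented pole-to-pole profiles: the deeper midcell minimum (steeper
# gradient) gives the smaller ratio.
shallow = [4.0, 3.0, 2.5, 3.0, 4.0]
steep = [4.0, 2.5, 1.0, 2.5, 4.0]
print(i_ratio(shallow), i_ratio(steep))  # 0.625 0.25
```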

      There is a decrease in the MinD oscillation time under the low-glucose condition. As explained by the authors, MinD oscillation is mainly driven by FtsE-induced removal of MinD from the membrane; how can the authors explain this decrease?

      ANS: Thank you for raising the question of how the MinE-induced detachment of membrane-bound MinD contributes to the oscillation time of MinD under low-glucose conditions. Although this is an interesting question, determining what regulates MinE-induced detachment of membrane-bound MinD under low-glucose conditions is beyond the scope of the current study. This unknown regulatory mechanism that regulates MinD-MinE interactions in growing cells under low glucose conditions is worthy of further investigation. However, our modelling results have provided a theoretical view of how oscillation features may be controlled by different molecular interactions between MinD and MinE and may guide future experiments investigating the underlying mechanism involved. Please refer to the Results section: 'Spatiotemporal distribution of the concentration gradient' in the main text, lines 351-373.

      Further, it is explained that cellular ATP is present at a much higher concentration than required for this oscillation. As the I-ratio is mainly dependent on cell length, what could be the reason for the differential λ_N between the low- and high-glucose conditions?

      ANS: Please refer to the previous answer to the question: 'As per the current study a particular I-ratio at the mid-cell is required to initiate the cell division. In the case of cells growing at low glucose, how this required I-ratio is achieved at the mid-cell?'. (this letter, Lines 764-779) In addition, our modelling in search of parameter sets that generate characteristics of MinD oscillation resembling oscillation in vivo allowed us to evaluate the impact of different molecular interactions, as represented by different rate constants (Fig. 4), which has provided important information for future mechanistic investigations, although not in the present study. Please see the Results section: 'Effect of the kinetic rate constant on the MinD concentration gradient' in the main text, lines 323-349.

      MinD is a highly insoluble protein. It also has an amphipathic helix and thus binds to the membrane most of the time. The method used by the authors to determine the cellular MinD concentration (described in Fig. S1) will only give the concentration of soluble MinD, not of total MinD. How do the authors justify this as the total concentration? The same applies to the MinE copy-number calculation. The authors may need to perform transcriptome analysis and compare the two data sets.

      ANS: We thank the reviewer for the comments. Since the attachment of MinD and MinE to the membrane is transient and MinD-membrane interactions require ATP, we expected that most of the protein would be released from the membrane into the cytoplasm after cell disruption, sufficiently representing the total MinD concentration. Furthermore, our measurements of molecule numbers are within the range of previous measurements (Di Ventura & Sourjik, 2011; Juarez & Margolin, 2010; Meacci & Kruse, 2005; Tostevin & Howard, 2006; Touhami et al, 2006). Thus, we believe that our current measurements are reliable and sufficient for subsequent interpretation.

      One of the main questions posed by the authors in the abstract is: "How the intracellular Min protein concentration gradients are coordinated with cell growth to achieve spatiotemporal accuracy of cell division is unknown". Although the authors have shown that the concentration gradient changes during cell growth, the underlying mechanism is not well explained. The authors have not provided any specific explanation for the increase in the velocity of MinD oscillation or for gradient formation. How does the velocity of MinD increase although the MinD concentration does not?

      ANS: We have changed 'the mechanism' to 'the exact way' in the abstract (Abstract, line 28). Moreover, in the revised manuscript, we have improved the mathematical model and performed a thorough investigation of the variations in the kinetic constants. This effort has provided a solid theoretical view of how oscillation features may be controlled by different molecular interactions. The results may guide future experiments investigating the underlying mechanism. Please refer to the answers to the previous questions above.

      Figure 2B shows that the overall concentration of MinD in a single cell varies between 1180 and 1160 molecules/µm². Fig. 2C states that the mid-cell MinD concentration is 120-20 molecules/µm². Further, Figs. 3C and 3F show I-ratio values between 0.6 and 0.4. Given these values, the I-ratio (I_min/I_max) should be between 0.1 and 0.01. The authors need to explain this. Figure 2C: the data on the two Y-axes do not match, and the legend needs more clarification. Were the numbers of molecules counted only in the marked 200-nm area? If so, why does Y-axis 1 (molecules/µm²) decrease 7-fold, whereas Y-axis 2 (molecules) decreases only 2-fold?

      ANS: In this work, we measured sfGFP-MinD intensity through fluorescence microscopy. The fluorescence intensity was converted into molecular numbers based on estimates from Western blot analyses (Fig. S1). This number of molecules for MinD and MinE was assumed to be the mean number, which was fit into the midpoint of the doubling time (Fig. 2B, black dashed line; main text, lines 166-167). Fig. 2C was obtained by further processing the same dataset to restrict the region of analysis to the midcell zone. Please refer to the main text, lines 158-178. However, the λ_N and I_Ratio values were calculated from the processed intensity data (Fig. S2; main text, lines 190-209, 533-559). Because of the conversion from intensity to molecule number in Figs. S2B, C and the image processing procedure applied to the calculation of λ_N and I_Ratio, it is not possible to directly compare the fold change and the upper and lower limits between molecule numbers and the λ_N and I_Ratio values.
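      The calibration described above can be illustrated with a minimal linear sketch. The function name, the `window` parameter and the assumption that total fluorescence scales linearly with molecule number are ours, not taken from the manuscript; the actual conversion is described in Fig. S1 and Figs. S2B, C.

```python
from statistics import mean
from typing import List, Sequence


def intensity_to_molecules(intensities: Sequence[float],
                           cell_ages: Sequence[float],
                           mean_molecules: float,
                           window: float = 0.1) -> List[float]:
    """Convert per-cell sfGFP-MinD intensities to molecule numbers.

    Hypothetical linear calibration: cells whose age (fraction of the doubling
    time, 0 = birth, 1 = division) lies within `window` of 0.5 are assumed to
    carry, on average, the mean molecule number estimated by Western blot.
    """
    if len(intensities) != len(cell_ages):
        raise ValueError("intensities and cell_ages must have equal length")
    mid = [i for i, a in zip(intensities, cell_ages) if abs(a - 0.5) <= window]
    if not mid:
        raise ValueError("no cells near the midpoint of the doubling time")
    factor = mean_molecules / mean(mid)  # molecules per intensity unit
    return [i * factor for i in intensities]
```

Dividing each returned value by the corresponding cell area would give a per-area density. Because λ_N and I_Ratio are computed from processed intensity profiles rather than from these converted numbers, fold changes in one need not match fold changes in the other.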

      Other comments: Line 84: Requires reference for this statement.

      ANS: A recent review article has been added in the main text, line 84: '(Cameron & Margolin, 2024)'.

      Line 96: Can the authors provide other evidence or validation for the copy-number determination, such as transcriptome analysis?

      ANS: We thank the reviewer for this suggestion. However, we believe that direct measurement of cellular protein abundance is reliable and sufficient for our purposes. Furthermore, transcriptome-measured RNA abundance does not translate directly to protein abundance in living cells because posttranscriptional processing, translation, posttranslational processing, and protein stability issues complicate the interpretation. Therefore, protein abundance measurement from cell extracts is straightforward for our purpose.

      Fig. 1C: What are the units of time in Fig. 1C? Are they the same for all cell lengths?

      ANS: As described in the main text, lines 511-512, 'Time-lapse images of sfGFP-MinD were acquired at 12-sec intervals for 10 min or before the fluorescence diminished'. This condition is applied to all the acquired images in this work.

      Page 6, lines 136-138: What could be the mechanism underlying the change in velocity at different cell-cycle times?

      ANS: To avoid confusion, we have modified the text and toned down references to velocity. The velocity mentioned is inferred from the measured oscillation period and cell length rather than measured directly; our emphasis is on understanding how the oscillation period remains fairly stable during cell growth rather than how the velocity changes. In the revised manuscript, we used modelling results to elucidate possible mechanisms of period maintenance. The corresponding text and illustration are provided in the Results section (Lines 300-373) and the Discussion section of the main text (Lines 407-446) and Figs. 4 and 5. In brief, the simulation allowed us to probe the general behaviours of the system and to obtain a few parameter sets that generate an oscillation period, λ_N and I_Ratio closely mimicking MinD oscillation in the cellular context (Figs. 4C, S7-9). We further tested the impact of the kinetic constants k_de, k_dD, k_dE, k_D, and k_(ADP→ATP), which represent different molecular interactions, on the oscillation period, λ_N and I_Ratio (Fig. 4D-H). This effort has provided a solid theoretical view of how oscillation features may be controlled by different molecular interactions. Please see the Results section: 'Effect of the kinetic rate constant on the MinD concentration gradient' in the main text, lines 323-349.

      Page 7, line 155: Is there any evidence for this claim?

      ANS: The sentence has been modified as follows: 'Thus, the fairly stable oscillation period and variable velocity did not change the precision of the septum placement.' (Main text, lines 155-156)

      Page 7, line 156: Can the authors show any evidence that a burst of MinD synthesis occurs during division? If not for MinD, has this been shown for any other protein?

      ANS: The text is now in lines 168-171: 'Interestingly, the value after division was not doubled, which could indicate a balanced outcome between de novo synthesis and degradation or a burst of MinD synthesis at cell division followed by constant synthesis.' In previous studies by Männik et al. (2018) and Vischer et al. (2015), the cellular concentration of the division protein FtsZ increased throughout the cell cycle under slow growth conditions, and FtsZ was degraded rapidly at the end of the cell cycle, a process controlled by the ClpXP protease. Because the relevance of these observations to our study, which focused on the plasticity of the MinD concentration gradient, is unclear, we decided not to discuss them in the manuscript.

      Page 9, line 217: Fig. 4A is not explained clearly, and all the terms mentioned need to be explained. This figure is used to explain the differential concentration of MinD at the poles and mid-cell, and thus needs to be explained more clearly.

      ANS: Thank you for your comments. Please refer to the above answer to the question: 'One of my major concern is these interactions are not shown experimentally but explained using either the previously published literature or mathematical models. Further, the previous literatures are shown on in vitro models which does not mimic the in vivo system fully.', in this letter, lines 691-715.

      Page 12, line 285: What is the meaning of the default speed of MinD oscillation in new-born cells? Did the authors observe any specific velocity in new-born cells? What is the explanation for the length-dependent oscillation velocity of MinD?

      ANS: Thank you for the questions. As mentioned earlier, the emphasis of this study is on understanding how the oscillation period remains relatively stable while showing plasticity of the concentration gradient during cell growth. The velocity is inferred from the oscillation period and cell length but is not a direct measurement. To avoid confusion, we have modified the text and placed less emphasis on the velocity when mentioned.

      Reviewer #3 (Significance (Required)):

      General assessment: Most of the work in the manuscript relies on mathematical models, whereas the audience is mainly from the biology fields, so simplified explanations are required in many places. Many of the figure legends require more explanation for better understanding. If possible, more experimental data could be added, specifically to explain the model shown in Figure 4A.

      ANS: We have modified the figure legends to include more explanation. As mentioned above, we have also revised Fig. 4 to include improved modelling results that better fit the experimental data and to examine the impact of the kinetic constants of the reaction steps in the Min system. Please refer to lines 691-715 in this letter.

      Advance: The study adds to existing knowledge and will help fill conceptual gaps in understanding the mid-cell MinD concentration and what may favor the initiation of bacterial division. Audience: Mainly the microbiology community will be interested in the study. It will also be of interest to physicists and mathematicians working to understand bacterial division.

      ANS: We thank the reviewer for this positive comment.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The study by Parada et al. illuminates the intricate interplay between Min proteins, exemplified by MinD, and cell growth in E. coli. Their findings demonstrate that the MinD concentration gradient steepens progressively as cells elongate, potentially influencing FtsZ ring formation via MinC. Moreover, their comprehensive reaction-diffusion model not only corroborates experimental observations of length-dependent concentration gradients but also underscores the critical role of kinetic interactions involving Min proteins, the membrane, and ATP. This elucidation significantly advances our understanding of the oscillatory mechanisms within the Min system. Both the experimental and simulation data are robust, and the manuscript is exceptionally well-written. I express my full support for publication pending the satisfactory resolution of the outlined concerns.

      ANS: We appreciate the reviewer's positive feedback and have addressed most issues to the best of our ability.

      1. Remove the dot in front of "Min" in line 57.

      ANS: This has now been removed.

      2. In lines 82-84, the statement "...The distribution of the division inhibitor MinC may be synchronized with spatiotemporal differences in MinD concentrations, leading to a stable placement of the FtsZ ring at the midcell..." suggests a potential synchronization between MinC and MinD oscillations. It is crucial to investigate whether sfGFP-MinC exhibits the same concentration-gradient oscillatory behavior in vivo as observed with MinD.

      ANS: Thank you for bringing up this question. The key finding of our study, involving experimental measurements and mathematical modelling, is plasticity in the MinD concentration gradient, which results from spatial differences in molecular interactions and is an intrinsic property of the Min system during cell growth. With many investigations already covered in this manuscript, we prefer to investigate sfGFP-MinC in future studies, which will have different focuses on how MinC dynamics are coupled with the variable MinD concentration gradient to directly impact FtsZ ring formation.

      3. Ensure consistent significant digits throughout the text. For instance, 1.95 ± 0.16 μM in line 97, 1.4 ± 0.13 μM in line 98, and 1.9 ± 0.2 μM in line 100 have varying precision. Consider using integers for molecule numbers.

      ANS: We have corrected the significant digits in the main text and supplemental information.
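      One common convention for consistent precision is to round the uncertainty to one significant figure and then round the mean to the same decimal place. The helper below is a hypothetical sketch of that convention applied to the three values quoted by the reviewer; it illustrates the rule and is not a record of the values reported in the revised manuscript. `Decimal` with `ROUND_HALF_UP` is used so that, for example, 1.95 rounds to 2.0 without interference from binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_mean_sd(mean: float, sd: float, sig: int = 1) -> str:
    """Round sd to `sig` significant figures and mean to the same decimal place."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    d_sd = Decimal(str(sd))
    lead = d_sd.adjusted()  # exponent of the leading digit, e.g. 0.16 -> -1
    quantum = Decimal(1).scaleb(lead - sig + 1)
    sd_r = d_sd.quantize(quantum, rounding=ROUND_HALF_UP)
    if sd_r.adjusted() > lead:  # rounding gained a digit, e.g. 0.96 -> 1.0
        quantum = Decimal(1).scaleb(sd_r.adjusted() - sig + 1)
        sd_r = d_sd.quantize(quantum, rounding=ROUND_HALF_UP)
    mean_r = Decimal(str(mean)).quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{mean_r:f} ± {sd_r:f}"


for m, s in [(1.95, 0.16), (1.4, 0.13), (1.9, 0.2)]:
    print(format_mean_sd(m, s))
# 2.0 ± 0.2
# 1.4 ± 0.1
# 1.9 ± 0.2
```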

      4. Address the discrepancy in expression levels of MinD and MinE between strain FW1541 and its parental strain W3110. Given the labeling effect, it is possible that MinD expression levels differ. However, MinC's expression level should be approximately the same. Conduct whole-genome sequencing of both strains to identify any additional mutations.

      ANS: Thank you for the comments. As described in the main text (Lines 67-70), the most important aspect is the concentration ratio between MinD and MinE. Although the numbers are not the same, they are comparable to those in previous studies (Hale et al, 2001; Li et al, 2014; Schmidt et al, 2016; Shih et al, 2002) (Main text, lines 113-115). Furthermore, we performed whole-genome sequencing of the W3110 and FW1541 strains and confirmed that sfGFP was correctly inserted. The sequence alignment of the minCDE locus is provided for your reference but not for publication. Although there are some sporadic point mutations, there is no obvious reason to believe that they would affect Min protein expression. We will deposit the sequencing data as soon as possible.

      5. Clarify the apparent discrepancy between lines 112 and 127. Line 112 suggests that the periodic regularity of interpolar oscillations increases with cell length, as demonstrated in Figs. 1B-C, 1E and S5. However, in the subsequent section (starting from line 127), the authors state that oscillation periods remain relatively stable across cells of different lengths. Provide clarification on this apparent discrepancy.

      ANS: Thank you for pointing out this confusion caused by misuse of the term. In Lines 122-123, the statement has been modified as follows: '...the uniformity of the oscillation intervals appears to increase with length...' In line 139, 'The oscillation period' refers to the time required for the oscillation cycle. Since the correction in line 123 should suffice to clarify, we did not modify the statement in line 139.

      1. Specify if the analysis was limited to non-constricted cells. If so, state this explicitly in the text, as it could impact the interpretation of results, especially in relation to the linear dependence of cell length on time before constriction, as shown in Fig S3C.

      ANS: We did not specifically exclude constricted cells; a constricting cell was treated as a single cell until it split. We have added a statement to clarify this in Lines 144-145.

      1. Improve clarity in Fig 2A by using distinct colors (e.g., green and red) for differentiation on the Y-axis.

      ANS: The Y axes of Fig. 2A have been modified.

      1. Correct "of" to "from" in line 223 for improved clarity and accuracy.

      ANS: Corrected.

      1. Include the missing "A" in Fig S6A for completeness and accuracy.

      ANS: This figure has been updated.

      1. Ensure consistency in referencing style (full names versus short names) throughout the manuscript.

      ANS: This has now been done.

      Reviewer #4 (Significance (Required)):

      While numerous commendable in vitro studies have explored the oscillatory behavior of the Min system, this work uniquely delves into the oscillation of MinD within live cells. It unveils the remarkable coordination between intracellular Min protein concentration gradients and cell growth, shedding light on the precise spatiotemporal regulation of cell division.

      ANS: We thank the reviewer for this positive comment.

      References

      Di Ventura B, Sourjik V (2011) Self-organized partitioning of dynamically localized proteins in bacterial cell division. Molecular Systems Biology 7: 457

      Fischer-Friedrich E, Meacci G, Lutkenhaus J, Chaté H, Kruse K (2010) Intra- and intercellular fluctuations in Min-protein dynamics decrease with cell length. Proceedings of the National Academy of Sciences of the United States of America 107: 6134-6139

      Hale CA, Meinhardt H, de Boer PA (2001) Dynamic localization cycle of the cell division regulator MinE in Escherichia coli. The EMBO Journal 20: 1563-1572

      Juarez JR, Margolin W (2010) Changes in the Min oscillation pattern before and after cell birth. Journal of Bacteriology 192: 4134-4142

      Li GW, Burkhardt D, Gross C, Weissman JS (2014) Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157: 624-635

      Mannik J, Walker BE, Mannik J (2018) Cell cycle-dependent regulation of FtsZ in Escherichia coli in slow growth conditions. Molecular Microbiology 110: 1030-1044

      Meacci G, Kruse K (2005) Min-oscillations in Escherichia coli induced by interactions of membrane-bound proteins. Phys Biol 2: 89-97

      Meacci G, Ries J, Fischer-Friedrich E, Kahya N, Schwille P, Kruse K (2006) Mobility of Min-proteins in Escherichia coli measured by fluorescence correlation spectroscopy. Phys Biol 3: 255-263

      Schavemaker PE, Boersma AJ, Poolman B (2018) How Important Is Protein Diffusion in Prokaryotes? Front Mol Biosci 5: 93

      Schmidt A, Kochanowski K, Vedelaar S, Ahrne E, Volkmer B, Callipo L, Knoops K, Bauer M, Aebersold R, Heinemann M (2016) The quantitative and condition-dependent Escherichia coli proteome. Nature Biotechnology 34: 104-110

      Shih YL, Fu X, King GF, Le T, Rothfield L (2002) Division site placement in E. coli: mutations that prevent formation of the MinE ring lead to loss of the normal midcell arrest of growth of polar MinD membrane domains. The EMBO Journal 21: 3347-3357

      Tostevin F, Howard M (2006) A stochastic model of Min oscillations in Escherichia coli and Min protein segregation during cell division. Phys Biol 3: 1-12

      Touhami A, Jericho M, Rutenberg AD (2006) Temperature dependence of MinD oscillation in Escherichia coli: running hot and fast. Journal of Bacteriology 188: 7661-7667

      Vischer NO, Verheul J, Postma M, van den Berg van Saparoea B, Galli E, Natale P, Gerdes K, Luirink J, Vollmer W, Vicente M, den Blaauwen T (2015) Cell age dependent concentration of Escherichia coli divisome proteins analyzed with ImageJ and ObjectJ. Front Microbiol 6: 586

      Wu F, van Schie BG, Keymer JE, Dekker C (2015) Symmetry and scale orient Min protein patterns in shaped bacterial sculptures. Nature Nanotechnology 10: 719-726

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Parada et al. studied, both experimentally and theoretically, the MinD concentration distribution of Min waves during cell growth. The main findings were that (i) the MinD gradient is steeper in longer cells and, accordingly, the MinD concentration at the middle of the cell is lower, (ii) the period of the oscillation is independent of cell length, and (iii) these features are also observed under glucose starvation, except that the MinD gradient is steeper. (iv) These results are supplemented by analyses of reaction-diffusion equations, in which parameters that can reproduce the MinD concentration distribution are identified.

      I think the results are interesting: as the cell grows, the contrast of the wave becomes clearer, such that the MinD concentration at the cell centre decreases. The results may clarify the mechanism of FtsZ accumulation at the cell centre more quantitatively. The experiments were performed by measuring the fluorescence intensity of MinD during cell growth and analysing the intensity distribution along the long axis of the cell. The theoretical results were based on analyses of the reaction-diffusion model. Both approaches are well established and the results are sound. Nevertheless, I do not think the novelty of this work is well highlighted in the current manuscript; I think most of the results, except (iii) and (iv), have already been shown explicitly or implicitly in previous studies. Min oscillations in a growing cell have been analysed both theoretically and experimentally in (Meacci 2005) and [1], where the concentration distribution and period of the oscillation were measured. The complete results were presented in [2], and I am not aware of those results having been published in a scientific journal (the thesis is available online). Nevertheless, I think it is fair to cite those studies and compare the current results with them. In fact, [2] showed that the concentration of MinD near the cell centre decreases as the cell grows, that the total MinD concentration is approximately constant during growth (so the number of molecules increases), and that the variance of the period becomes smaller as the cell grows. I do not think those previous studies diminish this work, and this work deserves publication somewhere. Still, the authors should highlight the novelty of this study more clearly.

      Major comments:

      (i) In (Meacci 2005) and [1,2], it was claimed that the standard deviation of the period is comparable to the mean period, particularly for shorter cells. Therefore, they did not claim that the period is independent of cell length. As far as I understand, the variance arises from variation in the total protein concentration across the ensemble of cells. I wonder how the authors are able to conclude that the period is constant across different cell lengths. I also point out that in the theoretical part of (Meacci 2005), the period in fact increases as the cell grows and suddenly decreases at the length at which cell division occurs.

      (ii) I do not think the reaction-diffusion model is well described. The authors mention that they studied a one-dimensional model and used the delta function to describe the membrane reaction. Did the authors study a 1D cytosol and a 0D membrane? If so, why does a surface diffusion term appear in (4) and (5)? I believe the authors simply assumed that both the membrane and the cytosol are 1D (with larger diffusion constants for cytosolic Min concentrations). In that case, the delta functions in (1)-(5) are not necessary. In (Wu 2015), the delta function was used to treat a 2D membrane embedded in 3D space.

      Besides that, there is no description of the initial conditions for the concentration fields used to solve the reaction-diffusion equations. I think the description of the no-flux boundary condition would be better placed in the Methods than in the supplementary materials.
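      To make concrete the reading in which membrane and cytosol are both 1D fields that differ only in their diffusion constants, here is a minimal sketch of 1D diffusion on a closed segment with no-flux ends. It is not the authors' model: the reaction terms of equations (1)-(5) are omitted, and the function name diffuse_no_flux, the diffusion constants, grid spacing, time step and initial profile are all hypothetical values chosen for illustration.

```python
# Hypothetical sketch: explicit 1D diffusion with no-flux (zero-flux) ends.
# Reaction terms of the authors' model are deliberately omitted.


def diffuse_no_flux(c, D, dx, dt, steps):
    """Diffuse bin concentrations c on a closed 1D segment.

    The end bins exchange material with only one neighbour, which imposes
    zero flux at both poles, so the total amount sum(c) * dx is conserved.
    """
    r = D * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx**2 <= 0.5")
    c = list(c)
    n = len(c)
    for _ in range(steps):
        # Net flux from bin i to bin i+1 across each interior interface.
        flux = [r * (c[i] - c[i + 1]) for i in range(n - 1)]
        new = c[:]
        for i, f in enumerate(flux):
            new[i] -= f
            new[i + 1] += f
        c = new
    return c


if __name__ == "__main__":
    dx, dt = 0.1, 0.0004   # hypothetical grid spacing (um) and time step (s)
    n_bins = 30            # a 3 um cell
    start = [1.0 if i < 5 else 0.0 for i in range(n_bins)]  # material near one pole
    cytosol = diffuse_no_flux(start, D=10.0, dx=dx, dt=dt, steps=2500)   # fast
    membrane = diffuse_no_flux(start, D=0.05, dx=dx, dt=dt, steps=2500)  # slow
    print(sum(cytosol), sum(membrane))
    print(max(cytosol) - min(cytosol), max(membrane) - min(membrane))
```

      With identical initial profiles, the fast "cytosolic" field becomes nearly uniform within about a second while the slow "membrane" field stays localized, and both conserve their total amount because of the no-flux ends.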

      (iii) Following on from the previous comment, the current model does not take into account the geometry of the system, namely that the cytosol is 3D and the membrane is 2D. Recent theoretical studies can handle this effect, as well as the effect of confinement. I would appreciate it if the authors would comment on whether these issues are relevant to the conclusions of this work.

      (iv) I would appreciate it if the authors would describe the screening process more clearly. I understood that the first screening requires an eigenvalue with a nonzero imaginary part and a positive real part at the first mode of spatial inhomogeneity. However, I did not understand the other steps clearly. The second screening is based on λ_N and I_Ratio, but its criteria are not clear. Both quantities fluctuate in the experimental results, and I am not sure how the authors define a match between the numerical and experimental results.

      The third step is based on a fitting error, using a fitting function consisting of a linear increase plus a constant. I am not sure why we need to exclude, for example, the bottom-right example in Fig. S6, which shows no oscillation up to a cell length of 3 µm but then a linearly increasing gradient. Please justify these criteria. The same argument applies to the fourth screening step: it is not clear why the slope should be smaller than 2.
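      The first screening criterion described in (iv) can be illustrated with a linear-stability sketch. The code below assumes, purely for illustration, a two-species linearised system with a hypothetical reaction Jacobian J and diffusion constants D (the authors' model has more species, and J, D and L are made-up values). With no-flux ends on [0, L], the first spatially non-uniform mode is cos(πx/L), so the sketch evaluates the eigenvalues of J − k²·diag(D) at k = π/L and accepts a parameter set only if some eigenvalue has a positive real part together with a nonzero imaginary part, i.e. a growing oscillation.

```python
import cmath
import math


def first_mode_eigenvalues(J, D, L):
    """Eigenvalues of J - k^2 * diag(D) for a hypothetical 2-species system.

    With no-flux boundaries on [0, L], the first non-uniform mode is
    cos(pi * x / L), so the wavenumber is k = pi / L.
    """
    k2 = (math.pi / L) ** 2
    a = J[0][0] - k2 * D[0]
    b = J[0][1]
    c = J[1][0]
    d = J[1][1] - k2 * D[1]
    half_trace = (a + d) / 2
    det = a * d - b * c
    # Closed-form eigenvalues of a 2x2 matrix; cmath handles complex roots.
    root = cmath.sqrt(half_trace ** 2 - det)
    return half_trace + root, half_trace - root


def passes_first_screen(J, D, L, tol=1e-12):
    """True if some first-mode eigenvalue has Re > 0 and Im != 0."""
    return any(lam.real > tol and abs(lam.imag) > tol
               for lam in first_mode_eigenvalues(J, D, L))


if __name__ == "__main__":
    D = (0.01, 0.01)  # hypothetical diffusion constants
    L = 3.0           # hypothetical cell length
    print(passes_first_screen([[0.5, -2.0], [2.0, -0.2]], D, L))   # growing oscillation
    print(passes_first_screen([[-0.5, -2.0], [2.0, -0.2]], D, L))  # decaying oscillation
    print(passes_first_screen([[1.0, 0.0], [0.0, -1.0]], D, L))    # growth, no oscillation
```

      For a system with more species the same test applies, with a numerical eigenvalue solver in place of the 2×2 closed form.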

      (v) The authors claim that the steeper MinD gradient under glucose starvation results in division at shorter cell lengths. I do not find this claim convincing. It is necessary to measure the correlation between the cell length at division and the gradient. It would also be helpful to show this correlation for other parameters. I think such analyses would genuinely support the authors' claim and the novelty of this work.

      (vi) The conclusion at Line 346, "This plasticity arises from spatial differences in molecular interactions between MinD and MinE, as demonstrated...", is unclear to me. My understanding is that (i) by screening randomly sampled parameters in the reaction-diffusion model, the authors found parameters that "match" the experimental results, and (ii) the parameters remaining after screening are correlated with each other (k_dD-k_dE and k_D-k_ATP->ADP). The logic relies heavily on the reaction-diffusion model being quantitatively correct. First, I think it is better to explain this logic more explicitly, that is, to state that the claim about molecular interactions is not based on experimental facts. Second, I personally think the reaction-diffusion model used in this work does not quantitatively reproduce the experimental results, as discussed in (iii) and (iv). Please discuss how the comparison between the model and the experiments is justified.

      (vii) I did not understand why the authors can claim "... further distinguishing in vivo and in vitro observations." at Line 350. I did not find any results compared with in vitro studies. I would appreciate a demonstration of in vitro results and/or references.

      Minor comments:

      1. Line 214: It should be "Fange and Elf".
      2. I think it would be better to show the sampled points in Figs. 4C and 4D to show how densely the authors sampled the parameter space.

      REFERENCES:

      [1] Fischer-Friedrich, Elisabeth / Meacci, Giovanni / Lutkenhaus, Joe / Chaté, Hugues / Kruse, Karsten, "Intra- and intercellular fluctuations in Min-protein dynamics decrease with cell length", Proceedings of the National Academy of Sciences, 107, 6134-6139 (2010).

      [2] Meacci, Giovanni, "Physical Aspects of Min Oscillations in Escherichia Coli", PhD thesis (2006) available at https://www.pks.mpg.de/fileadmin/user_upload/MPIPKS/group_pages/BiologicalPhysics/dissertations/GiovanniMeacci2006.pdf

      Significance

      General assessment:

      I think the strength of this study is that it potentially shows a quantitative correlation between the MinD concentration gradient during oscillation and the cell length at division. However, the current glucose-starvation data are not convincing enough. The modelling parts are interesting, but their connection to the experiments is not clear in the current manuscript.

      Advance:

      The advance of this study is to measure the MinD concentration gradient under glucose starvation and to compare the experimental results with the (simplified) model over a wide range of parameters. I do not think the advance in the current manuscript is at a conceptual level, because the conceptual conclusions are not convincingly supported by the results. In this respect, the advance of this work may be technical.

      Audience:

      As a theoretician working on biophysics, including models of the Min system, I think a specialised audience would be interested in this study: people who study the mechanism of the Min oscillation and the resulting cell division, particularly those interested in both experiments and models. For a broad audience, I do not think the novelty of this study is well described.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence that support the development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species such as humans, remains unclear.

      Public Reviews:

      Reviewer #1 (Public Review):

      General comments:

      This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate its function's underlying mechanisms. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (kcat and Km) under a broad range of pH conditions, spanning from strongly acidic to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the two neighboring aspartic acids in the catalytic motif under distinct pH conditions.

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

      Reviewer #2 (Public Review):

      Summary:

      In this study of the mouse homolog of acidic mammalian chitinase, the overall goal is to provide a mechanistic explanation for the unusual observation of two pH optima for the enzyme. The study includes biochemical assays to establish kinetic parameters at different solution pH, structural studies of enzyme/substrate complexes, and theoretical analysis of amino acid side chain pKas and molecular dynamics.

      Strengths:

      The biochemical assays are rigorous and nicely complemented by the structural and computational analysis. The mechanistic proposal that results from the study is well rationalized by the observations in the study.

      Weaknesses:

      The overall significance of the work could be made clearer. Additional details could be provided about the limitations of prior biochemical studies of mAMCase that warranted the kinetic analysis. The mouse enzyme seems unique in terms of its behavior at high and low pH, so it remains unclear how the work will enhance broader understanding of this enzyme class. It was also not clear whether the findings can be used for therapeutic purposes, as detailed in the abstract, if the human enzyme works differently.

      We have edited the paper to address these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Regarding the pH profiles of mouse AMCase, previous studies have reported its activity at pH 2.0 and within the pH range of 3-7. In this paper, the authors conducted kinetic measurements and showed that pH 6.5 is optimal for kcat/Km. The authors emphasize the significance of mouse AMCase's activity in the neutral region, particularly at pH 6.5, for understanding its physiological relevance in humans. To provide a comprehensive overview, it would be valuable for the authors to summarize the findings from previous and current studies, discuss their implications for future pulmonary therapy in humans, and cite relevant literature. Additionally, the authors should highlight their research's specific contributions and novel findings, such as the determination of kinetic parameters (kcat and Km) under different pH conditions. Explaining why previous studies may have lacked these observations, and underscoring the importance of the present findings in addressing those knowledge gaps, will help readers understand the significance of the study and its impact on the field of enzymology.

      We thank the reviewer for this comment. To keep the focus on the knowledge gaps directly addressed by this paper, we have not expanded the discussion of future pulmonary therapy in humans. We have summarized the present findings at the end of the introduction as follows:

      “We measured the mAMCase hydrolysis of chitin, which revealed significant activity increase under more acidic conditions compared to neutral or basic conditions. To understand the relationship between catalytic residue protonation state and pH-dependent enzyme activity, we calculated the theoretical pKa of the active site residues and performed molecular dynamics (MD) simulations of mAMCase at various pHs. We also directly observed conformational and chemical features of mAMCase between pH 4.74 to 5.60 by solving X-ray crystal structures of mAMCase in complex with oligomeric GlcNAcn across this range.”

      (2) Regarding the implications of the pKa values and Asp138 orientation for the pH optima, it would be valuable for the authors to discuss the variations in optimal activity by pH among GH-18 chitinases and investigate the underlying factors contributing to these differences. In particular, exploring the role of Asp138 orientation in chitotriosidase, another mammalian chitinase, would provide important insights. Chitotriosidase is known to be inactive at pH 2.0, and it would be interesting to investigate whether the observed orientation of Asp138 towards Glu140 in mouse AMCase for pH 2.0 activity is lacking in chitotriosidase.

      There are similar rotations of the two acidic residues in the literature on Chit1. The variety of crystal pH conditions and the lack of a straightforward mechanism for pKa shifts in AMCase make it difficult to explain, by comparison, why Chit1 is inactive at low pH, but this is an interesting area for future study. See a fuller discussion in: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760363/

      Furthermore, considering the lower activity of human AMCase at pH 2.0, it would be worthwhile to examine whether the Asp138 orientation towards Glu140, as observed in mouse AMCase, is also absent in human AMCase. Exploring this aspect will help determine if the orientation of Asp138 plays a critical role in pH-dependent activity in human AMCase.

      The situation for hAMCase is similar to that for Chit1, as the rotations observed here for mAMCase are also present. The question is not whether Asp138 can rotate, but rather what the relevant energetic penalties are, as we discuss in the manuscript.

      (3) In a previous study by Okawa et al.(Loss and gain of human acidic mammalian chitinase activity by nonsynonymous SNPs. Mol Biol Evol 33, 3183-3193, 2016), it was reported that specific amino acid substitutions (N45D, D47N, and R61M) encoded by nonsynonymous single nucleotide polymorphisms (nsSNPs) in the N-terminal region of human AMCase had distinct effects on its chitinolytic activity. Introducing these three residues (N45D, D47N, and R61M) could activate human AMCase. This activation significantly shifted the optimal pH from 4-5 to 2.0.

      Considering the significant impact of these amino acid substitutions on the pH-dependent activity of human AMCase, the authors should discuss this point in the manuscript's discussion section. Incorporating the findings and relating them to the current study's observations on pH optima and Asp138 orientation can provide a comprehensive understanding of the factors influencing pH-dependent activity in AMCase.

      We added a citation and discuss how the mutations identified by this study could potentially shift the pKa of key catalytic residues:

      “Okawa et al identified how primate AMCase lost activity by integration of specific, potentially pKa-shifting, mutations relative to the mouse counterpart42b.”

      (4) To further strengthen the discussion, the authors could explore the ancestral insectivorous nature of placental mammals and the differences in chitinase activity between herbivorous and omnivorous species. Incorporating these aspects would add depth and relevance to the overall discussion of AMCase. AMCase is an enzyme known for its role in digesting insect chitin in the stomachs of various insectivorous and omnivorous animals, including bats, mice, chickens, pigs, pangolins, common marmosets, and crab-eating monkeys 1-7. However, in certain animals, such as dogs (carnivores) and cattle (herbivores), AMCase expression and activity are significantly low, leading to impaired chitin digestion 8. These observations suggest a connection between dietary habits and the expression and activity of the AMCase gene, ultimately influencing chitin digestibility across different animal species 8.

      (1) Strobel et al. (2013). Insectivorous bats digest chitin in the stomach using acidic mammalian chitinase. PLoS One 8, e72770.

      (2) Ohno et al. (2016). Acidic mammalian chitinase is a proteases-resistant glycosidase in mouse digestive system. Sci Rep 6, 37756.

      (3) Tabata et al. (2017). Gastric and intestinal proteases resistance of chicken acidic chitinase nominates chitin-containing organisms for alternative whole edible diets for poultry. Sci Rep 7, 6662.

      (4) Tabata et al. (2017). Protease resistance of porcine acidic mammalian chitinase under gastrointestinal conditions implies that chitin-containing organisms can be sustainable dietary resources. Sci Rep 7, 12963.

      (5) Ma et al. (2018). Acidic mammalian chitinase gene is highly expressed in the special oxyntic glands of Manis javanica. FEBS Open Bio 8, 1247-1255.

      (6) Tabata et al. (2019). High expression of acidic chitinase and chitin digestibility in the stomach of common marmoset (Callithrix jacchus), an insectivorous nonhuman primate. Sci Rep 9, 159.

      (7) Uehara et al. (2021). Robust chitinolytic activity of crab-eating monkey (Macaca fascicularis) acidic chitinase under a broad pH and temperature range. Sci Rep 11, 15470.

      (8) Tabata et al. (2018). Chitin digestibility is dependent on feeding behaviors, which determine acidic chitinase mRNA levels in mammalian and poultry stomachs. Sci Rep 8, 1461.

      This overall point is covered by our brief discussion on diet differences:

      “However, hAMCase is likely too destabilized at low pH to observe an increase in kcat. hAMCase may be under less pressure to maintain high activity at low pH due to humans’ noninsect-based diet, which contains less chitin compared to other mammals with primarily insect-based diets42.”

      (5) It is important for the authors to clearly state the limitations of their simulations and emphasize the need for experimental validation or additional supporting evidence. This will provide transparency and enable readers to understand the boundaries of the study's findings. A comprehensive discussion of limitations would contribute to a more robust interpretation of the results.

      We added a sentence to the discussion:

      “Our simulations have important limitations that could be overcome by quantum mechanical simulations that allow for changes in protonation state and improved consideration of polarizability.”

      Minor comments:

      (1) Regarding the naming of AMCase, it is important to accurately describe it based on its acidic isoelectric point rather than its enzymatic activity under acidic conditions based on the original paper (Reference #14 (Boot, R. G. et al. Identification of a novel acidic mammalian chitinase distinct from chitotriosidase. J. Biol. Chem. 276, 6770-6778 (2001)).

      We have made this modification.

      (2) In the introduction, providing more context regarding the terminology of acidic mammalian chitinase (AMCase) would be beneficial. While AMCase was initially discovered in mice and humans, subsequent research has revealed its presence in various vertebrates, including birds, fish, and other species. Therefore, it would be appropriate to include the alternative enzyme name, Chia (chitinase, acidic), in the introduction to reflect its broader distribution across different organisms. This clarification would enhance the readers' understanding of the enzyme's taxonomy and facilitate further exploration of its functional significance in diverse biological systems.

      We have made this modification.

      (3) The authors mention that AMCase is active in tissues with neutral pHs, such as the lung. However, it is important to consider that the pH in the lung is lower, around 5, due to the presence of dissolved CO2 that forms carbonic acid. The lung microenvironment is known to vary, and specific regions or conditions within the lung may have slightly different pH levels. By addressing the pH conditions in the lungs and their relationship to AMCase's activity, the authors can enhance our understanding of the enzyme's function within its physiological context. A thorough discussion of the specific pH conditions in the lung and their implications for AMCase's activity would provide valuable insights into the enzyme's role in lung pathophysiology.

      To keep the focus on the insights we have made, we have elected not to expand this discussion.

      (4) It would be helpful for the authors to provide more information about the substrate or products of AMCase. The basic X-ray crystal structures used in this study are GlcNAc2 or GlcNAc3, known products of AMCase. Including details about the specific ligands involved in the enzymatic reactions would enhance the understanding of the study's focus.

      We are unsure what is meant here, and since it is a minor comment, we have elected not to change the discussion of substrates here.

      (5) The authors should critically evaluate the inclusion of the term "chitin-binding" in the Abstract and Introduction. If substantial evidence or discussion regarding the specific chitin-binding properties of the enzyme or their relevance to the immune response is not included, removing or modifying that statement might be appropriate.

      We are unsure what is meant here, and since it is a minor comment, we have elected not to change the discussion of “chitin-binding” here.

      (6) The authors developed an endpoint assay to measure the activity of mouse AMCase across a broad pH range, allowing for direct measurement of kinetic parameters. The authors should provide a more detailed description of the methods used, including any specific modifications made to the previous assay, to ensure reproducibility and facilitate further research in the field. It is important to clearly show the novelty of their endpoint assay compared to previous methods employed in other reports. The authors should also explain how their modified endpoint assay differs from existing assays and highlight its advancements or improvements. This will help readers understand the unique features and contributions of the assay in the context of previous methods.

      We have already included a detailed description of the method and accompanying figures. See also our previous paper by Barad, which includes other related assays.

      (7) The authors suggest that mouse AMCase may be subject to product inhibition, potentially due to its transglycosylation activity, which can affect the Michaelis-Menten model predictions at high substrate concentrations. However, this reviewer had difficulty understanding the specific impact of transglycosylation on the kinetic parameters. It would be helpful for the authors to provide a more detailed explanation, clarifying how transglycosylation activity influences the kinetic behavior of AMCase and its implications for the observed results.

      The experiments to conclusively demonstrate this are beyond our current capabilities.
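      For orientation only, one textbook way to see how accumulating product can distort a Michaelis-Menten fit is the rate law for competitive product inhibition. This is an illustrative assumption, not a mechanism shown for mAMCase: [P] is the product concentration, [E]_0 the total enzyme concentration, and K_i a hypothetical product-inhibition constant.

```latex
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m\left(1 + \dfrac{[P]}{K_i}\right) + [S]},
\qquad
K_m^{\mathrm{app}} = K_m\left(1 + \frac{[P]}{K_i}\right)
```

      Under this assumption the product raises the apparent Km while leaving the limiting rate kcat[E]_0 unchanged, so fitting the plain Michaelis-Menten equation to endpoint data in which product has accumulated would tend to overestimate Km.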

      (8) In the Abstract, the authors state, "We also solved high resolution crystal structures of mAMCase in complex with chitin, where we identified extensive conformational ligand heterogeneity." This reviewer suggests replacing "chitin" with "oligomeric GlcNAcn" throughout the text, specifically about biochemical experiments. It is important to accurately describe the experimental conditions and ligands used in the study.

      We have made these changes throughout the manuscript.

      (9) In the introduction, the authors mention "a polymer of β(1-4)-linked N-acetyl-D-glucosamine (GlcNAc)". In this case, the letter "N" should be italicized to conform to the proper notation for the monosaccharide abbreviation.

      Corrected (and hopefully this would also have been done by the copy editor!).

      (10) In the introduction, the authors state, "In the absence of AMCase, chitin accumulates in the airways, leading to epithelial stress, chronic activation of type 2 immunity, and age-related pulmonary fibrosis5,6". It is recommended to clarify that "AMCase" refers to "acidic mammalian chitinase (AMCase)" in this context, as it is the first mention of the enzyme in the introduction.

      We moved that section so that it flows better and is introduced with the full name.

      (11) In the introduction, the authors state, "Mitigating the negative effects of high chitin levels is particularly important for mammalian lung and gastrointestinal health." This reviewer requests further clarification on the connection between chitin and gastrointestinal health. Please provide an explanation or reference to support this statement.

      We have modified this sentence to:

      “Chitin levels can be potentially important for mammalian lung and gastrointestinal health.”

      (12) In the introduction, the authors mention that "Acidic Mammalian Chitinase (AMCase) was originally discovered in the stomach and named for its high enzymatic activity under acidic conditions." It is recommended to include Reference #14 (Boot et al. J. Biol. Chem. 276, 6770-6778, 2001) as it provides the first report on mouse and human AMCase, contributing to the understanding of the enzyme.

      However, it is worth noting that while this paragraph primarily focuses on human tissues, Reference #14 primarily discusses mouse AMCase but also reports on human AMCase. Additionally, References #8 and #9 mainly discuss mouse AMCase. This creates confusion in the description of human and mouse AMCase within the paragraph.

      Considering that this paper aims to focus on the unique features of mouse AMCase, it is suggested that the authors provide a more specific and balanced description of both human and mouse AMCase throughout the main text.

      We have clarified the origin of the name AMCase, and throughout the text we now distinguish the two orthologs as hAMCase and mAMCase.

      (13) Figure 1A in the Introduction section has been previously presented in several papers. The authors should consider moving this figure to the Results section and present an alternative figure based on their experimental results to enhance the novelty and impact of the study.

      We have considered this option, but prefer the original placement.

      (14) In the Results section, the authors mentioned, "Prior studies have focused on relative mAMCase activity at different pH [18,20], limiting the ability to define its enzymological properties precisely and quantitatively across conditions of interest." It would be beneficial for the authors to include reference #14, the first report showing the pH profile of mouse AMCase, to support their statement.

      We have added this reference.

      (15) Regarding the statement, "To overcome the pH-dependent fluorescent properties of 4MU-chitobioside, we reverted the assay into an endpoint assay, which allowed us to measure substrate breakdown across different pH (Supplemental Figure 1A)", the authors should provide a more detailed description of the improvements made to measure AMCase activity. Additionally, it would be helpful to include a thorough explanation of the figure legend for Supplementary Figure 1A to provide clarity to readers.

      We have already included a detailed method description and figures. See also our previous paper by Barad, which includes other related assays.

      (16) Figure 1B shows that the authors used the AMCase catalytic domain. It would benefit the authors to explain the rationale behind this choice in the figure legend or the main text.

      This point is addressed in the text:

      “Previous structural studies on AMCase have focused on interactions between inhibitors like methylallosamidin and the catalytic domain of the protein.”

      (17) For Figures 1C-E, it is recommended that the authors include error bars in their results to represent the variability or uncertainty of the data. In Figure 1E, the authors should clarify the units of the Y-axis (e.g., sec⁻¹ µM⁻¹). Additionally, in Figure 1F, the authors should explain how the catalytic acidity is shown.

      We have added error bars and axis labels. Figure 1F is conceptual, so we are leaving it as is.

      (18) The authors stated, "These observations raise the possibility that mAMCase, unlike other AMCase homologs, may have evolved an unusual mechanism to accommodate multiple physiological conditions." It would be helpful for the authors to compare and discuss the pH-dependent AMCase activity of mouse AMCase with other AMCase homologs to support this statement.

      That is an excellent idea for future comparative studies, but beyond the scope of what we are examining in this paper.

      (19) The authors should explain Supplemental Figures 1B and C in the Results or Methods sections to provide context for these figures.

      We are unclear about what this means, and since it is a minor comment, we have elected not to change these sections.

      (20) Supplemental Figure 3 is missing any description. It would be important for the authors to include a mention of this figure in the main text before Supplemental Figure 4 to guide the readers.

      The full legend is now included, and the reference to Supplemental Figure 4 was mislabeled.

      (21) For Supplemental Figure 4, the authors should explain the shape of the symbol used in the figure. Additionally, they should explain "apo" and "holoenzyme" in the context of this figure.

      It is unclear what "shape" refers to in this context; perhaps the confusion arises because these are violin plots showing distributions.

      (22) Table 1 requires a more detailed explanation of its contents. Additionally, Tables 2 and 3 need to be included. The authors should include these missing tables in the revised version and explain their contents appropriately.

      Table 1 is the standard crystallographic table; there is little more detailed explanation that can be offered. Tables 2 and 3 were not transferred properly by BioRxiv but were included in the review packet as requested a day after submission.

      (23) In Figure 4, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers. Additionally, it is recommended to use D136, D138, and E140 instead of D1, D2, and E to label the respective parts. The authors should also explain the meaning of the symbol used in the figure.

      Since it is a minor comment, we have elected not to change these figures.

      (24) In Figure 5, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      (25) Similarly, in Figure 6, all panels should be enlarged to enhance the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      Reviewer #2 (Recommendations For The Authors):

      In general, I did not identify many detailed or technical concerns with the work. A few items for the authors to consider are listed below.

      (1) The interpretation of the crystallographic datasets seems complicated by the heterogeneity in the substrate component. It might be nice to see more critical analysis of the approach here. Are there other explanations or possible models that were considered? Do other structures of chitinases or other polysaccharide hydrolases exhibit the same phenomenon?

      In writing the manuscript, we have tried to take a very critical approach to this. It is quite likely that other structures contain similar heterogeneity in their density that has simply been left unmodeled.

      (2) It would be ideal to include more experimental validation of the proposed mechanism. Much of the manuscript includes theoretical validations (pKa estimation, dynamics, etc.), but it would be optimal to make an enzyme variant or do an experiment with a substrate analog.

      Yes, we agree that follow-on experiments are needed to fully test the mechanism, and those will be the subject of future work.

      (3) For an uninitiated reviewer, I think the major issue with this study is that the broader significance of the work and how it fits into the context of other work on these enzymes is not clear. It would be helpful to be more specific about what we know of mechanism from work on other enzymes to help the reader understand the motivation for this study.

      We have added a few additional references, guided by Reviewer 1's comments, that should help in this respect.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2024-02393

      Corresponding author(s): Katja Petzold

      1. General Statements [optional]

      We thank the reviewers for recognising the impact of our manuscript. The reviewers noted the novelty of the miRNA bulge structure, the importance of the three observed binding modes and their potential for use in future structure-based drug design, and the possible importance of the duplex release phenomenon. We are also thankful for the relevant and constructive feedback provided.

      Our responses to the comments are written point by point in blue, and any changes in the manuscript are shown in red.

      2. Description of the planned revisions

      In response to Reviewer 1 - major comment 2

      Some of the data is over-interpreted. For example, in Figure 3A, it is concluded that supplementary regions are more important for weaker seeds. Only two 8-mer seeds are present among the twelve target sites and thus it might be difficult to generalize.

      We found the relationship between seed type and the effect of supplementary pairing in our data intriguing. To further investigate this effect, we tested whether it exists in published microarray data from HCT116 cells transfected with six different miRNAs (Linsley et al., 2007; Agarwal et al., 2015). Here we found that for the two miRNAs (miR-103 and miR-106b) where we see an impact of supplementary pairing, the difference is primarily driven by 7mer-m8 seeds.

      Since the effect appears to be specific to the miRNA, we would like to test whether it can be observed for miR-34a in a larger dataset. Therefore, we plan to transfect HEK293T cells with miR-34a and analyse the mRNA response via RNAseq. We will repeat the analysis shown above, using the predicted number of supplementary pairs to categorise the dataset into groups with or without the effect of supplementary pairing. We will then compare the three seed types within these groups.
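      The three seed-match types compared in this analysis can be assigned with a simple string test. The sketch below assumes the standard definitions (8mer: Watson-Crick match to miRNA positions 2-8 plus an A opposite position 1; 7mer-m8: match to positions 2-8; 7mer-A1: match to positions 2-7 plus an A opposite position 1) and checks canonical pairing only, ignoring G:U wobbles and supplementary pairing. The function names and the example guide sequence are illustrative assumptions, not the pipeline used for the analysis.

```python
# Watson-Crick partners for RNA bases
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence (5'->3' in and out)."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def seed_type(mirna: str, target: str) -> str:
    """Return the best canonical seed match of `mirna` (5'->3') in `target` mRNA (5'->3').

    Sites weaker than 7mer-A1 (e.g. 6mers) are reported as "none".
    """
    site_2_8 = reverse_complement(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    site_2_7 = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    target = target.upper()
    # On the mRNA, the nucleotide opposite miRNA position 1 lies 3' of the seed match
    if site_2_8 + "A" in target:
        return "8mer"
    if site_2_8 in target:
        return "7mer-m8"
    if site_2_7 + "A" in target:
        return "7mer-A1"
    return "none"
```

      For example, with the hypothetical guide sequence `UGAGGUAGUAGGUUGUAUAGUU`, `seed_type(guide, "AAACUACCUCAAA")` returns `"8mer"`, while `"AAAGUACCUCAAA"` is classified as `"7mer-A1"`.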

      In response to Reviewer 2 - minor comment 1, "why was the 34-nt 3'Cy3-labeled miR34a complementary probe shifted up in the presence of AGO?".

      We plan to investigate the upper band, which we hypothesise is a result of duplex release, using EMSA to ascertain whether the band height agrees with the size of the duplex.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      Evidence, reproducibility and clarity

      Sweetapple et al. Biophysics of microRNA-34a targeting and its influence on down-regulation

      In this study, the authors have investigated binding of miR-34a to a panel of natural target sequences using EMSA, luciferase reporter systems and structural probing. The authors compared binding within a binary and a ternary complex that included Ago2 and find that Ago2 affects affinity, strengthening weak binders and weakening strong binders. The affinity is, however, generally determined by binary RNA-RNA interactions also in the ternary complex. Luciferase reporter assays containing 12 different target sites that belong to one of three seed-match types were tested. Generally, affinity is a strong contributor to repression efficiency. Duplex release, a phenomenon observed for specific miRNA-target complementarities, seems to be more pronounced when high affinity within the binary complex is observed. Furthermore, the authors use RABS for structural probing either in a construct in CIS or binding by the individual miRNA in TRANS or in a complex with Ago2. They find pronounced asymmetric target binding, and Ago2 does not generally change the binding pattern. The authors observe one unexpected structural group, namely mRNA binding with bulged miRNAs, which had been expected to be sterically problematic based on the known structures. MD simulations, however, revealed that such structures could indeed form.

      This is an interesting manuscript that contributes to our mechanistic understanding of the miRNA-target pairing rules. The combination of affinity measurements, structural probing and luciferase reporters allow for a broad correlation of target binding and repression strength, which is a well-thought and highly conclusive approach. However, there are a number of shortcomings that are summarized below.

      The manuscript is not easy to read and to follow for several reasons. First, many of the sub-Figures are not referenced in the text of the results section (1C, 1D, 2C, 4D), which is somewhat annoying. Figure 4A seems to be mis-labeled. Second, a lot of data is presented in suppl. Figures. It should be considered to move more data into the main text in order to make it easier for readers to evaluate and follow.

      Thank you for bringing this to our attention. We have now revised the figure references accordingly.

      We have relocated gel images of BCL2, WNT1, MTA2 and the control samples from Figure S3 and S4 to the main results (Figure 2A-B) to improve readability and provide controls and details that aid in clear understanding. Additionally, we have relocated panel C from Figure S6 to Figure 2C to enhance the clarity of our rationale for using polyuridine (pU) in our AGO2 binding assays.

      The updated figure is shown below, with changes to the legend marked in red.

      Figure 2. Binary and ternary complex binding affinities measured by EMSA. (A) Binary (mRNA:miR-34a) binding assays showing examples of BCL2, WNT1 and MTA2. (B) Ternary (mRNA:miR-34a-AGO2) binding assays showing examples of BCL2, WNT1, MTA2, and the three control targets PERFECT, SCRseed, and SCRall. The Cy5-labelled species is indicated with an asterisk (*). F indicates the free labelled species (miR-34a or mRNA), B indicates the binary complex, and T indicates the ternary complex. Adjacent titration points differ two-fold in concentration, with maximum concentrations stated at the top right. Adjacent titration points for MTA2 differed three-fold to assess a wider concentration range. In the ternary assay, miRNA duplex release from AGO2 was observed for, among others, BCL2, WNT1, PERFECT, and SCRseed (band indicated with B), while it was not observed for SCRall and MTA2. See Figures S3 and S4 for representative gel images for all targets. See Supplementary files 2 and 3 for all images and replicates. (C) Titrations with increasing miR-34a-AGO2 concentration against Cy5-labelled SCRall (left) or PNUTS (right), comparing the absence and presence of 20 μM polyuridine (pU) during equilibration. pU acted as a blocking agent, reducing nonspecific binding, as seen by the different KD,app values for SCRall and PNUTS after addition of 20 μM pU. Therefore, all final mRNA:miR-34a-AGO2 EMSAs were carried out in the presence of 20 μM pU. Labels are as stated above. (D) Individual binding profiles for each of the 12 mRNA targets assessed by electrophoretic mobility shift assay (EMSA). Each datapoint represents an individual experiment (n=3). Blue represents results for the binary complex, and green represents results for the ternary complex. Dotted horizontal lines represent the KD,app values, which are also stated in blue and green with standard deviations (units = nM).
      Note that the x-axis spans from 0.1 to 100,000 in CCND1, MTA2 and NOTCH2, whereas the remaining targets span 0.1 to 10,000.
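      The titration design in this legend (two-fold steps, three-fold for MTA2) and the meaning of KD,app can be illustrated with a short sketch. It assumes a simple single-site binding isotherm, fraction bound = [R]/(KD,app + [R]), which holds only when the labelled probe concentration is well below KD,app so that free titrant is approximately equal to total titrant. This model, the function names and the example concentrations are illustrative assumptions, not necessarily the fitting procedure used in the manuscript.

```python
def titration_series(max_conc_nm: float, n_points: int, fold: float = 2.0) -> list[float]:
    """Titrant concentrations (nM), starting at the highest point, each step `fold` lower."""
    return [max_conc_nm / fold**i for i in range(n_points)]


def fraction_bound(conc_nm: float, kd_app_nm: float) -> float:
    """Single-site isotherm; assumes free titrant ~ total titrant (probe << KD,app)."""
    return conc_nm / (kd_app_nm + conc_nm)
```

      For instance, `titration_series(10_000, 12)` runs from 10 µM down to about 4.9 nM, whereas a three-fold series (`fold=3`) covers a wider range with the same number of lanes; at a titrant concentration equal to KD,app, half of the labelled probe is bound.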

      Some of the data is over-interpreted. For example, in Figure 3A, it is concluded that supplementary regions are more important for weaker seeds. Only two 8-mer seeds are present among the twelve target sites and thus it might be difficult to generalize.

      We have revised our wording to recognise that more 8-mer sites would be required to draw a stronger conclusion based on this hypothesis. This hypothesis would be interesting to confirm in a larger dataset but is unfortunately outside of the scope of this paper.

      Our hypothesis also aligns with recent data from Kosek et al. (NAR 2023; Figure 2D), where SIRT1 with an 8mer and with a 7mer-A1 seed was compared. Only the 7mer-A1 variant was sensitive to mutations in the central region or to switching all mismatches to WC pairs.

      Page 21 now states:

      "This result indicates that the impact of supplementary binding may be greater for targets with weaker seeds, as has been observed earlier in a mutation study of miR-34a binding to SIRT1 (Kosek et al., 2023), although a larger sample size would be needed to confirm this observation."

      Furthermore, we found the relationship between seed type and the effect of supplementary pairing in our data intriguing. To further investigate this effect, we tested whether it exists in published microarray data from HCT116 cells transfected with six different miRNAs (Linsley et al., 2007; Agarwal et al., 2015). Here we found that for the two miRNAs (miR-103 and miR-106b) where we see an impact of supplementary pairing, the difference is primarily driven by 7mer-m8 seeds. We therefore plan to test whether the effect can be observed for miR-34a in a larger dataset. We have outlined our preliminary data and planned experiments in Section 2 - description of the planned revisions.

      I did not understand why the CIS system shown in 4A is a good test case for miR-34a-target binding. It appears very unnatural and artificial. This needs to be rationalized better. Otherwise it remains questionable, whether these data are meaningful at all.

      Thank you for pointing out the need for clearer rationalisation.

      The TRANS construct, where the scaffold carries the mRNA targeting sequence, provides reactivity information for the mRNA side only, while the microRNA is bound within RISC, with the backbone protected by AGO2. Therefore, to gain information on the miR-34a side of each complex we used the CIS construct, which provides reactivity information from both the miRNA and mRNA. We used the miRNA and mRNA reactivities to calculate all possible secondary structures for the binary complex, and then compared these structures to the mRNA reactivity in TRANS to find which structure fitted the reactivity patterns observed in the ternary complex.

      We have included an additional statement in the manuscript to clarify this point on pages 12-13:

      "Two RNA scaffolds were used for each mRNA target; i) a CIS-scaffold: RNA scaffold containing both mRNA target and miRNA sequence separated by a 10 nucleotide non-interacting closing loop, and ii) a TRANS-scaffold: RNA scaffold containing only the mRNA target sequence, to which free miR-34a or the miR-34a-AGO2 complex was bound (Figure 4A). The CIS constructs therefore provided reactivity information on the miRNA side, which is lacking in the TRANS construct, and was used to complement the TRANS data."

      It may be worth noting that a non-interacting 10 nucleotide loop was inserted between the miRNA and mRNA of the CIS constructs, allowing the miRNA and mRNA strands to bind and release freely. The reactivity patterns of each mRNA:miRNA duplex were compared between CIS and TRANS and showed similar base pairing (Figure 4D). Furthermore, we have previously compared the two scaffolds in our RABS methodology paper (Banijamali et al. 2022), where no differences were observed besides reduced end fraying in the CIS construct.

      For the TRANS experiments, only one specific scaffold structure is used. This structure might impact binding as well and thus at least one additional and independent scaffold should be selected for a generalized statement.

      For each construct, the potential for interaction with the scaffold was tested using the RNAstructure (Reuter & Mathews, 2010) package. Based on the results of this assessment, two different scaffolds were used for our TRANS experiments. The testing and use of scaffolds has now been clarified further on page 13:

      "The overall conformation of each scaffold with the inserted RNA was assessed using the RNAstructure (Reuter & Mathews, 2010) package to ensure that the sequence of interest did not interact with the scaffold. If any interaction was observed between the RNA of interest and the scaffold, then the scaffold was modified until no predicted interaction occurred. The different scaffolds and their sequence details are shown in supplementary information (Table S1)."

      We have previously examined the scaffold's effect on binding and structure during the development of the RABS method. We tested the same mRNA (SIRT1) in separate, independent scaffolds to verify the consistency of the results. An example of this can be found in the supplementary information (Figure S1a) of Banijamali et al. (2022).

      Generally, it would be nice to have some more information about the experiments also in the result section. Recombinant Ago2 is expressed in insect cells and re-loaded with miR-34a, luciferase reporters are transfected into tissue culture cells, I guess.

      We have now stated the cell types used for AGO2 expression and luciferase reporter assays in the results.

      On page 17 we have included:

      "Samples of each of the 12 mRNA targets, as well as miR-34a and AGO2, were synthesised in-house for biophysical and biological characterisation. Target mRNA constructs were produced via solid-phase synthesis while miR-34a was transcribed in vitro and cleaved from a tandem transcript (Feyrer et al., 2020), ensuring a 5' monophosphate group. AGO2 was produced in Sf9 insect cells."

      "To measure the affinity of each mRNA target binding to miR-34a, both within the binary complex (mRNA:miR-34a) and the ternary complex (mRNA:miR-34a-AGO2), we optimised an RNA:RNA binding EMSA protocol to suit small RNA interactions. The protocol is loosely based on Bak et al. (2014), with the major differences being the use of a sodium phosphate buffering system so as not to disturb weaker interactions (James et al., 1996; Stellwagen et al., 2000), supplementation with Mg2+ as a counterion to reduce electrostatic repulsion between the two negatively charged RNAs (Misra & Draper, 1998), and fluorescently labelled probes."

      Page 19:

      "We successfully tested various RNA backgrounds, including polyuridine (pU) and total RNA extract (Figure S6B), to block any unspecific binding. Ultimately, we supplemented our binding buffer with pU at a fixed concentration of 20 µM for the ternary assays to achieve the greatest consistency."

      Page 20:

      "Repression efficacy for the 12 mRNA targets by miR-34a was assessed through a dual luciferase reporter assay (ref. 6). Target mRNAs were cloned into reporter constructs and transfected into HEK293T cells."

      Page 22:

      "To infer base pairing patterns and secondary structure for each of the 12 mRNA:miR-34a pairs, we used the RABS technique (Banijamali et al., 2023) with 1M7 as a chemical probe. All individual reactivity traces are shown in Figure S9. Reactivity of each of the 22 miR-34a nucleotides was assessed upon binding to each of the 12 mRNA targets within a CIS construct, containing both miR-34a and the mRNA target site separated by a non-interacting 10-nucleotide loop. The two RNAs can therefore bind and release freely within the CIS construct and reactivity information is collected from both RNA strands."

      In the first sentence of the abstract, Argonaute 2 should be replaced by Argonaute only since other members bind to miRNAs as well.

      Thank you for recognising this. It has now been corrected.

      Significance

      This is an interesting manuscript that contributes to our mechanistic understanding of the miRNA-target pairing rules. The combination of affinity measurements, structural probing and luciferase reporters allow for a broad correlation of target binding and repression strength, which is a well-thought and highly conclusive approach. However, there are a number of shortcomings.

      We thank the reviewer for recognising the approach and impact of our work. In addition, we thank the reviewer for identifying the need for further data to support our conclusions from the luciferase assays, which is something that we plan to address, as described in section 2.



      Reviewer #2

      Evidence, reproducibility and clarity

      Summary: Sweetapple et al. took the approaches of EMSA, SHAPE, and MD simulations to investigate target recognition by miR-34a in the presence and absence of AGO2. Surprisingly, their EMSA showed that guide unloading occurred even with seed-unpaired targets. Although previous studies reported guide unloading, they used perfectly complementary guide and target sets. The authors of this study concluded that the base-pairing pattern of miR-34a with target RNAs, even without AGO2, can be applicable to understanding target recognition by miR-34a-bound AGO2.

      Major comments:

      (Page 11 and Figure S4) The authors pre-loaded miR-34a into AGO2 and subsequently equilibrated the RISC with a 5' modified Cy5 target mRNA. Since properly loaded miR-34a is never released from AGO2, it is impossible for the miR-34a loaded into AGO2 to form the binary complex (mRNA:miR-34a) in the EMSA (guide unloading has been a long-standing controversy). However, they observed bands of the binary complex in Figure S4. The authors did not use ion-exchange chromatography. AGOs are known to bind RNAs nonspecifically on their positively charged surface. Is it possible that most miR-34a was actually bound to the surface of AGO2 instead of being loaded into the central cleft? This could explain why they observed the bands of the binary complex in EMSA.

      Thank you for mentioning this crucial point which has been a focus of our controls. We have addressed this point in four ways:

      1. Salt wash during reverse IMAC purification.
      2. Separation of unbound RNA and proteins via SEC.
      3. Blocking non-specific interactions using polyuridine.
      4. Observing both the presence and absence of duplex release among different targets using the same AGO2 preparation and conditions.

      Firstly, although we did not use a dedicated ion exchange column for purification, we believe the ionic strength used in our IMAC wash step was sufficient to remove non-specific interactions. We used a linear gradient of buffer A (50 mM Tris-HCl, 300 mM NaCl, 10 mM imidazole, 1 mM TCEP, 5% glycerol v/v) and buffer B (50 mM Tris-HCl, 500 mM NaCl, 300 mM imidazole, 1 mM TCEP, 5% glycerol) at pH 8. The protocol followed the recommendation by Bio-Rad for their Profinity IMAC resins, which states that 300 mM NaCl should be included in buffers to deter nonspecific protein binding due to ionic interactions. The protein itself has a higher affinity for the resin than nucleic acids do.

      A commonly used protocol for RISC purification follows the method by Flores-Jasso et al. (RNA 2013). Here, the authors use ion exchange chromatography to remove competitor oligonucleotides. After loading, they washed the column with lysis buffer (30 mM HEPES-KOH at pH 7.4, 100 mM potassium acetate, 2 mM magnesium acetate and 2 mM DTT). AGO was eluted with lysis buffer containing 500 mM potassium acetate. Competing oligonucleotides were eluted in the wash.

      As ionic strength depends only on the concentrations and charges of the ions present, not on their chemical identity (Jeremy M. Berg, John L. Tymoczko, Gregory J. Gatto Jr., Biochemistry 2015), we reasoned that our Tris-HCl/NaCl/imidazole wash buffer should have an ionic strength comparable to that used in the Flores-Jasso protocol.

      Our total ionic contributions were: 500 mM Na+, 550 mM Cl-, 50 mM Tris and 300 mM imidazole. We recognise that Tris and imidazole are both partially ionized according to the pH of the buffer (pH 8) and their respective pKa values, but even when considering only the sodium and chloride, the ionic strength should be comparable to that of the Flores-Jasso protocol.
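      The comparison above can be checked with the standard definition of ionic strength, I = ½ Σ cᵢzᵢ², where cᵢ is the molar concentration and zᵢ the charge of each ion. The sketch below is a rough estimate that, as in the argument above, counts only the fully dissociated salts: 500 mM NaCl for IMAC buffer B, and 500 mM potassium acetate plus 2 mM magnesium acetate for the Flores-Jasso elution buffer. Tris, imidazole, HEPES and DTT are deliberately ignored, so both values are approximate lower bounds; the variable names are illustrative.

```python
def ionic_strength(ions: list[tuple[float, int]]) -> float:
    """Ionic strength I = 1/2 * sum(c_i * z_i**2); concentrations in mol/L."""
    return 0.5 * sum(conc * charge**2 for conc, charge in ions)


# IMAC buffer B, counting only 500 mM NaCl (Na+ and Cl-)
imac_buffer_b = [(0.500, +1), (0.500, -1)]

# Flores-Jasso elution buffer: 500 mM potassium acetate (K+, acetate-) and
# 2 mM magnesium acetate (Mg2+ plus two acetate ions); HEPES-KOH and DTT ignored
flores_jasso_elution = [(0.500, +1), (0.500, -1), (0.002, +2), (0.004, -1)]

print(f"IMAC buffer B:  I ~ {ionic_strength(imac_buffer_b):.3f} M")
print(f"Elution buffer: I ~ {ionic_strength(flores_jasso_elution):.3f} M")
```

      Under these assumptions the two buffers come out at roughly 0.500 M and 0.506 M, i.e. essentially the same ionic strength.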

      We have restated the buffer compositions above and have rewritten the methods section to describe this more explicitly:

      "Following dialysis, any precipitate was removed by centrifugation, and the resulting supernatant was loaded onto an IMAC buffer A-equilibrated HisTrap-Ni2+ column to remove TEV protease, other proteins, and non-specifically bound RNA. A linear gradient was employed using IMAC buffers A and B."

      Secondly, after reverse HisTrap purification, AGO2 was run through size exclusion chromatography to remove any remaining impurities (shown in Figure S2B).

      Thirdly, knowing that AGO2 has many positively charged surface patches and can bind nucleic acids nonspecifically (Nakanishi, 2022; O'Geen et al., 2018), we tested various blocking backgrounds to eliminate nonspecific binding effects in our EMSA ternary binding assays. We were able to address this issue by adding either non-homogeneous RNA extract or homogeneous polyuridine (pU) to our EMSA buffer during equilibration in background experiments. This allowed us to eliminate non-specific binding of our target mRNAs, as shown previously in Supplementary Figure S6. We appreciate that the reviewer finds this technical detail important and have moved panel C of Figure S6 into the main results as Figure 2C, to highlight the novel conditions used and the important controls that need to be performed. If miR-34a were non-specifically bound to the surface of AGO2 after washing, this blocking step would render any impact of surface-bound miR-34a negligible due to the excess of competing polyuridine (pU).

      Our EMSA results show that, using pU, we can reduce non-specific interactions between AGO2 and the RNAs present. Nevertheless, duplex release still occurs despite the blocking step. It is therefore unlikely that duplex release is caused by surface-bound miR-34a.

      Finally, the observation of distinct duplex release for certain targets, but not for others (e.g. MTA2, which bound tightly to miR-34a-AGO2 but did not exhibit duplex release; see Figure 2), argues against the possibility that the phenomenon was solely due to non-specifically bound RNA releasing from AGO2.

      In response to the reviewer's statement "Since properly loaded miR-34a is never released from AGO2, it is impossible for the miR-34a loaded into AGO2 to form the binary complex (mRNA:miR-34a)", we would like to refer to three papers, De et al. (2013), Jo MH et al. (2015), and Park JH et al. (2017), which have previously reported duplex release and collectively provide considerable evidence that miRNA can be unloaded from AGO to promote turnover and recycling of AGO. Because AGO recycling must occur, there must be mechanisms that enable release of miRNA from AGO2. It is possible that AGO recycling proceeds via target-directed microRNA degradation (TDMD) in the cell, but in the absence of the enzymes responsible for oligouridylation and degradation, the miRNA duplex may be released. As TDMD-competent mRNA targets have been observed to release the miRNA 3' tail from AGO2 (Sheu-Gruttadauria et al., 2019; Willkomm et al., 2022), there may be a mechanistic similarity between the two processes; however, we do not have sufficient data to make any statement on this.

      (Page 18 and Figure S5) Previous studies (De et al., Jo MH et al., Park JH et al.) reported guide unloading when they incubated a RISC with a fully complementary target. However, neither MTA2, CCND1, CD44, nor NOTCH2 can be perfectly paired with miR-34a (Figure 1A). Therefore, the unloading reported in this study is quite different from the previously reported works and thus cannot be explained by the previously reported logic. The authors need to explain the guide unloading mechanism that they observed. Otherwise, they might misinterpret the results of their EMSA and RABS of the ternary complex.

      The three studies mentioned above did report unloading/duplex release. However, they did not report it only for fully complementary targets.

      De et al. (2013) reported that "highly complementary target RNAs promote release of guide RNAs from human Argonaute2".

      Subsequently, Park et al. (2017) reported: "Strikingly, we showed that miRNA destabilization is dramatically enhanced by an interaction with seedless, non-canonical targets."

      A figure extracted from Figure 5 of Park et al. is shown below illustrating the occurrence of unloading in the presence of seed mismatches in positions 2 and 3 (mm 2-3). Jo et al. (2015) also reported that binding lifetime was not affected by the number of base pairs in the RNA duplex.

      In addition to these three reports, a methodology paper focusing on miRNA duplex release was published recently titled "Detection of MicroRNAs Released from Argonautes" (Min et al., 2020).

      We therefore believe that the previously observed microRNA release is similar to our observation. Here, we additionally correlate it with the structure and stability of the complex.

      (Page 20) The authors reported, "it is notable that the seed region binding does not appear to be necessary for duplex release." The crystal structures of AGO2 visualize that the seed of the guide RNA is recognized, whereas the rest is not, except for the 3' end captured by the PAZ domain. How do the authors explain the discrepancy?

      In this manuscript, we intend to present our observations of duplex release. There are many potential relationships between duplex release and AGO2 activity, but we do not have the data to speculate on them. Previous studies, such as Park et al. (2017), have also observed non-canonical and seedless targets leading to duplex release, which supports our findings. Additionally, McGeary et al. (2019) report 3'-only miRNA targets, and Lal et al. (2009) have documented seedless binding by miRNA and its downstream biological effects. Duan et al. (2022) show that a large number of let-7a targets are regulated through 3′ non-seed pairing.

      It is also possible that duplex release is not coupled to classical repression outcomes and does not need to proceed via the seed. Instead, it may regulate AGO2 recycling before AGO2 enters the quality-control mode of recognising the formed seed.

      (Page 22) The authors mentioned, "It follows that the structure imparted via direct RNA:RNA interaction remains intact within AGO2, highlighting the role of RNA as the structural determinant." A free guide and a target can start their annealing from any nucleotide position. In contrast, a guide loaded into AGO needs to start annealing with targets through the seed region. Additionally, the Zamore group reported that the loaded guide RNA behaves quite differently from its free state (Wee et al., Cell 2012). How do the authors explain the discrepancy?

      The key point we would like to emphasise is that AGO does not seem to alter the underlying RNA:RNA interactions. The bound state in the ternary complex reflects the structure established in the binary complex. We do not aim to claim a specific sequence of events, as our equilibrium data do not allow this interpretation. Our data indicate that the protein is flexible enough to accommodate the RNA structure that is favoured in the binary complex. This hypothesis is further supported by our MD simulation, which demonstrates the accommodation of a miRNA-bulge structure within AGO2.

      Targets lacking seeds have been identified previously (McGeary et al. 2019, Park et al. 2017, Lal et al. 2009) and can bind to miRNA within AGO. Therefore, there must be a mechanism by which these targets can anneal within AGO, such as via sequence-independent interactions (as discussed in question 3).

      Wee et al. (2012) studied fly and mouse AGO2 and found considerable differences between the thermodynamic and kinetic properties of the two AGO2 species. They also found different average affinities for the two species, with the fly protein binding more tightly than the mouse protein. Following this logic, it is not unexpected that human AGO2 would have properties distinct from those of fly and mouse AGO2.

      Below is an extract from Wee et al. (2012):

      "Our KM data and published Argonaute structures (Wang et al., 2009) suggest that 16-17 base pairs form between the guide and the target RNAs, yet the binding affinity of fly Ago2-RISC (KD = 3.7 ± 0.9 pM, mean ± S.D.) and mouse AGO2-RISC (KD = 20 ± 10 pM, mean ± S.D.) for a fully complementary target was comparable to that of a 10 bp RNA:RNA helix. Thus, Argonaute functions to weaken the binding of the 21 nt siRNA to its fully complementary target: without the protein, the siRNA, base paired from positions g2 to g17, is predicted to have a KD ∼3.0 × 10^−11 pM (ΔG25°C = −30.7 kcal mol^−1). Argonaute raises the KD of the 16 bp RNA:RNA hybrid by a factor of > 10^11."

      In Wee et al. (2012), affinity data for mouse and fly AGO2 were collected via filter-binding assays. The target carried a phosphorothioate linkage flanked by 2′-O-methyl ribose at positions 10 and 11 to prevent cleavage. The authors then compared the experimentally determined mean KD and ΔG values for each species with predicted values for an RNA:RNA helix of 16-17 base pairs. No comparison was made between individual targets, and no experimental data were collected for RNA:RNA binding. The energy values were calculated for a simple helix and did not take into account any possible secondary structure features. Several factors set that study apart from ours: the different AGO species, the alternative experimental setup, the modified nucleotides in the tested RNA, and the comparison of computationally predicted RNA values with averaged experimental values. We therefore believe that differences from our findings are to be expected.
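      The helix KD quoted from Wee et al. (2012) follows from the standard relation ΔG = RT ln KD. The minimal sketch below assumes a 1 M standard state and T = 298.15 K. The helper name kd_from_dg is ours and is used only for illustration. It converts the quoted −30.7 kcal/mol to a KD and compares that value with the measured fly and mouse KDs from the same extract.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_25C = 298.15        # 25 degrees C in kelvin


def kd_from_dg(dg_kcal_per_mol, temp_k=T_25C):
    """KD in molar from a binding free energy, via dG = RT ln(KD) (1 M standard state)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


# Predicted dG for the 16 bp g2-g17 helix, as quoted from Wee et al. (2012).
kd_helix_pm = kd_from_dg(-30.7) * 1e12  # molar -> picomolar

# Measured mean KD values quoted from the same study, in pM.
measured_pm = {"fly Ago2-RISC": 3.7, "mouse AGO2-RISC": 20.0}

print(f"predicted helix KD: {kd_helix_pm:.1e} pM")
for name, kd_pm in measured_pm.items():
    # Fold increase in KD relative to the free helix, attributed to the protein.
    print(f"{name}: KD raised {kd_pm / kd_helix_pm:.1e}-fold")
```

      With these constants, the sketch gives about 3.1 × 10^−11 pM, which agrees with the quoted ∼3.0 × 10^−11 pM. The small gap presumably reflects rounding of RT. Both species also show fold changes above 10^11.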

      We have expanded our discussion on page 27 to the following:

      "An earlier examination of mRNA:miRNA binding thermodynamics by Wee and colleagues (2012) found that mouse and fly AGO2 reduce the affinity of a guide RNA for its target61. Our data indicate that the range of miR-34a binary complex affinities is instead constricted by human AGO2 in the ternary complex - strengthening weak binders while weakening strong binders. The 2012 study reported different average affinities between the two AGO2 species, with the fly protein binding more tightly than the mouse protein. Following this logic, it is not unexpected that human AGO2 would have unique properties compared to those of fly and mouse."

      The authors concluded that the range of binary complex affinities is constricted by human AGO2 in the ternary complex - strengthening weak binders while weakening strong binders. This may hold true for miR-34a, but it cannot be generalized. Other miRNAs need to be tested.

      We agree, and we have now adjusted the wording to reflect this more clearly, as shown below. Testing further miRNAs is likely to be the subject of future work from us and others.

      "Our data indicate that the range of miR-34a binary complex affinities is instead constricted by human AGO2 in the ternary complex - strengthening weak binders while weakening strong binders."

      Minor comments:

      (Figure S2) Why was the 34-nt 3'Cy3-labeled miR34a complementary probe shifted up in the presence of AGO?

      We believe this observation also indicates duplex release. When these activity assays were performed, we were not yet aware of duplex release. We therefore did not test the shift further and assumed that it might be due to transient interactions. We plan to investigate this via EMSA and have included this in the planned revisions (section 2).

      2. (Page 17) Does the Cy3 affect the interaction of the 3' end of miR-34 with AGO2?

      miR-34a-3'Cy5 was used for binary experiments only. As a control, we ran the reverse experiment with Cy5 on the mRNA (Figure S3B). Switching the label to the target caused no change in affinity or interaction. For ternary experiments, the mRNA target was labelled at the 5' terminus so that the label could not interfere with loading of miR-34a into AGO2.

      A Cy3 labelled RNA probe (fully complementary to miR-34a) was used to detect miR-34a in northern blots, but AGO2 interaction is not relevant here under denaturing conditions.

      Finally, the 34-nt slicing probe carried Cy3 on its 5-nt 3' overhang and should therefore not interact with AGO2.

      1. Several groups reported that overproduced AGOs loaded endogenous small RNAs. The authors should mention that their purified AGO2 was not as pure as a RISC with miR-34a. Otherwise, readers might think that the authors used a specific RISC.

      We have now improved our explanation of the loading efficiency to make it clearer to the reader that our AGO2 sample was not fully bound by miR-34a. All concentrations refer to the miR-34a-loaded portion of AGO2. The following text can be found in the results on page 18:

      "The mRNA:miR-34a-AGO2 assay had a limited titration range, reaching a maximum miR-34a-AGO2 concentration of 268 nM due to a 5% loading efficiency (see Figure S2D for loading efficiency quantification). The total AGO2 concentration was thus 20-fold higher than the miR-34a-loaded portion. Further increase in protein concentration was prevented by precipitation. Weaker mRNA targets (CD44, CCND1, and NOTCH2) did not reach a saturated binding plateau within this range, leading to larger errors in their estimated KD,app values. However, reasonable estimation of the KD,app was possible by monitoring the disappearance of the free mRNA probe. Note that we refer to the miR-34a-loaded portion of AGO2 when discussing concentration values for all titration ranges. To ensure AGO2 binding specificity despite low loading efficiency, a scrambled control was used (SCRall; lacking stable base pairing with miR-34a or other human miRNAs according to the miRBase database57). SCRall showed no interaction with miR-34a-AGO2 (Figure 2B)."

      (Figure legend of Figure S5) Binding was assessed "by."

      Thank you for pointing this out, it is now fixed.

      (Page 17) It would be great if the authors could even briefly describe the mechanism by which the sodium phosphate buffer with magnesium does not disturb weaker interactions by citing reference papers.

      We have now added a supplementary methods section to our manuscript and included the description below on page 10:

      "We found that a more traditional Tris-borate-EDTA (TBE) buffer disrupted weaker RNA:RNA binding interactions (Supplementary Methods Figure M1). Borate anions form stable adducts with carbohydrate hydroxyl groups (James et al., 1996) and can form complexes with nucleic acids, likely through amino groups in nucleic bases or oxygen in phosphate groups (Stellwagen et al., 2000). This makes TBE unsuitable for assessment of RNA binding, particularly involving small RNA molecules, which typically have weaker affinities. We therefore adapted our buffer system to a sodium phosphate buffer supplemented with magnesium. Magnesium acts as a counterion to reduce electrostatic repulsion between the two negatively charged backbones by neutralisation (Misra & Draper, 1998)."

      We have also clarified the buffer adaptations in our results section on page 17:

      The protocol is loosely based on Bak et al. (2014)36. The major differences are the use of a sodium phosphate buffering system so as not to disturb weaker interactions (James et al., 1996; Stellwagen et al., 2000), supplementation with Mg2+ as a counterion to reduce electrostatic repulsion between the two negatively charged RNAs (Misra & Draper, 1998), and the use of fluorescently labelled probes. Original gel images and quantification are shown in supplementary Figures S3 and S4. All KD,app values are shown in Supplementary Table 1 and represent the mean of three independent replicates.

      Figure M1. Comparison of Tris-borate EDTA (TBE) and sodium phosphate with magnesium (NaP-Mg2+) buffer systems for EMSA. Cy5-labelled miR-34a and unlabelled CD44 were equilibrated in the two different buffer systems, using the same titration range. No mobility shifts were observed in the TBE system, while clear binding shifts were observed in the NaP-Mg2+ system.

      6. (Page 22) The authors cited Figure 4C in the sentence, "Comparison between CIS and TRANS ..." Is this supposed to be Figure 4D?

      The reviewer was correct in their assumption, and this has now been corrected.

      7. (Figure 6) Readers would appreciate it if the guide and target were colored in red and blue. The color codes have been used in most papers reporting AGO structures. The current color codes are opposite.

      We have now adjusted the colour schemes throughout the manuscript, and Figure 6 has been modified to the following:

      "Figure 6. The miRNA-bulge structure is readily accommodated by AGO2 as shown by molecular dynamics simulation. Panel (A) displays a snapshot of the all-atom MD simulation of miR-34a (red) and NOTCH1 (blue) in AGO2. The NOTCH1:miR-34a duplex is shown with AGO2 removed for clarity and is rotated 90° to show the miRNA bulge and bend in the duplex. This NOTCH1:miR-34a-AGO2 structure is compared with (B), which shows the crystal structure of miR-122 (orange) paired with its target (purple) via the seed and four nucleotides in the supplementary region (PDB ID 6N4O, ref. 17), and (C), which shows the crystal structure of miR-122 (orange) and its target (green) with extended 3' pairing, necessary for the TDMD-competent state (PDB ID 6NIT, ref. 19). AGO2 is depicted in grey, with the PAZ domain in green and the N-terminal domain marked with N. The miRNA duplexes in (B) and (C) feature symmetrical 4-nucleotide internal loops, whereas the NOTCH1 structure in (A) has an asymmetrical miRNA bulge with five unpaired nucleotides on the miRNA side and a 3-nucleotide asymmetry."

      Significance

      This paper will have a significant impact on the field if seed-unpaired targets can indeed unload guide RNAs. The authors may want to validate their results very carefully.

      We thank the reviewer for recognising the significance of duplex release (or guide unloading) from AGO2. We agree that the observations should be tested rigorously and have outlined the actions we took to ensure validity in our AGO2 preparation.

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      In this manuscript, the authors use a combination of biochemical, biophysical, and computational approaches to investigate the structure-function relationship of miRNA binding sites. Interestingly, they find that AGO2 weakens tight RNA:RNA binding interactions, and strengthens weaker interactions.

      Given this antagonistic role, I wonder: shouldn't there be an 'average' final binding affinity? Furthermore, if I understand correctly, not many trends were observed to correlate binding affinity with repression, etc.

      Overall, we observed no 'average' final binding affinity. The binary affinities had a much higher maximum (the NOTCH2 binary affinity was in the micromolar range), which skewed the mean of the binary affinities to 657 nM, versus 111 nM for the ternary affinities. We also compared the variances of the binary and ternary affinity datasets using the F-test and found that F > F(critical one-tail). The variances of the two populations are therefore unequal, with the binary variance significantly larger than the ternary variance.

      F-Test Two-Sample for Variances

                                 Binary affinity (nM)    Ternary affinity (nM)
      Mean                       657.3                   110.971667
      Variance (nM²)             2971596.1               24406.4012
      Observations               12                      12
      df                         11                      11
      F                          121.754784
      P(F<=f) one-tail           7.559E-10
      F critical one-tail        2.81793047
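      The F statistic in the table is the ratio of the two sample variances, with n - 1 = 11 degrees of freedom each. The one-tailed p-value is the upper tail of the F distribution, and the critical value is the point where that tail equals 0.05. The row labels suggest spreadsheet output. The dependency-free Python sketch below reproduces the numbers from the two variances alone. It is not necessarily the tool used for the manuscript. The helper names are ours, and the incomplete beta function is evaluated with the standard continued-fraction method. Where SciPy is available, scipy.stats.f.sf and scipy.stats.f.ppf would give the same values.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300

    def guard(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 / guard(1.0 + aa * d)
            c = guard(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged on the odd step
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F >= f) for an F(df1, df2) random variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2, lo=0.0, hi=1e6):
    """One-tailed critical value (P(F >= f) = alpha), found by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid  # tail still too large: critical value lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sample variances (nM^2) and group sizes from the table above.
var_binary, var_ternary = 2971596.1, 24406.4012
n_binary = n_ternary = 12

f_stat = var_binary / var_ternary  # larger variance in the numerator
df1, df2 = n_binary - 1, n_ternary - 1
p_one_tail = f_upper_tail(f_stat, df1, df2)
f_crit = f_critical(0.05, df1, df2)
print(f"F = {f_stat:.4f}, one-tailed p = {p_one_tail:.3e}, F critical = {f_crit:.4f}")
```

      Putting the larger variance in the numerator makes the one-tailed test ask whether the binary variance exceeds the ternary variance. This is the comparison made in the response.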

      We agree that the overall correlation between affinity and repression was not strong, although we found a stronger correlation within the miRNA-bulge group (Figure 5C and S7C). A larger sample size of miRNA bulge-forming duplexes would be needed to test the generalizability of this observation.

      Given the context of the study - whereby structure is being investigated as a contributing factor to the interaction between the miRNA and mRNA, I find it interesting that the authors chose to use MC-fold to predict the structures of the mRNA, rather than using an experimental approach to assess / validate the structures. Thirty-seven RNAs were assessed; I think even for a subset (the 12 that were focused on in the study), the secondary structure should be validated experimentally (e.g., by chemical probing experiments, which the research group has demonstrated expertise in over the last several years). The validation should follow the in silico folding approach used to narrow down the region of interest. It is necessary to know whether an energy barrier (associated with the mRNA unfolding) has to occur prior to miRNA binding; this could help explain some of the unexplained results in the study. Indeed, the authors mention that there are many variables that influence miRNA regulation.

      Indeed, experimentally validated structures offer valuable insights that cannot be obtained solely through sequence-based predictions. This is why we opted to employ our RABS method to experimentally evaluate the binary and ternary complex binding of our 12 selected targets (as depicted in Figures 4 and S9 and discussed in the text on pages 23-24). We assessed in silico all 37 RNA targets that were experimentally confirmed at the time and selected 12 to represent both biological and predicted structural diversity. Experimentally pre-assessing all the targets not included in the final selection would have been impractical. Our in silico assessment was designed to narrow down the regions of interest and to evaluate the predicted secondary structures. The pipeline is shown in Figure 1. Details of the code used in the in silico analysis are provided in Supplementary File 1.

      Regarding the energy of mRNA unfolding, our constructs contained only the isolated binding sites, so the effects of interactions with the surrounding mRNA were removed. We compared our affinities with ΔG and with the MFE, and we have now included this analysis in Figure S8A. Additionally, we have included the following text on pages 27-28 of the discussion:

      "Gibbs free energy (ΔG) is often included in targeting prediction models as a measure of the stability of the miRNA:mRNA pair12,62. ΔG values predicted by RNAcofold correlated with the log of our binary KD,app values (R² = 0.61). The correlation was weaker with the free energy values derived from the minimum free energy (MFE) structures predicted by RNAcofold (R² = 0.41) (Figure S8A). This result highlights unfolding (included in ΔG) as an important factor in predicting KD. The differences between ΔG and KD,app are likely primarily due to inaccurately predicted structures used for energy calculations."

      Additionally, we assessed the free form of all mRNA targets via RABS (Figure S9) and observed that the seed of each free mRNA was available for miRNA binding (seeds of the free mRNA were not stably bound).
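      Because ΔG = RT ln KD, log10 KD,app is expected to vary linearly with ΔG, with a slope of 1/(RT ln 10) ≈ 0.73 per kcal/mol at 25 °C. The R² values quoted for the ΔG and MFE correlations measure how closely the data follow a straight line. The minimal sketch below shows such a least-squares fit and the ideal slope. The ΔG values in it are hypothetical and are not data from this study. The helper name linear_fit is illustrative, and we assume an ordinary least-squares fit like the one presumably used for Figure S8A.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_25C = 298.15        # K


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Ideal thermodynamic expectation: log10(KD / 1 M) = dG / (RT ln 10).
expected_slope = 1.0 / (R_KCAL * T_25C * math.log(10))

# Hypothetical dG values (kcal/mol) that follow the ideal relation exactly.
dgs = [-8.0, -10.0, -12.0, -14.0]
log_kds = [dg * expected_slope for dg in dgs]
slope, intercept, r2 = linear_fit(dgs, log_kds)
print(f"slope = {slope:.3f} per kcal/mol, R^2 = {r2:.3f}")
```

      With measured KD,app values, points scatter around the line and R² drops below 1. A fitted slope far from about 0.73 would also point to contributions that the predicted ΔG does not capture.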

      Finally, when designing our luciferase plasmids we used RNAstructure (Reuter & Mathews, 2010) to check for self-folding effects which could interfere with target site binding and ensured that all plasmids were void of such effects.

      In the methods, T7 is italicized by accident in the T7 in vitro transcription section. Bacmid is sometimes written with a capital B and other times with a lower-cased b. The authors should be consistent. The concentration of TEV protease that was added (as opposed to the volume) should be described for reproducibility.

      Thank you for pointing out these overlooked points. They have now been corrected.

      In figure S2D, what is the second species in the gel on the right-hand side of the gel in the miR-34a:AGO lanes? The authors should mention this.

      We believe that the faint upper band corresponds to other longer RNA species loaded into AGO2. As AGO2 is loaded with a diversity of RNA species, it is likely that some of them may have a weak affinity for the miR-34a-complementary probe, and therefore show up on the northern blot.

      Figure S3B and S3A are referenced out of order in the text. In regard to S3A, what are the anticipated or hypothesized alternative conformations for NOTCH1, DLL1, and MTA2? There are really interesting things going on in the gels, also for HNF4a and NOTCH2. Can the authors offer some explanation for why the free RNA bands don't seem to disappear, but rather migrate slowly? Is this a new species?

      The order of the figure references has now been updated; thank you for alerting us to this.

      Figure S3A: For MTA2, the two alternative conformations are shown in Figures S9 and S10 (and shown below here, with the miR-34a seed marked in pink). A single conformation appears to be favoured at high concentration (> 1 µM), while both conformations are present at ≤ 1 µM. The RABS data for MTA2 also indicated multiple binding conformations, as the reactivity traces were inconsistent. Based on the reactivity in the TRANS + AGO assays, we expect that the conformation shown on the left was dominant within AGO2. However, we cannot exclude possible G-quadruplex formation due to the high G content of MTA2 (shown below right).

      Regarding NOTCH1 and DLL1, a faint fluorescent shadow was observed beneath the miR-34a bound band. The RABS reactivity traces indicated a single dominant conformation for these targets. The lower shadow may therefore reflect more subtle conformational differences, such as the opening and closing of one or a few base pairs at the terminus or bulge (i.e. end fraying). HNF4α and NOTCH2 never appear to fully saturate miR-34a, so a small unbound population remains visible on the gel. For NOTCH2, this free miR-34a band appears to migrate upwards. This may be because the gel lane was overloaded with excess NOTCH2, which is not visible in the Cy5 fluorescence image.

      In the EMSA for Perfect, why does the band intensity for the bound complex increase then decrease? How many replicates were run for this? This needs to be reconciled.

      As for all EMSAs, three replicates were carried out for each mRNA target and all gels are shown in Supplementary Files 2 and 3, for the binary and ternary assays respectively.

      Uneven heat distribution across the gel can lead to bleaching of the Cy5 fluorophore. To address this, we used a circulating cooler in our electrophoresis tank, as outlined in our methods (page 10). However, the gel for one of the PERFECT replicates appears to have been unevenly cooled. The binding ratio, rather than total band volume, was used for quantification. The binding curve was therefore unaffected, and KD,app was not influenced.

      We have now replaced the exemplary gel for PERFECT in Figure S3 with a more representative and evenly labelled gel from our replicates (Cy5 fluorescence image shown below). The binding curve for PERFECT is also shown here:

      The authors list that the RNA concentration was held constant at 10 nM; in EMSAs, the RNA concentration should be less than the binding affinity; what is the lowest concentration of protein used in the assays shown in S3A? Is this a serial dilution? It seems to me like the binding assays for MTA2, Perfect, and SCRseed might have too high of an RNA concentration. (Actually, now I see in the supplement the concentrations of proteins, and the RNA concentration is too high). Also, why is the intensity of bands for bound complex for SCRseed more intense than the free RNA?

      Why are the binding affinity error bars so large (e.g., for NOTCH2 with miR-34a: 6 µM ± 3 µM)?

      No protein was used in the binary assays shown in Figure S3A. For the ternary assays in Figure S4, the maximum concentration of miR-34a-loaded AGO2 (miR-34a-AGO2) was 268 nM, with a serial dilution down to a minimum of 0.06 nM.
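      The miR-34a-AGO2 concentrations above relate to total protein through the 5% loading efficiency stated earlier in this response. The sketch below shows that arithmetic and adds a generic serial-dilution helper. Only the 268 nM and 5% values come from the text. The text does not state the dilution factor for the ternary titration. The twofold, 13-point series here is an assumption, chosen because it ends near the stated 0.06 nM minimum. The helper names are illustrative.

```python
def total_ago2_nm(loaded_nm, loading_efficiency):
    """Total AGO2 required to supply a given concentration of miR-34a-loaded AGO2."""
    return loaded_nm / loading_efficiency


def serial_dilution(top_nm, factor, n_points):
    """Concentrations of an n-point serial dilution, highest first."""
    return [top_nm / factor ** i for i in range(n_points)]


# Values from the text: 268 nM miR-34a-AGO2 at 5% loading efficiency.
total_nm = total_ago2_nm(268.0, 0.05)
print(f"total AGO2: {total_nm:.0f} nM ({total_nm / 268.0:.0f}-fold the loaded portion)")

# Hypothetical twofold, 13-point series (the dilution factor is an assumption).
series = serial_dilution(268.0, 2.0, 13)
print(f"range: {series[0]:.0f} nM to {series[-1]:.3f} nM")
```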

      Optimal EMSA conditions require a constant RNA concentration that is lower than the binding affinity to accurately estimate high-affinity interactions.

      For our tightest binders, such as SIRT1, we can confidently state that the KD,app is below 10 nM (estimated at 0.4 ± 1.1 nM). Because this lies below the 10 nM RNA concentration, the estimate is less accurate, and its standard deviation is larger than the estimated KD,app. NOTCH2 bound miR-34a very weakly and did not reach a fully bound plateau, so a high error was expected. Consequently, we do not have the same level of certainty for extremely tight or weak binders. In this study, the relative affinities were of primary importance.
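      Why a fixed 10 nM probe limits confidence for tight binders can be seen from the exact (quadratic) solution for 1:1 binding. At half saturation, the bound and free probe concentrations are equal. It follows that 50% of the probe is bound when the total titrant concentration equals KD + [probe]/2. For a sub-nanomolar binder, the midpoint of the binding curve is therefore set mostly by the 10 nM probe rather than by KD. The minimal sketch below assumes simple 1:1 binding at equilibrium. The function names are illustrative, and this is not the fitting model used in the manuscript.

```python
import math


def fraction_bound(titrant_nm, probe_nm, kd_nm):
    """Fraction of a fixed labelled probe bound by a titrant (1:1 binding).

    Exact quadratic solution; it does not assume titrant >> probe, so it
    remains valid when KD is comparable to or below the probe concentration.
    """
    s = titrant_nm + probe_nm + kd_nm
    complex_nm = (s - math.sqrt(s * s - 4.0 * titrant_nm * probe_nm)) / 2.0
    return complex_nm / probe_nm


def midpoint_titrant_nm(probe_nm, kd_nm):
    """Total titrant giving 50% probe bound.

    At half saturation, bound = free probe = probe/2, so
    KD = (titrant - probe/2) * (probe/2) / (probe/2) = titrant - probe/2.
    """
    return kd_nm + probe_nm / 2.0


PROBE_NM = 10.0  # fixed Cy5-miR-34a concentration used in the binary EMSAs

for kd in (0.4, 1.0, 10.0, 100.0):
    mid = midpoint_titrant_nm(PROBE_NM, kd)
    # Share of the curve midpoint that is actually determined by KD.
    print(f"KD {kd:6.1f} nM: 50% bound at {mid:6.1f} nM titrant, "
          f"KD share {kd / mid:.0%}, check f = {fraction_bound(mid, PROBE_NM, kd):.3f}")
```

      For KD = 0.4 nM, KD accounts for under a tenth of the midpoint concentration. For KD = 100 nM it accounts for nearly all of it, so weaker binders in the nanomolar range are much better resolved at this probe concentration.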

      We have included on page 18:

      As the Cy5-miR-34a concentration was fixed to 10 nM to give sufficient signal during detection, KD,app values below 10 nM have a lower confidence.

      Regarding the control samples PERFECT and SCRseed, our focus was not on determining the exact KD,app of these artificial constructs. Instead, we were primarily interested in whether they exhibited binding and under which conditions. For SCRseed, we neither adjusted the titration range nor calculated KD,app. For PERFECT, the concentration was adjusted to a lower range of 30 nM - 0.001 nM to give a relative comparison with the other tight binder SIRT1. However, further reduction in RNA concentration was not pursued, as it already fell well below the 10 nM sensitivity threshold.

      Regarding the intensity of the bound SCRseed band, we observed that the bound fluorophore often gave a stronger signal than the free probe. This was observed for a number of samples (PERFECT, BCL2, SCRseed). A previous publication reported three relevant properties of Cy5 in DNA (Kretschy, Sack and Somoza, 2016). Its fluorescence is sequence-dependent, the effect is more pronounced in double-stranded DNA, and the fluorophore is sensitive to the surrounding 5 base pairs. It is likely that the same phenomenon exists in RNA.

      For MTA2, the two alternative conformations (shown in Figures S9 and S10) make assessment of KD,app more difficult. The higher-affinity conformation did not reach a fully bound plateau before the lower-affinity conformation appeared. As a result, the binding-curve plateau (where all miR-34a was bound) reflected the KD,app of the weaker conformation. We extended the tested titration range by using a three-fold serial dilution. Further reduction in concentration would not have been fruitful, as the lowest concentrations were already well below the 10 nM sensitivity threshold. The MTA2 binary KD,app (944 ± 274 nM) therefore has a higher error and lower confidence.

      We then decided to run a competition assay to detect the weaker KD,app of MTA2. The assay was set up using the known binding affinity of CD44, which was labelled with Cy5 to track the reaction. MTA2 was titrated against a constant concentration of Cy5-CD44:miR-34a, and disruption of the CD44 and miR-34a binding was monitored. We fitted the data to a quadratic for competitive binding (Cheng and Prusoff, 1973) to calculate the KD,app for competitive binding, or KC,app.
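      The competition principle can be illustrated with the simpler Cheng-Prusoff relation, KC = IC50 / (1 + [L]/KD,L). Here [L] is the labelled competitor (Cy5-CD44) and KD,L is its direct-assay affinity. This relation assumes 1:1 competitive binding and assumes that the shared partner (miR-34a) is not depleted. The quadratic model used for the actual fit is not reproduced here. In the minimal sketch below, only the CD44 KD,app of 165 nM comes from the text. The IC50 and labelled CD44 concentration are hypothetical, and the function name is illustrative.

```python
def cheng_prusoff_ki(ic50_nm, labelled_nm, kd_labelled_nm):
    """Competitor dissociation constant from its IC50 (Cheng-Prusoff form).

    Assumes 1:1 competitive binding at equilibrium and no depletion of the
    shared binding partner (here miR-34a).
    """
    return ic50_nm / (1.0 + labelled_nm / kd_labelled_nm)


KD_CD44_NM = 165.0  # direct-assay KD,app of CD44, from the text

# Hypothetical inputs for illustration only (not measured values).
ic50_nm = 2000.0
labelled_cd44_nm = 10.0
kc_nm = cheng_prusoff_ki(ic50_nm, labelled_cd44_nm, KD_CD44_NM)
print(f"KC,app ~ {kc_nm:.0f} nM")
```

      When the labelled species is well below its own KD, the correction term is small and KC approaches the IC50. The full quadratic treatment becomes necessary when the concentrations are comparable to the dissociation constants.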

      We validated our competition assay by comparing it with our direct binding assays, specifically by assessing CD44 in a self-competition assay. The CD44 KC,app (168 ± 24 nM; mean and SD of three replicates) was consistent with the KD,app obtained from the direct assay (165 ± 21 nM).

      We wanted all affinity data to be directly comparable, that is, obtained with the same methodology. We therefore report the KD,app values from the direct assay in the manuscript. The competitive EMSA for MTA2 appears to reflect the weaker-affinity conformation observed in the direct assay.

      It would be very helpful if the authors wrote in the Kds in Figure 2A in green and blue (in the extra space in the plots). This would help the reader to better understand what's going on, and for me, as a reviewer, to better consider the analysis/conclusions presented by the authors.

      KD,app values are written in green and blue in what is now Figure 2D (originally Figure 2A).

      The authors state on page 18 that 'Interestingly, however, we did not observe a correlation between binary or ternary complex affinity and seed type.' They should elaborate on why this is interesting.

      The prevailing view is that the miRNA seed type significantly influences affinity within AGO2. The largest biochemical studies of miRNA-target interactions to date, conducted by McGeary et al. (2019, 2022), used AGO-RBNS (RNA Bind-n-Seq) to reveal relative binding affinities. These studies demonstrated strong correlations between the canonical seed types and binding affinity. Therefore, we find it interesting that no such correlation was observed in our dataset (despite its small size).

      We have now added to the manuscript (page 20):

      "The largest biochemical studies of miRNA-target interactions to date (McGeary et al., 2019, 2022) used AGO-RBNS (RNA Bind-n-Seq) to extract relative binding affinities, demonstrating strong correlations between the canonical seed types and binding affinity. Therefore, it is intriguing that our dataset, despite its small size, showed no such correlation."

      Figure 2C is not referenced in the text (the authors should go back through the text to make sure everything is referenced and in order). The Kds should be listed alongside the gels in Figure 2C.

      Figure 2 has now been rearranged and updated, with KD,app values listed in what is now Figure 2D.

      Figure 3B is rather confusing to understand.

      We have now adapted Figure 3 to simplify readability. Panel B has now been moved to C, and we have introduced panel A (moved from Figure 2B). In Figure 3C (originally 3B) we have added arrows to indicate the direction of affinity change from binary to ternary complex, and moved the duplex release information to panel A. We thank the reviewer and think that the data is now much clearer.

      Figure 3. AGO2 moderates affinity by strengthening weak binders and weakening strong binders. (A) Correlation of relative mRNA:miR-34a with mRNA:miR-34a-AGO2 binding affinities. No correlation with seed type is observed; seed types are coloured (8mer, pink; 7mer-m8, turquoise; 7mer-A1, mauve). The slope of the linear fit is 0.48, and the intercept on the (log y)-axis is 7.11. The occurrence of miRNA duplex release from AGO2 is marked with diamonds. (B) miR-34a-mediated repression of dual luciferase reporters fused to the 12 mRNA targeting sites. HEK293T cells were co-transfected with each reporter construct and miR-34a. Luciferase activity was measured 24 hours after transfection and normalised to the miR-34a-negative transfection control. Each datapoint represents the R/F ratio for an independent experiment (n=3), with standard deviations indicated. SCRseed is a scrambled seed control, SCRall is a fully scrambled control, and PERFECT is the perfect complement of miR-34a. Dotted horizontal lines represent the repression values for the 22-nucleotide seed-only controls6 for the respective seed types, in the absence of any other WC base pairing. (C) Comparison of relative target repression with relative affinity assessed by EMSA. Blue represents mRNA:miR-34a affinity (binary complex), while green represents mRNA:miR-34a-AGO2 affinity (ternary complex). Arrows indicate the direction of change in affinity upon binding within AGO2 compared with the binary complex. AGO2 moderates affinity bidirectionally, strengthening weak binders and weakening strong binders.

      Page 20: Perfect should be italicized.

      Thank you for bringing this to our attention; this has now been adjusted.

      Have the authors considered using NMR to assess the base-pairing pattern formed in the miRNA:mRNA complexes (with/without AGO), as a validation of the results obtained by RABS? This could be helpful for the Asymmetric target binding section, the Ago increases flexibility section, and the three distinct structural groups section in the results. It is widely accepted that while chemical probing is insightful, results should be validated using alternative approaches. Distinguishing structural changes from protected reactivity in the presence of protein is challenging.

      NMR provides high-resolution information on RNA base-pairing patterns, allowing us to compare our RABS results for SIRT1 with those obtained via NMR (Banijamali et al., 2022) for the binary complex. For SIRT1, the RNA:RNA structures identified were consistent between both methods. However, using NMR to measure RNA:RNA binding within AGO2 is challenging due to the protein's large size. Currently, there are no published complete NMR structures of RNA within AGO2; the largest published solution-state NMR structures that include AGO consist solely of the PAZ domain. Our group has been developing methods using DNP-enhanced solid-state NMR to obtain structural information within the complete AGO2 protein, but the current resolution does not allow us to reconstruct a complete NMR structure. We hope that in the coming years this will become a method for evaluating RNA within AGO. This limitation highlights the advantage of RABS in providing RNA base-pairing information within the ternary complex in solution.

      Reviewer #3 (Significance (Required)):

      The work is helpful for understanding how microRNAs recognize and bind their mRNA targets, and the impact Ago has on this interaction. I think it will be helpful for structure-based design in therapeutic studies, especially given the three types of structures identified as part of the interaction.

      We thank the reviewer for their detailed remarks, especially concerning the importance of technical details of the binding assays. We further thank the reviewer for recognising the potential impact of our work for rational design.

      4. Description of analyses that authors prefer not to carry out

      In response to Reviewer 2, major comment 1, we prefer not to run an additional ion exchange purification on the AGO2 protein, for the reasons discussed above, which are repeated here:

      Thank you for mentioning this crucial point, which has been a focus of our controls. We have addressed this point in four ways:

      (1) Salt wash during reverse IMAC purification.
      (2) Separation of unbound RNA and proteins via SEC.
      (3) Blocking non-specific interactions using polyuridine.
      (4) Observing both the presence and absence of duplex release among different targets using the same AGO2 preparation and conditions.

      Firstly, although we did not use a specific ion exchange column for purification, we believe the ionic strength used in our IMAC wash step was sufficient to remove non-specific interactions. We used a linear gradient from buffer A (50 mM Tris-HCl, 300 mM NaCl, 10 mM imidazole, 1 mM TCEP, 5% glycerol v/v) to buffer B (50 mM Tris-HCl, 500 mM NaCl, 300 mM imidazole, 1 mM TCEP, 5% glycerol), both at pH 8. The protocol followed the recommendation by Bio-Rad for their Profinity IMAC resins, which states that 300 mM NaCl should be included in buffers to deter nonspecific protein binding due to ionic interactions. The protein itself has a higher affinity for the resin than nucleic acids do.

      A commonly used protocol for RISC purification follows the method by Flores-Jasso et al. (RNA 2013). Here, the authors use ion exchange chromatography to remove competitor oligonucleotides. After loading, they washed the column with lysis buffer (30 mM HEPES-KOH at pH 7.4, 100 mM potassium acetate, 2 mM magnesium acetate and 2 mM DTT). AGO was eluted with lysis buffer containing 500 mM potassium acetate. Competing oligonucleotides were eluted in the wash.

      As ionic strength depends only on the concentrations and charges of the ions present, not on their chemical identity (Jeremy M. Berg, John L. Tymoczko, Gregory J. Gatto Jr., Biochemistry 2015), we reasoned that our Tris-HCl/NaCl/imidazole wash buffer should have an ionic strength comparable to that of the Flores-Jasso protocol.

      Our total ionic contributions were: 500 mM Na+, 550 mM Cl-, 50 mM Tris and 300 mM imidazole. We recognise that Tris and imidazole are both partially ionized according to the pH of the buffer (pH 8) and their respective pKa values, but even considering only sodium and chloride, the ionic strength should be comparable to that of the Flores-Jasso protocol.
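      The comparison can be checked with the standard definition I = ½ Σ cᵢzᵢ². The sketch below is ours, not part of the original analysis, and rests on stated simplifications: only fully dissociated salts are counted (NaCl, potassium acetate, magnesium acetate), the partially ionized buffer species (Tris, imidazole, HEPES) and the reducing agents are ignored, and the Flores-Jasso elution buffer is assumed to be lysis buffer with its 100 mM potassium acetate raised to 500 mM.

```python
# Ionic strength I = 1/2 * sum(c_i * z_i**2), with c_i in mol/L.
# Simplifying assumption: only fully dissociated salts are counted; partially
# ionized buffer species (Tris, imidazole, HEPES) and DTT/TCEP are ignored.


def ionic_strength(ions):
    """Return the ionic strength (M) for a list of (concentration_M, charge) pairs."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


BUFFERS = {
    # IMAC buffer A: 300 mM NaCl -> 300 mM Na+ and 300 mM Cl-
    "IMAC buffer A": [(0.300, +1), (0.300, -1)],
    # IMAC buffer B: 500 mM NaCl
    "IMAC buffer B": [(0.500, +1), (0.500, -1)],
    # Flores-Jasso lysis/wash: 100 mM KOAc + 2 mM Mg(OAc)2 (two acetates per Mg2+)
    "Flores-Jasso wash": [(0.100, +1), (0.100, -1), (0.002, +2), (0.004, -1)],
    # Flores-Jasso elution (assumed): lysis buffer with KOAc raised to 500 mM
    "Flores-Jasso elution": [(0.500, +1), (0.500, -1), (0.002, +2), (0.004, -1)],
}

if __name__ == "__main__":
    for name, ions in BUFFERS.items():
        print(f"{name:22s} I = {ionic_strength(ions):.3f} M")
```

      Under these assumptions, IMAC buffer B (0.500 M) and the Flores-Jasso elution buffer (0.506 M) have nearly the same ionic strength, and IMAC buffer A (0.300 M) is nearly three times the Flores-Jasso wash (0.106 M). Counting the partially protonated Tris and imidazole could only raise the IMAC values, since every ion adds a positive term.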

      Secondly, after reverse HisTrap purification, AGO2 was run through size exclusion chromatography to remove any remaining impurities (shown in Figure S2B).

      Thirdly, knowing that AGO2 has many positively charged surface patches and can bind nucleic acids nonspecifically (Nakanishi, 2022; O'Geen et al., 2018), we tested various blocking backgrounds to eliminate nonspecific binding effects in our EMSA ternary binding assays. We were able to address this issue by adding either non-homogeneous RNA extract or homogeneous polyuridine (pU) to our EMSA buffer during equilibration in background experiments. This allowed us to eliminate non-specific binding of our target mRNAs, as shown previously in Supplementary Figure S6. We appreciate that the reviewer finds this technical detail important and have moved panel C of Figure S6 into the main results as Figure 2C, to highlight the novel conditions used and the important controls that need to be performed. If miR-34a were non-specifically bound to the surface of AGO2 after washing, this blocking step would render any impact of surface-bound miR-34a negligible due to the excess of competing polyuridine (pU).

      Our EMSA results show that, using polyU, we can reduce non-specific interactions between AGO2 and the RNAs present, yet duplex release still occurs despite the blocking step. It is therefore less likely that duplex release is caused by surface-bound miR-34a.

      Finally, the observation of distinct duplex release for certain targets, but not for others (e.g. MTA2, which bound tightly to miR-34a-AGO2 but did not exhibit duplex release; see Figure 2), argues against the possibility that the phenomenon was solely due to non-specifically bound RNA releasing from AGO2.

      In response to the reviewer's statement "Since properly loaded miR-34a is never released from AGO2, it is impossible for the miR-34a loaded into AGO2 to form the binary complex (mRNA:miR-34a)", we would like to refer to three papers, De et al. (2013), Jo MH et al. (2015) and Park JH et al. (2017), which have previously reported duplex release and collectively provide considerable evidence that miRNA can be unloaded from AGO to promote turnover and recycling of AGO. Because AGO recycling must occur, there must be some mechanism that enables release of miRNA from AGO2. It is possible that AGO recycling proceeds via target-directed miRNA degradation (TDMD) in the cell, but in the absence of the enzymes responsible for oligouridylation and degradation, the miRNA duplex may be released. As TDMD-competent mRNA targets have been observed to release the miRNA 3' tail from AGO2 (Sheu-Gruttadauria et al., 2019; Willkomm et al., 2022), there may be a mechanistic similarity between the two processes; however, we do not have sufficient data to make any statement on this.

    1. Reviewer #3 (Public Review):

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a plethora of structural work, and the authors are commended for the breadth of the studies. The structural studies are well-executed. Although the findings are mostly confirmatory, they do add to the body of work on this and related channels. Notably, the authors present structures of DTx-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (which may contain sodium ions bound within the selectivity filter). These two structures add considerable new information. The DTx structure has been markedly improved in the revised version and the authors arrive at well-founded conclusions regarding its mechanism of block. Regarding the Na+ structure, the authors claim that the structure with sodium has "zero" potassium; I caution them against making this claim. It is likely that some K+ persists in their sample and that some of the density in the "zero potassium" structure may be due to K+ rather than Na+. This can be clarified by revisions to the text and discussion. I do not think that any additional experiments are needed. Overall, the manuscript is well-written, a nice addition to the field, and a crowning achievement for the Sigworth lab.

      Most of this reviewer's initial comments have been addressed in the revised manuscript. Some comments remain that could be addressed by revisions of the text.

      Specific comments on the revised version:
      Quotations indicate text in the manuscript.
      (1) "While the VSD helices in Kv1.2s and the inactivated Kv1.2s-W17'F superimpose very well at the top (including the S4-S5 interface described above), there is a general twist of the helix bundle that yields an overall rotation of about 3° at the bottom of the VSD."

      Comment: This seemed a bit confusing. I assume the authors aligned the complete structures - the differences they indicate seem to be slight VSD repositioning relative to the pore rather than differences between the VSD conformations themselves. The authors may wish to clarify. As they point out in the subsequent paragraph, the VSDs are known to be loosely associated with the pore.

      (2) Comment: The modeling of DTx into the density is a major improvement in the revision. Figure 3 displays some interactions between the toxin and Kv1.2 - additional side views of the toxin and the channel might allow the reader to appreciate the interactions more fully. The overall fit of the toxin structure into the density is somewhat difficult to assess from the figure. (The authors might consider using ChimeraX to display density and model in this figure.)

      (3) "We obtained the structure of Kv1.2s in a zero K+ solution, with all potassium replaced with sodium, and were surprised to find that it is little changed from the K+ bound structure, with an essentially identical selectivity filter conformation (Figure 4B and Figure 4-figure supplement 1)."

      Comment: It should be noted in the manuscript that K+ and Na+ ions cannot be distinguished by the cryo-EM studies; the densities are indistinguishable. The authors are inferring that the observed density corresponds to Na+ because the protein was exchanged from K+ into Na+ on a gel filtration (SEC) column. It is likely that a small amount of K+ remains in the protein sample following SEC. I caution the authors against claiming that there is zero K+ in solution without measuring the K+ content of the protein sample. Additionally, it should be considered that K+ may be present in the blotting paper used for cryo-EM grid preparation (our laboratory has noted, for example, a substantial amount of Ca2+ in blotting paper). The affinity of Kv1.2 for K+ has not been determined, to my knowledge; the authors note in the Discussion that the Shaker channel has "tight" binding for K+. It seems possible that some portion of the density in the selectivity filter could be due to residual K+. This caveat should be clearly stated in the main text and discussion. More extensive exchange into Na+, such as performing the entire protein purification in NaCl or dialysis (as performed to obtain the structure of KcsA in low K+ by Y. Zhou et al. & MacKinnon, 2001), would provide more convincing removal of K+, but I suspect that the Kv1.2 protein would not have sufficient biochemical stability without K+ to endure this treatment. One might argue that reduced biochemical stability in NaCl could be an indication that there was a meaningful amount of K+ in the final sample used for cryo-EM (or in the particles that were selected to yield the final high-resolution structure).

      (4) Referring to the structure obtained in NaCl: "The ion occupancy is also similar, and we presume that Kv1.2 is a conducting channel in sodium solution."

      Comment: Stating that "Kv1.2 is a conducting channel in sodium solution", and implying that Na+ conduction is achieved by a distribution of ion binding sites analogous to that observed for K+, are strong statements that are not justified by the experiments provided. Electrophysiology would be required to demonstrate that the channel conducts sodium in the absence of K+. More complete ionic exchange, better control of the ionic conditions (Na+ vs K+), and affinity measurements for K+ would be needed to determine the distribution of Na+ in the filter (as mentioned above). At minimum, the authors should clarify the intended meaning of the statement "we presume that Kv1.2 is a conducting channel in sodium solution". As mentioned above, it seems possible, even likely, that a portion of the density in the filter may be due to K+.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present high-resolution cryo-EM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure, the authors have solved several structures of mutants, or under experimental conditions, relevant to slow inactivation, a process that these channels undergo and that is not yet completely understood. 

      One of the main findings is the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results from equivalent mutants in channels other than Kv1.2, which indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore-blocking toxin Dendrotoxin 1. The results show that the mechanism of block differs from that of similar toxins: here a lysine residue penetrates the pore deep enough to empty most external potassium binding sites. 

      The quality of the structural data presented in this manuscript is very high and allows for the unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. Specific comments are appended below. 

      (1) In the main text's reference to Figure 2D, residues W18' and S22' are mentioned but are not labeled in the insets. 

      Now labeled in Fig. 2D

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics simulations. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but greatly reduced open times and open probability at voltages that are not very high. 

      Addressed in the Discussion, lines 480-490.

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter; however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is shown at such an angle that the carbonyl distances appear increased; perhaps this should be fixed. Also, it is not clear how the carbonyl distances given in the text on page 12 are measured. Are they between adjacent or diagonally opposite (kitty-corner) subunits? 

      We decided to remove mention of carbonyl distances, because at our resolutions the atoms are not resolved.

      (4) It would be really interesting to know the authors' opinions on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate a local conformational change is the trigger for the main twisting events proposed here. 

      We cite Sauer et al. (2011) for the idea that the intact selectivity filter is a strained conformation, and its relaxation yields the wide vestibule seen in NaK2K and Kv channels.  Lines 434-439.

      Reviewer #2 (Public Review): 

      There are four Kv1.2 channel structures reported: the open state, the C-type inactivated state, a dendrotoxin-bound state, and a structure in Na+. 

      A high-resolution crystal structure of the open state of a chimeric Kv1.2 channel was reported in 2007, and the cryo-EM structure reported in this study provides no new information beyond it. 

      The cryo-EM structure of the C-type inactivated state of the Kv1.2 channel was determined for a channel with the W to F substitution in the pore helix. A cryo-EM structure of the Shaker channel and a crystal structure of a chimeric Kv1.2 channel with an equivalent W to F mutation were reported in 2022. Cryo-EM structures of the C-type inactivated Kv1.3 channel are also available. All these previous structures have provided a relatively consistent structural view of the C-type inactivated state and there is no significant new information that is provided by the structure reported in this study. 

      A structure of the Kv1.2 channel blocked by dendrotoxin is reported. A crystal structure of charybdotoxin and the chimeric Kv1.2 channel was reported in 2013. Density for dendrotoxin could not be clearly resolved due to symmetry issues and so the definitive information from the structure is that dendrotoxin binds, similarly to charybdotoxin, at the mouth of the pore. A potential new finding is that there is a deeper penetration of the blocking Lys residue in dendrotoxin compared to charybdotoxin. It will however be necessary to use approaches to break the symmetry and resolve the electron density for the dendrotoxin molecule to support this claim and to make this structure significant.  

      We have now succeeded in breaking the symmetry and present in Fig. 3 a C1 structure of the toxin-channel complex. In the improved map we now see that our previous conclusion was wrong: the penetration of Lys5 cannot be much deeper than that seen in CTx and ShK structures. However for some reason the pattern of ion-site occupancies in the blocked state is different in this structure than in the others. Fig. 3, Fig. 4E; text lines 559-568.

      The final structure reported is the structure of the Kv1.2 channel in K+-free conditions with Na+ present. The structure of the KcsA channel by the MacKinnon group in 2001 showed a constricted filter, and since then it has been falsely assumed by the K channel community that lowering the K+ concentration leads to constriction of the selectivity filter. There have been structural studies on the MthK and NaK2K channels showing a lack of constriction in the selectivity filter in the absence of K+. These results have been generally ignored, and the misconception of filter constriction/collapse in the absence of K+ still persists. The structure of the Kv1.2 channel in Na+ provides a clear example that loss of K+ does not necessarily lead to filter constriction. 

      We are grateful to the reviewer for pointing out this serious omission. We now cite other work, including from the Y. Jiang and C. Nichols labs, showing examples of outer pore expansion and destabilization: page 4, lines 90-104; lines 421-439.

      The structure in Na+ is significant while the other structures are either merely reproductions of previous reports or are not resolved well enough to make any substantial claims. 

      We now state more clearly the confirmatory nature of our Kv1.2 open structure (lines 71-74) and the similarities of the inactivated-channel structures (lines 193-196).

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a large quantity of structural work on the Kv1.2 channel, and the authors should be commended for the breadth of the studies. The structural studies seem well-executed (this is hard to fully evaluate because the current manuscript is missing a data collection and refinement statistics table). The findings are mostly confirmatory, but they do add to the body of work on this and related channels. Notably, the authors present structures of DTX-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (with presumably sodium ions bound within the selectivity filter). These two structures add new information, but the studies seem somewhat underdeveloped; they would be strengthened by accompanying functional studies and further structural analyses. Overall, the manuscript is well-written and a nice addition to the field. 

      The data collection and refinement table has been added (Fig. 4 – figure supplement 3).

      We agree and regret the lack of functional studies. We have not been able to carry them out because work in our laboratory is winding down and the lab soon will be closing.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is not obvious from the data shown how well the side chain positions in the inactivated state are defined by the electron density. These figures should be redone. Maybe the use of stereo would be useful. This will be particularly useful for the reader to decide if the small changes in, for example, the positioning of the carbonyl oxygens are believable. 

      Figure 2 – figure supplement 4 shows the stereo views.

      (2) The authors note the changes observed (though small) in the VSD which were not observed in other structures. The relevance of this observation is not described. Do these changes arise due to the different environments of detergents versus nanodisc etc. in the different structures?

      We’ve now inserted a note about the variety of environments and how this might cause the difference: lines 280-285.

      Are there changes in the pore-VSD interface in the inactivated and the open channel structures and if yes, then do mutations at these residues affect inactivation?

      There is surprisingly little movement at the S4-S5 interface residues identified by Bassetto et al. (2022) as having effects on inactivation. Lines 262-267.

      (3) For the structures in Na+, it is important to provide analytical data showing the biochemical behavior of the channel. This is also true for the wild type and the W to F mutant channel. Size exclusion profiles should be included. 

      The SEC profile (noisy, but showing a clear peak) of the channel in Na+ is now shown in Fig. 4 – figure supplement 1. Low expression of the W366F mutant produced even worse SEC results, but we include a representative micrograph of W366F in Na+ to show that the protein preparation is monodisperse (Figure 5 – figure supplement 1).

      Reviewer #3 (Recommendations For The Authors): 

      Portions of text from the manuscript are indicated by quotations. 

      Introduction: "One goal of the current study was to examine the structure of the native Kv1.2 channel." 

      Comment, minor points: The authors refer to the Kv1.2 construct used for the structural studies as "native Kv1.2". I found this somewhat confusing because the word "native" suggests derivation from a native source. The phrasing above also gives the impression that the structure by Wu et al. is the first structure of Kv1.2. The Kv1.2 construct is essentially identical to the one used by Long et al. in 2005 to determine the initial structure of Kv1.2 (PDB 2A79). The authors discuss a subsequent paddle-chimera Kv1.2-2.1 structure from 2007 (PDB 2R9R) in the introduction, but it would be prudent to mention the 2005 structure of Kv1.2 as well. The open structure determined by Wu et al. is an improvement on the 2A79 structure in that the 2A79 structure was modeled as poly-alanine within the voltage sensor domain. Nevertheless, the Kv1.2-2.1 structure (2R9R) is highly similar to the 2A79 structure of Kv1.2. The 2007 structure indicated that Kv1.2-2.1 recapitulates structural features of Kv1.2. It is therefore not surprising that the open structure presented here is highly similar to both PDB 2A79 (Kv1.2) and PDB 2R9R (Kv1.2-2.1).

      We failed to point out the high quality of the original Long et al. 2005 structure and its comparisons with the chimeric structure in Long et al. 2007. We now have tried to correct this: lines 70-74.

      Comment: The cryo-EM analyses suggest that a large percentage (most?) of the particles are missing the beta subunit. This should be commented on somewhere.      

      Now noted on lines 120-132: we pooled particles with and without beta subunits.

      Regarding ions in the selectivity filter, one-dimensional plots of the density would strengthen the analysis.

      Now included in Fig. 4.
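      For readers who want to reproduce this kind of plot, one common approach is to average the map inside a narrow cylinder around the symmetry axis, slice by slice. The sketch below is a hypothetical illustration, not the procedure used for Fig. 4 (which is not described here). It assumes the map is a NumPy array indexed (z, y, x) and already centred so that the C4 axis runs along z through the middle of the box; the function name axial_density_profile and the cylinder radius are our own choices.

```python
import numpy as np


def axial_density_profile(density, voxel_size, radius):
    """Mean map value inside a cylinder around the box's central z-axis, per z-slice.

    density: 3D array indexed (z, y, x); voxel_size and radius are in Å.
    Assumes the channel's symmetry axis is parallel to z and passes through
    the centre of the box (hypothetical preprocessing, not shown here).
    Returns (z positions in Å, mean density per slice).
    """
    nz, ny, nx = density.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx) * voxel_size   # radial distance of each column, Å
    mask = r <= radius                             # 2D cross-section of the cylinder
    profile = density[:, mask].mean(axis=1)        # average over the cross-section
    z = np.arange(nz) * voxel_size                 # slice positions, Å
    return z, profile


if __name__ == "__main__":
    # Synthetic 32^3 map with two on-axis "ion" blobs at slices 8 and 20.
    demo = np.zeros((32, 32, 32))
    demo[8, 15:17, 15:17] = 1.0
    demo[20, 15:17, 15:17] = 0.5
    z, p = axial_density_profile(demo, voxel_size=1.0, radius=1.0)
    print(int(np.argmax(p)), p[8], p[20])
```

      Peaks in the returned profile mark candidate ion or side-chain densities along the pore. A real map could, for example, be read with the mrcfile package (mrcfile.open(path).data), whose array is indexed (z, y, x) for the standard axis ordering; the voxel size then comes from the map header.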

      Also, one should mention caveats associated with identifying ions in cryo-EM maps and the added difficulty/uncertainty when the density is located along a symmetry axis (C4 axis, due to the possible build-up of noise). C1 reconstructions, showing density within the filter, if possible, would strengthen the analyses.

      You are correct. However, local resolution is highest in the selectivity filter region, and because the CTF-based filtering is constant over the whole structure, I think the SNR on the axis will be good.

      Comment: The section on channel inactivation could be simplified by stating that the structure is highly similar to W17'F structures of other Kv channels. (And then discussing possible differences).  

      We now note, “overall conformational difference is identical…” p. 7, lines 193-196.

      "Salt bridges involving the S4 Arg and Lys residues are shifted slightly (Figure 2-figure supplement 3A-D). Arg300 (R3) is in close proximity to Glu226 on the S2 helix for the open channel, while R3 is closer to Glu183 in the S2 helix. The Glu226 side chain adopts a visible interaction with R4 in the inactivated state." 

      Comment: The density for these acidic amino acids seems weak, especially in the inactivated state. It seems like a stretch to make much of their possible conformational changes. 

      We’ve included stereo pairs in Fig. 2 – figure supplement 4.

      "By adding 100 nM α-DTx to detergent solubilized Kv1.2 protein we obtained a cryo-EM structure at 2.8 Å resolution of the complex." 

      Comment: 100 nM might be lower than the Kv concentration. The current methods are ambiguous about the concentration of Kv channel used for the DTx sample. From the methods, it seems possible that 100 nM DTX is a sub-stoichiometric amount relative to the channel. Regardless, the cryo-EM data seem to suggest that a large percentage of particles do not have DTx bound. This surely complicates the interpretation of density within the filter (which has partly been ascribed to a lysine side chain from DTx).

      The reviewer correctly points out a potentially serious problem. It turns out that the 100 nM figure we quoted was incorrect, and the actual concentration of toxin, >400 nM, was substantially greater than the protein concentration. This is confirmed by the small fraction (<1%) of 3D class particles that do not show the toxin density (lines 303-306).

      Comment: The methods on atomic structure building/refinement (Protein model building, refinement, and structural analysis) are sparse. A table is needed showing data collection and refinement statistics for each of the structures. This data should also provide average B factors for the ions in the filter. An example can be found in PMID 36224384. 

      Data collection and statistics are now in Fig. 4 – figure supplement 3.

      "In the selectivity filter of the toxin-bound channel (Figure 3E) a continuous density is seen to extend downward from the external site IS0 through to the boundary between IS1 and IS2. This density is well modeled by an extended Lys side chain from the bound toxin, with the terminal amine coordinated by the carbonyls of G27”.

      Comment: While there seems to be extra density in site IS0 from the figures, the density ascribed to lysine in the filter doesn't seem that distinct from those of ions in the open structure. 1-dimensional density plots and some degree of caution may be prudent. Could there, for example, be a mixture of toxin-bound and free channels in the dataset?

      Could the lysine penetrate to different depths? If the toxin binds with nM affinity, why are any channels missing the toxin? Have the authors modeled an atomic structure of the entire toxin bound to the channel to evaluate how plausible the proposed binding of the lysine is? Can the toxin be docked onto Kv1.2 with the deep positioning of the lysine and not clash with the extracellular surface of Kv1.2? 

      We also were concerned about these issues. We have been able to obtain a C1 reconstruction of the toxin-channel complex. In building the atomic model we found that indeed the Lys5 side chain could not penetrate as far as we had thought, and appears to be coordinated by the first carbonyl pair. Fig. 3; text lines 331-332. 

      "Toxin binding shrinks the distances between opposing carbonyl oxygens in the selectivity filter, forming a narrower tunnel into which the Lys side chain fits (Figure 3F). The second and fourth carbonyl oxygen distances are substantially reduced from 4.7 Å and 4.6 Å in an open state to 3.7 Å and 3.9 Å, respectively (Figure 4E). In a superposition of Kv1.2 open-state and α-DTX-bound P-loop structures, there is also an upward shift of the first three carbonyl groups by 0.7~1.0 Å (Figure 4F). " 

      Comment: I suspect the authors intend to refer to Figure 3F rather than 4. I would be cautious here. The refined positions of the carbonyl oxygens are almost certainly affected by the presence or absence of ions in the atomic model during refinement. The density and the resolution of the map may not be able to distinguish small changes to the positions of the carbonyl oxygens (and these differences/uncertainties are compounded by the C4 symmetry). 

      "On the other hand, the terminal amine of lysine in α-DTX is deeply wedged at the second set of carbonyls, narrowing both IS1 and IS2 while displacing ions from the sites (Figure 3-figure supplement 2A). CTX does not cause narrowing of the selectivity filter or displacements of the carbonyls (Figure 3-figure supplement 2B). "

      Comment: Again, caution would be prudent here.  

      We are very grateful to the reviewer for pointing out these problems. We have removed these statements that are weakly supported at our resolution level.

      "Shaker channels are able to conduct Na+ in the absence of K+ (Melishchuk et al., 1998)." 

      Comment: How about the Kv1.2 channel? Is Kv1.2 able to conduct Na+ in the absence of K+ ? This would certainly be relevant for interpreting the conformation of the filter and the density ascribed to Na+ for the structure in sodium.  

      We agree wholeheartedly, but unfortunately we are no longer capable of doing the measurements as our lab will soon close.

      "Ion densities are seen in the IS1, IS3, and IS4 ion binding sites, but the selectivity filter shows a general narrowing as would be expected for binding of sodium ions. The second, third, and fourth carbonyl oxygen distances are reduced from 4.7 Å, 4.7 Å, and 4.6 Å in the open state to 4.4 Å, 3.9 Å, and 4.5 Å, respectively. The rest of the channel structure is very little perturbed. " 

      Comment: The density for IS4 seems weak. To me, it looks like IS1 and IS3 are occupied, whereas IS2 and IS4 are much weaker. 1-dimensional density plots would be helpful. I would suggest caution in commenting too strongly on the "general narrowing" since the resolution of the maps, the local density, and the atomic structure refinement would be consistent with coordinate errors of 0.5 Å or more - and would be compounded (~ doubled) by measuring between symmetry-related atoms.  

      We now present 1D density plots in Fig. 4E, and we no longer comment on “narrowing”.
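The reviewer's remark that coordinate errors are roughly doubled when measuring between symmetry-related atoms can be illustrated with a minimal geometric sketch. It assumes an idealized C4 tetramer with the pore along z and a carbonyl oxygen at radius r, so that diagonally opposed copies sit at (r, 0, z) and (-r, 0, z). The radius (half of the 4.7 Å open-state distance), the 0.5 Å error, and the helper names are illustrative assumptions, not values or code from the study.

```python
import math

def c4_copies(x, y, z):
    """Return the four C4 symmetry copies of a point (rotations by 0, 90, 180, 270 degrees about z)."""
    copies = []
    for k in range(4):
        theta = k * math.pi / 2
        c, s = math.cos(theta), math.sin(theta)
        copies.append((c * x - s * y, s * x + c * y, z))
    return copies

def opposing_distance(radius):
    """Distance between diagonally opposed copies (0 and 2) of an atom placed at (radius, 0, 0)."""
    atoms = c4_copies(radius, 0.0, 0.0)
    return math.dist(atoms[0], atoms[2])

r_open = 4.7 / 2  # hypothetical radius giving a 4.7 A cross-pore distance
error = 0.5       # illustrative radial coordinate error in A

d_true = opposing_distance(r_open)
# Symmetry-constrained refinement moves every copy by the same radial error,
# so the cross-pore distance changes by 2 * error.
d_shifted = opposing_distance(r_open - error)
print(round(d_true, 2), round(d_shifted, 2), round(d_true - d_shifted, 2))

# If the two atoms instead had independent errors of the same size along the
# line joining them, the distance error would add in quadrature: sqrt(2) * error.
print(round(math.sqrt(2) * error, 2))
```

Under these assumptions, a 0.5 Å radial error shifts the opposing-oxygen distance by about 1 Å, which is comparable to the differences originally reported between states.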

      "Finally, the snake toxin a-Dendrotoxin (DTx) studied here is seen to block Kv1.2 by insertion of a lysine residue into the pore." 

      Comment: Discussion (and references) should be given regarding what was known prior to this study on the mode of inhibition by DTx. 

      Discussion and references now added, lines 287-301.

      "On the other hand, a lengthy molecular-dynamics simulation of deactivation in the Kv1.2-2.1..." 

      Comment: I don't think mentioning this personal communication adds to the manuscript. 

      The original “personal communication” reference was included because the situation is complicated. Movie S3 accompanying the Jensen et al. paper shows deactivation and dewetting of the channel during a 250 µs simulation. In the movie, ions are visible in the selectivity filter for the first 50 µs, but after that the SF appears empty. Puzzled by this, we contacted Dr. Jensen, who explained that the movie was in error: ions remain in the SF throughout the entire 250 µs. We now cite Jensen (2012) along with the personal communication.

      "The difference between the open and inactivated Kv1.2 structures, like the difference in Kv1.2-2.1 (Reddi et al., 2022) and Shaker (Tan et al., 2022) can be imagined as resulting from a two-step process." 

      Comment: Confusing phrasing because the authors mean to compare their structure to inactivated structures of Kv1.2-2.1 and shaker. 

      Fixed, lines 220-222.

      "Molecular dynamics simulations by Tan et al. based on the Shaker-W17'F structure show that IS3 and IS4 are simultaneously occupied by K+ ions in the inactivated state." 

      Comment: I think that the word "show" is too strong. Perhaps "suggest" 

      The MD result seems to us unequivocal: most of the time, the two sites are occupied by ions.

      References are needed for the following statements:  

      -  "as well as the charge-transfer center phenylalanine"

      Now citing Tao et al. 2010, line 156.

      - "total gating charge movement in Shaker channels is larger, about 13 elementary charges per channel" 

      Now citing the review by Islas, 2015 (lines 166-169).

      "The selectivity filter of potassium channels consists of an array of four copies of the extended loop (the P-loop) formed by a highly conserved sequence, in this case, TTVGYGD. Two residues anchor the outer half of the selectivity filter and are particularly important in inactivation mechanisms (Figure 2B, right panels). Normally, the tyrosine Y28' (Y377 in Kv1.2) is constrained by hydrogen bonds to residues in the pore helix and helix S6 and is key to the conformation of the selectivity filter. The final aspartate of the P-loop, D30' (D379 in Kv1.2) is normally located near the extracellular surface and has a side chain that also participates in H-bonds with W17' (W366 in Kv1.2) on the pore helix." 

      Citations added (Pless 2013, Sauer 2011) lines 211-214.

      - "During normal conduction, ion binding sites in the selectivity filter are usually occupied by K+ and water molecules in alternation." 

      Added Morais-Cabral et al. 2001, p. 17, lines 463-465.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank the reviewers for their thoughtful evaluation of our work. Our point-by-point responses to reviewer critiques follow below. Please note that any referenced changes to the manuscript are highlighted in yellow in the revised manuscript text.

      Response to Common Critiques

      1. Reviewers 1 and 2 state that some elements of this study confirm previously published results (many in murine systems). However, the reviewers also acknowledge that the mouse and human rDNA repeats may be subject to quite distinct regulation because of the much denser CG content of the human rDNA promoter (26 CpGs) vs. the mouse rDNA promoter (only 2 CpGs); these potential differences in regulation motivated this study in human cells. We evaluate the functions of rDNA methylation in human cells, which is directly relevant to understanding the regulation of rDNA function in human aging, and to understanding the functional implications of DNA methylation "aging clocks" more generally. We also apply a recently developed technology (dCas9-mediated epigenome editing) to directly test the function of rDNA methylation. Novel findings reported in this study include:
      - Pol I-engaged rDNA repeats are hypomethylated at sites both in the promoter and the gene body; this contrasts with Pol II transcription, which is coincident with gene body methylation.
      - rDNA copy number remains stable with age in mammals, in striking contrast to findings in other eukaryotes. rDNA copy number instability has been proposed to be a universal feature of the aging genome, and this finding refutes that possibility.
      - Induction of DNA methylation by an average of ~20% along 7-11 of the 26 CpGs in the human rDNA repeat does not measurably inhibit rDNA transcription.
      - Human Pol I and UBTF remain bound to rDNA promoters in the presence of elevated CpG methylation, in contrast to the murine Pol I machinery.

      Reviewers 1 and 2 questioned our strategy of mapping sequencing data to the consensus ribosomal DNA (rDNA) repeat alone. We followed the approach of Wang & Lemos Genome Research 2019, who initially described the rDNA methylation clock. Wang & Lemos also mapped genomic data to rDNA consensus sequences alone due to the computational efficiency of this approach, and describe a head-to-head comparison of mapping performance outcomes in their Methods section. Importantly, their analysis indicated that the vast majority (>98%) of sequencing reads can be mapped uniquely to the consensus human rDNA repeat (U13369.1). When we launched our study, we also initially compared the performance of mapping to the rDNA repeat consensus sequence alone versus to the whole human genome. We noted very similar performance in both cases, with the possible exception of a modest increase in simple repeat sequences being erroneously mapped to the intergenic spacer (IGS) region of the rDNA when we mapped to the rDNA repeat alone. As the reviewers pointed out, the IGS contains simple repeat sequences that are also found at numerous other non-rDNA sites in the genome. However, the minor mis-mapping of simple repeats to the IGS did not affect our analyses of non-IGS sequences, which were the focus of this study. We therefore proceeded with mapping to the rDNA consensus sequence only.

      Reviewers 1 and 2 pointed out that our dCas9-DNMT strategy induced only a 15-20% increase in rDNA methylation and questioned whether we could expect to detect downstream effects on rDNA transcription. While Reviewer 2 suggested that multiple sgRNAs could enhance methylation efficiency, this has already been tested for other target genes, and multiple sgRNAs were shown not to increase the efficiency of CpG methylation by dCas9-DNMTs (Stepper et al., Nucleic Acids Research 2017). Separately, the goal of this study was to model the effects of age-linked rDNA hypermethylation, which increases by 15-20% over the mammalian lifespan (Wang & Lemos 2019; see also our Figure 1). Importantly for interpreting these data, induction of promoter methylation to a similar extent on the mouse rDNA repeat was able to direct detectable repression of rDNA transcription (Santoro et al., 2011). Further, dCas9-DNMT has been previously shown to induce a ~20% increase in CpG methylation of the Pol II target gene EpCAM and cause measurable transcriptional repression that was detectable by qPCR (Stepper et al., 2017). In contrast, we induced rDNA methylation to a similar extent and observed no change in the levels of either pre-rRNA or mature rRNA. Because UBF and Pol I remain bound to rDNA in spite of higher CpG methylation (Fig. 7 and Fig. S4), we interpret these data together to indicate that the human Pol I machinery can continue to engage with rDNA in the presence of intermediate levels of CpG methylation.

      Reviewer 1

      1. inactivation of rDNA transcription per se does not affect chromatin accessibility, to date only depletion or deletion of UBTF has been found to do this and even this does not enhance CpG methylation, these published findings should be referenced.

      Our analyses in Figure 2 focus on defining the relationships between chromatin accessibility, transcriptional activity, and CpG methylation throughout the human rDNA repeat. We cannot determine causation from this analysis - meaning whether chromatin accessibility influences CpG methylation or vice versa - and this point is beyond the scope of our study. Our major goal was to test whether induced CpG methylation affects transcription output.

      The authors overstate their results by writing "actively transcribed rDNA repeats are hypomethylated at their promoter" despite only one SmaI site but many CpG sites exist in the human promoter, the latter having not been assayed.

      We analyzed several pieces of data to come to this conclusion. First, ATAC-Me indicates that ATAC-accessible rDNA repeats are completely devoid of methylation both in their promoter and throughout the gene body; as UBTF binding controls rDNA accessibility (Sanij et al., JCB 2008; Hamdane et al., PLoS Genet 2014), we infer that ATAC-accessible repeats are engaged with the Pol I transcription machinery and hypomethylated. To more directly probe this question, we evaluated the methylation status of Pol I-bound rDNA repeats at five separate sites by ChIP-chop: two sites in the 5' regulatory region (5' ETS and core promoter, pooled together as "promoter" in Figure 2F) and three sites within the gene body (18S, 5.8S, and 28S, pooled together as "gene body" in Figure 2F). These data clearly indicate that Pol I preferentially binds to these regions when they are hypomethylated, as the extent of CpG methylation at these same sites is higher in input DNA and lower in Pol I-ChIPped DNA. While we do not comprehensively profile CpG methylation status of Pol I-bound DNA, these ChIP-chop analyses are consistent with our interpretation that "actively transcribed (that is, Pol I-engaged) rDNA repeats are hypomethylated at their promoter".

      Pol I's preference for binding hypomethylated promoters has been previously described in mouse cells (Santoro & Grummt 2001) and human cells (Brown & Szyf Mol Cell Biol 2007). We confirm this and also report the novel finding that rDNA gene bodies bound by Pol I are hypomethylated. This contrasts with known relationships between Pol II and CpG methylation, where genes actively transcribed by Pol II often have dense gene body CpG methylation.

      While we think it is reasonable to infer from ATAC-Me data and ChIP-chop data together that accessible and hypomethylated rDNA repeats reflect transcriptionally active repeats, we appreciate the reviewer's point that we analyzed only a select few CpG sites by Pol I ChIP-chop. We have adjusted the text to make our interpretation more parsimonious (see highlights).
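As a rough illustration of how a methylation-sensitive restriction ("chop") readout can be turned into a per-site methylation estimate for input versus Pol I-ChIPped DNA, here is a minimal sketch. It assumes an enzyme that cuts only unmethylated sites, complete digestion, and 100% qPCR efficiency (one doubling per cycle); it is a generic chop-qPCR calculation, not necessarily the exact analysis used in this study, and the function name and Ct values are hypothetical.

```python
def fraction_methylated(ct_digested, ct_mock):
    """Estimate the methylated fraction at one restriction site from qPCR Ct values.

    Assumes a methylation-sensitive enzyme that cuts only unmethylated DNA,
    complete digestion, and perfect (2x per cycle) amplification, so only
    methylated (uncut) templates are amplified from the digested sample.
    """
    delta_ct = ct_digested - ct_mock
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values for one site in input DNA and in Pol I-ChIPped DNA.
input_meth = fraction_methylated(ct_digested=25.0, ct_mock=24.0)  # 0.5
chip_meth = fraction_methylated(ct_digested=27.0, ct_mock=24.0)   # 0.125
print(input_meth, chip_meth)
# A lower value in the ChIP sample than in input indicates that the
# Pol I-bound copies of this site are depleted of methylation.
```

Incomplete digestion or amplification efficiency below 100% would bias these estimates, which is one reason such assays report a few sites rather than a comprehensive methylation profile.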

      The human rDNA promoter contains many CpGs which may not affect transcription when methylated. RRBS and WGBS data can't tell us much if we don't understand which sites, when methylated, affect transcription.

      We agree, and this ambiguity is what motivated us to induce methylation and evaluate the consequences. In plasmid reporter experiments where the human rDNA promoter was fused to a luciferase reporter, it was shown that in vitro methylation of the plasmid potently inhibited transcription in human cells (Ghoshal et al., J Biol Chem 2004). In this study, methylation of 7/26 CpGs was sufficient to induce >75% inhibition of reporter plasmid transcription, while methylation at single sites could induce ~50% inhibition. We neglected to cite this relevant study and have included a reference to it in the revised manuscript. Importantly, this plasmid reporter assay does not assess the effects of CpG methylation on the full rDNA repeat in its endogenous genomic context. We were able to induce significant CpG hypermethylation on 11/26 promoter CpGs with one guide (P+G) and on 7/26 CpGs with a second guide (P+A) (Figure 3D). This level of methylation did not induce detectable silencing of rRNA transcription. Instead, we found that both UBF (Fig. 7) and Pol I (Fig. S4) remained bound to rDNA in the presence of CpG hypermethylation.
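To make the idea of "induced hypermethylation on k of 26 CpGs, by an average of ~20%" concrete, here is a minimal sketch that compares per-CpG methylation between a control and a targeted sample. It assumes bisulfite-style read counts (methylated versus unmethylated reads per CpG) and uses a fixed minimum gain as a simple stand-in for whatever statistical test the study applied. The function names, the threshold, and the toy counts are hypothetical and are not data from this study.

```python
def beta_value(meth_reads, unmeth_reads):
    """Fraction of reads methylated at one CpG (bisulfite: C = methylated, T = unmethylated)."""
    total = meth_reads + unmeth_reads
    return meth_reads / total if total else float("nan")

def summarize_induction(control, targeted, min_delta=0.1):
    """Compare per-CpG methylation between a control and a targeted sample.

    control, targeted: lists of (meth_reads, unmeth_reads) tuples, one per CpG,
    in the same order. Returns the number of CpGs whose methylation rose by at
    least min_delta and the mean increase across those CpGs.
    """
    deltas = [beta_value(*t) - beta_value(*c) for c, t in zip(control, targeted)]
    # NaN deltas (CpGs without coverage) fail the comparison and are skipped.
    induced = [d for d in deltas if d >= min_delta]
    mean_gain = sum(induced) / len(induced) if induced else 0.0
    return len(induced), mean_gain

# Toy counts for four CpGs (hypothetical).
control = [(10, 90), (20, 80), (15, 85), (30, 70)]
targeted = [(30, 70), (40, 60), (18, 82), (50, 50)]
print(summarize_induction(control, targeted))
```

In this toy case three of four CpGs gain about 20 percentage points, while the fourth changes only slightly; a real analysis would also need replicate-aware statistics rather than a fixed cutoff.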

      The argument that the mouse rDNA Pol I machinery is "exquisitely sensitive" to CpG methylation is a little misleading as there are only two CpGs in the mouse rDNA promoter. Which of the 26 human CpGs are the critical ones?

      Immediately following this statement in the Discussion, we state that "the human rDNA promoter is significantly more CG-rich than the mouse rDNA promoter". We have revised this section to emphasize the difference (26 CpGs in human vs. only 2 in the mouse) and discuss this point raised by the reviewer: which are the critical CpGs in the human rDNA? Here again it is relevant to cite the human rDNA promoter reporter assays performed by Ghoshal et al., J Biol Chem 2004. These data indicate that CpG methylation of 7/26 promoter CpGs interferes with transcription from an rDNA reporter plasmid. Notably, it is unclear how generalizable findings from reporter assays are to the genomic context of the endogenous full length rDNA sequence. Our data indicate that partial methylation of 7-11 CpGs in the human rDNA promoter causes no detectable inhibition of rDNA transcription, and indeed does not displace UBF or Pol I (Fig. 7; Fig. S4).

      Antibody SC13125 used for UBF ChIP sees nearly exclusively the shorter transcriptionally inactive UBF2 variant. These data need to be repeated with an antibody that detects both UBF forms.

      We thank the reviewer for raising the important issue of UBTF splice isoforms. Relevant citations demonstrating that the SC13125 antibody recognizes only UBF2 would have been very helpful. The human UBTF gene is alternatively spliced into full-length UBF1 (exon 8 retained) and UBF2 (exon 8 spliced out). The deletion of exon 8 results in a 37 amino acid deletion in UBF2 corresponding to residues 221-268 in HMG box 2 of UBF1 (see Ensembl entry ENSG00000108312.16). The truncation of HMG box 2 makes UBF2 a far less potent transcriptional activator than UBF1. Because of the small molecular weight difference between these two isoforms, preference of an antibody for one vs. another isoform is not readily apparent by Western blotting. However, according to the manufacturer of the UBTF antibody used in this study, the immunogen corresponds to residues 1-220 of UBTF1, which is immediately N-terminal to the residues deleted in UBF2 (AAs 221-268, encoded by exon 8). The antibody's immunogen is thus entirely sequence that is shared between UBF1 and UBF2. Further, a previous study performed immunoprecipitation followed by mass spectrometry using this antibody and reported detection of UBF1-specific peptides (Drakas et al., PNAS 2004). Therefore, in the absence of any evidence to the contrary that we are aware of, we conclude that this antibody recognizes UBF1 and possibly also UBF2.

      We thank the reviewer for raising this point and have adjusted the text to avoid the misleading implication that we are unambiguously detecting only the UBF1 isoform; all mentions of "UBF1" in the revised text have been replaced with "UBTF".

      Setting aside the question about the UBTF antibody reagent used, we observe consistent results by evaluating both UBTF (Figure 7) and Pol I (Figure S4) binding to rDNA in spite of CpG methylation; therefore, we conclude that the human Pol I machinery is not displaced from the human rDNA promoter by intermediate levels of CpG methylation.

      Reviewer 2

      1. There is very little discussion concerning the methylation status of the IGS...the Kobayashi lab has convincingly demonstrated that rDNA repeats fall into 2 classes. Those in which the supposedly active repeats lack methylation on promoters and coding regions and those in which both promoters and coding regions are heavily methylated. In both cases the IGS is fully methylated.

      We cite this study in the Discussion (reference 18 in bibliography) and agree that this work is relevant to ours; we have adjusted the text to emphasize this point. Notably, this previous analysis of CpG methylation patterns by long-read sequencing implied that active repeats may be entirely hypomethylated along their coding sequence; our data more directly demonstrate this both by ATAC-Me and by Pol I ChIP-chop (Fig. 2).

      There is no description of how rRNA levels were assessed. I suggest this could be further complemented by in vivo incorporation studies such as EU labeling.

      We apologize for this lack of clarity. rRNA levels were assessed by qPCR of the 45S pre-rRNA (Fig. 3A) and of mature 28S rRNA (Fig. 3B), and these data are presented as a fold change in each rDNA-targeting sgRNA compared to a non-targeting control sgRNA. The primer sets used are listed in Supplementary Table 1.

      While we agree that EU labeling could be useful for detecting nucleolar transcription, qPCR detection of the 45S rRNA also sensitively reports nascent transcription and, we think, is sufficient to address this question.

      Reviewer 3

      1. The study points to differences between mouse and human rDNA and the effect of DNA methylation on transcriptional output. Did the mouse rDNA dataset also measure transcription output to correlate with DNA methylation age differences?

      The original study that defined the rDNA methylation clock (Wang & Lemos Genome Research 2019) did not evaluate rDNA transcription in parallel. More generally, the relationship of age-linked "clock" CpG methylation sites to expression / function of CpG methylated loci is very unclear, and testing the potential relationship between age-linked rDNA methylation and function was the major goal of this study.

      Did the spacer promoter also get methylated and did that affect UBF and Pol I binding?

      While the existence and function of a spacer promoter has been more clearly defined in the mouse rDNA repeat, recent evidence indicates that the Pol I transcription machinery also binds a second location about 800 bp upstream of the core promoter in the human rDNA repeat (Mars et al G3 2018). The guides that we used to direct CpG methylation recognize single unique sites in the core rDNA promoter and do not recognize sequences in this putative spacer promoter, and we did not analyze methylation at the spacer promoter. Analysis of the spacer promoter is generally beyond the scope of this study, as it is unknown whether there is any relationship between spacer promoter methylation and aging progression.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses, but the evidence is incomplete, as additional biochemical and functional experiments are needed to unambiguously assign MDA5 as a bona fide sensor of triphosphate RNA in this model. This also leaves the title as overstating its case.

      We would like to thank the editorial team for these positive comments on our manuscript and the constructive suggestions to improve it. Following the suggestions and valuable comments of the referees, we have added substantial new data and analysis to substantiate our claims, and the manuscript, including the title, has been carefully revised to better reflect our conclusions. We are now happy to send you our revised manuscript and hope that it addresses your and the reviewers’ concerns satisfactorily and is now suitable for publication in eLife.

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conservational process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts.

      We concur that virus-host coevolution complicates the derivation of universal conclusions. To address this challenge, we incorporated additional experiments and data based on the reviewers' suggestions. These experiments were carried out across diverse models, including two distinct vertebrate species (M. miiuy and G. gallus), two different viruses (SCRV and VSV), and the corresponding synthesized 5’ppp-RNA probes. We believe that these supplementary data bolster the evidence supporting the immune replacement role of MDA5 in the recognition of 5'ppp-RNA in RIG-I-deficient species (Figure 1C-1E, Figure 2O and 2P, Figure 4). Moreover, we have added references in both the Introduction and Discussion to further support our conclusion, noting that MDA5 in T. belangeri, a mammal lacking RIG-I, is able to detect RNA viruses that act as RIG-I agonists (doi: 10.1073/pnas.1604939113). Lastly, we have carefully revised the manuscript, including the title, to ensure that it is consistent with our results.

      Reviewer#2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      The key message of this paper, i.e. MDA5 can sense 5'-triphosphorylated RNA and thereby compensate for the loss of RIG-I, is novel and interesting, yet there is insufficient evidence provided to prove this hypothesis. Most importantly, it is crucial to test the capacity of in vitro synthesized 5'-triphosphorylated RNA to induce an IFN response in MDA5-sufficient and -deficient cells. In addition, a number of important controls are missing, as detailed below.

      To further support the notion that MDA5 is capable of detecting 5'ppp-RNA in species lacking RIG-I, we conducted additional experiments. First, we isolated the RNA from SCRV and VSV. Next, we synthesized in vitro 5'ppp-RNA probes corresponding to the genome termini of SCRV and VSV. These RNAs were then treated with calf intestinal alkaline phosphatase (CIAP) to generate dephosphorylated derivatives. We then separately tested the ability of these RNAs to induce IRF3 dimerization and an IFN response in MKC (M. miiuy kidney cell line) and DF-1 (G. gallus fibroblast cell line) cells, and determined that the immune activation ability of SCRV and VSV depends on their triphosphate structure (Figure 1C-1E, Figure 4C and 4J). In addition, knockdown of MDA5 inhibited the immune response mediated by SCRV RNA (Figure 2P and 2Q). Finally, we incorporated essential experimental controls (Figure 4B and 4I). We believe that these additional data substantially strengthen the support for our hypothesis.

      The authors describe an interaction between MDA5 and STING which, if true, is very interesting. However, the functional implications of this interaction are not further investigated in the manuscript. Is STING required to relay signaling downstream of MDA5?

      To better explore the role of STING in MDA5 signal transduction, we constructed a STING expression plasmid and synthesized specific siRNA targeting STING. We found that co-expression of STING and MDA5 significantly enhanced the MDA5-mediated IFN-1 response during SCRV infection (Figure 2N). Conversely, silencing of STING expression attenuated the MDA5-mediated IFN-1 response (Figure 2O). These findings provide important evidence for the critical involvement of STING in the immune signaling cascade mediated by MDA5 in response to 5'ppp-RNA viruses.

      The second part of the paper is quite distinct from the first part. The fact that MDA5 is an interferon-stimulated gene is not mentioned and complicates the analyses (i.e. is there truly more m6A modification of MDA5 on a per molecule basis, or is there simply more total MDA5 and therefore more total m6A modification of MDA5).

      For the analysis in Figure 5E and 5F, we first normalized the m6A-IP signal to the corresponding input, and then set the control group (IgG group in 5E and Mock group in 5F) to a value of “1”. Because each IP signal is normalized to its own input, the difference in MDA5 expression levels between the Mock and SCRV-infected input samples is accounted for, so our analysis reflects the m6A level per MDA5 transcript rather than the total amount of MDA5. To enhance clarity, we have updated the Y-axis labels in Figure 5E and 5F.
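To illustrate why normalizing to input addresses the concern that MDA5 is itself interferon-inducible, here is a minimal sketch of an IP-over-input calculation expressed relative to a control set to 1. It assumes the qPCR signals have already been converted to linear relative quantities; the function name and all numbers are hypothetical, not values from Figure 5E or 5F.

```python
def relative_m6a(ip_signal, input_signal, ip_control, input_control):
    """Return m6A enrichment (IP / input) relative to a control sample set to 1.

    Dividing each IP signal by its own input corrects for differences in total
    transcript abundance, so a rise in the ratio reflects more m6A per
    transcript rather than simply more transcripts.
    """
    enrichment = ip_signal / input_signal
    control_enrichment = ip_control / input_control
    return enrichment / control_enrichment

# Mock: IP 2, input 100. Infected: three times more MDA5 transcripts in input.
same_per_transcript = relative_m6a(6.0, 300.0, 2.0, 100.0)   # 1.0: more MDA5, same m6A per transcript
more_per_transcript = relative_m6a(12.0, 300.0, 2.0, 100.0)  # 2.0: m6A enriched per transcript
print(same_per_transcript, more_per_transcript)
```

In the first case the IP signal triples only because there are three times as many MDA5 transcripts, and the normalized value stays at 1; only the second case reports a genuine per-transcript increase.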

      Finally, it should be pointed out that several figures require additional labels, markings, or information in the figure itself or in the accompanying legend to increase the overall clarity of the manuscript. There are frequently details missing from figures that make them difficult to interpret and not self-explanatory. These details are sometimes not even found in the legend, only in the materials and methods section. The manuscript also requires extensive language editing by the editorial team or the authors.

      We acknowledge the valuable feedback from the reviewer and have made significant improvements to our manuscript based on the recommendations provided in the "Recommendation for the authors" section. Furthermore, we have conducted a thorough review of the entire article, resulting in substantial enhancements to the format, clarity, and overall readability of our manuscript.

      Reviewer#3 (Public Review):

      Summary: In this manuscript, the authors investigated the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in a teleost fish called Miiuy croaker. They claimed that MDA5 can replace RIG-I in sensing 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) in the absence of RIG-I in Miiuy croaker. The recognition of MDA5 to 5'ppp-RNA was also observed in the chicken (Gallus gallus), a bird species that lacks RIG-I. Additionally, they reported that the function of MDA5 can be impaired through m6A-mediated methylation and degradation of MDA5 mRNA by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker under SCRV infection. This impairment weakens the innate antiviral immunity of fish and promotes the immune evasion of SCRV.

      Strengths: These findings provide insights into the adaptation and functional diversity of innate antiviral activity in vertebrates.

      Weaknesses: However, there are some major and minor concerns that need to be further addressed. Addressing these concerns will help the authors improve the quality of their manuscript. One significant issue with the manuscript is that the authors claim to be investigating the role of MDA5 as a substitute for RIG-I in recognizing 5'ppp-RNA, but their study extends beyond this specific scenario. Based on my understanding, it appears that sections 2.2, 2.3, 2.5, 2.6, and 2.7 do not strictly adhere to this particular scenario. Instead, these sections tend to investigate the functional involvement of Miiuy croaker MDA5 in the innate immune response to viral infection. Furthermore, the majority of the data is focused on Miiuy croaker MDA5, with only a limited and insufficient study on chicken MDA5. Consequently, the authors cannot make broad claims that their research represents events in all RIG-I deficient species, considering the limited scope of the species studied.

      We agree with the reviewer's perspective that functional analysis of MDA5 in M. miiuy may not adequately represent all species lacking RIG-I. To address this concern, we have incorporated additional experimental data utilizing different model systems, including two different vertebrate species (M. miiuy and G. gallus), two distinct viruses (SCRV and VSV), and two corresponding synthesized 5’ppp-RNA probes. While the functional characterization of G. gallus MDA5 remains relatively limited compared to M. miiuy, our current experimental findings support two key observations. First, the triphosphate structure of VSV is pivotal in activating the innate immune response in G. gallus against the virus (Figure 1D and 4J). Second, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Consequently, although we cannot definitively establish the immune surrogate function of MDA5 in all RIG-I-deficient species, our data further support this hypothesis. Moreover, we have phrased our conclusions more cautiously throughout the manuscript.

      The current title of the article does not align well with its actual content. It is recommended that the focus of the research be redirected to the recognition function and molecular mechanism of MDA5 in the absence of RIG-I concerning 5'ppp-RNA. This can be achieved through bolstering experimental analysis in the fields of biochemistry and molecular biology, as well as enhancing theoretical research on the molecular evolution of MDA5. It is advisable to decrease or eliminate content related to m6A modification.

      Following the reviewer's recommendations, we have revised the title to emphasize that our main research focus is a teleost fish devoid of RIG-I. Furthermore, we have conducted additional molecular experiments to further elucidate the 5'ppp-RNA recognition function of MDA5 in RIG-I-deficient species. In an attempt to analyze the potential molecular evolution of MDA5 resulting from RIG-I deficiency, we collected MDA5 coding sequences from diverse vertebrates. However, due to multiple independent loss events of RIG-I in fish, fish with or without RIG-I genes in the phylogenetic tree cannot be effectively clustered separately, making it extremely difficult to perform this aspect of analysis. Consequently, we have regrettably opted to forgo the molecular evolution analysis of MDA5.

      Our article reveals an antagonistic interaction between a fish receptor and an RNA virus. MDA5 in this RIG-I-deficient fish has evolved the ability to recognize a 5’ppp-RNA virus and mediate an IFN response against SCRV infection. Conversely, m6A methylation provides SCRV with a means to weaken the immune capacity of MDA5. We therefore believe that this latter part is an important component of the arms race between the virus and its host and should be retained.

      Additionally, the main body of the writing contains several aspects that lack rigor and tend to exaggerate, necessitating significant improvement.

      We appreciate the reviewer’s comment and have revised the manuscript to address the points raised in the “Recommendations for the authors”. We have added experiments to strengthen the support for our conclusions, and we have phrased the conclusions more cautiously.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The evidential foundation within the Result 1 section appears somewhat tenuous.

      Firstly, the authors derive their conclusions regarding RIG-I loss in lower vertebrates from the published literature and bioinformatic analyses. Did the authors consider strengthening these findings with additional WB/PCR experiments, particularly to evaluate RIG-I expression levels across diverse vertebrates, including both lower and higher orders?

      Firstly, most of the species we analyzed are model species with high-quality genome sequences available in public databases. Secondly, RIG-I protein sequences (at least those of some domains) are relatively conserved in vertebrates. Homology searches are therefore a reliable way to assess whether RIG-I is present in these species, and we do not intend to perform additional PCR/WB experiments to confirm this.

      Additionally, having identified the loss of RIG-I, the authors propose MDA5 as a substitute for RIG-I, basing this proposal on an analysis of MDA5 and LGP2 protein structures. Could the authors strengthen the manuscript by providing expression data for MDA5 and LGP2 across different vertebrates and by explaining further why MDA5 is proposed as the compensatory mechanism for RIG-I loss?

      Like MDA5, LGP2 is an interferon-stimulated gene, so both are likely to be strongly induced by viral infection. We therefore think that comparing the expression levels of these two genes would reveal little about their functions. In mammals, the regulation of RIG-I and MDA5 by LGP2 is complex and not fully understood. To evaluate the potential function of LGP2 in M. miiuy, we constructed an LGP2 expression plasmid and synthesized an siRNA targeting LGP2. Our results indicate that mmiLGP2 enhances the antiviral immune response mediated by mmiMDA5 (Figure 1H and 1I), suggesting that mmiLGP2 plays a regulatory role in RLR signaling rather than acting as a compensatory receptor for RIG-I.

      Also, is it conceivable that other receptors contribute to this compensatory effect in lower vertebrates?

      Short, blunt-ended double-stranded RNA bearing a 5’-triphosphate, such as that found in the panhandle structure of negative-strand viral genomes, is the ligand of RIG-I. Our study focuses on whether other receptors compensate for the loss of RIG-I in immune recognition, and MDA5, as the protein most structurally similar to RIG-I, was the first candidate we considered. IFIT proteins have also been reported to recognize triphosphorylated single-stranded RNA (doi: 10.1038/nature11783). However, our viral models, SCRV and VSV, both have negative-strand genomes that match the ligand requirements of RIG-I rather than those of IFIT proteins. We therefore excluded IFIT proteins from the scope of this study.

      (2) The article uses only a single 5'PPP-RNA virus and one lower vertebrate species, which may weaken the claim that this phenomenon is prevalent in lower vertebrates. To support this claim, could the authors consider adding data from another 5'PPP-RNA virus and a different lower vertebrate species?

      To address this concern, we have added experimental data from additional model systems: two vertebrate species (M. miiuy and G. gallus) and two viruses (SCRV and VSV). Although G. gallus MDA5 has been characterized less extensively than M. miiuy MDA5, our current findings support two key observations. First, the 5’-triphosphate of VSV RNA is pivotal in activating the innate immune response against the virus in G. gallus (Figure 1D and 4J). Second, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). These results therefore support the conservation of this immune compensation mechanism.

      (3) A nuanced consideration of the statement in Result 5 is warranted. Examination of the results under SCRV infection conditions suggests dynamic fluctuations in MDA5 expression levels, challenging the veracity of the statement implying "increased expression", which contradicts the proposed working model of this article.

      Because MDA5 acts as a receptor that recognizes the virus in the early stages of infection, its expression increases rapidly early in SCRV infection. Later in infection, MDA5 expression may gradually decline again owing to negative feedback mechanisms in the host that prevent excessive inflammation. Nevertheless, MDA5 expression was significantly higher in the SCRV-infected group than in the uninfected group, so we believe the term "increased expression" is appropriate. Moreover, although m6A modification weakens MDA5 function, it does not prevent the overall increase in MDA5 expression; this is consistent with the working model proposed in this article.

      Additionally, the changes in m6A levels in miiuy croaker during SCRV infection need clarification. Could the authors use m6A dot blotting to supplement the findings on total m6A levels?

      Our previous study (doi: 10.4049/jimmunol.2200618) showed that the total m6A level increases after SCRV infection in miiuy croaker. We cite this finding in the Discussion of our manuscript.

      (4) It would be beneficial if the editors could assist the author in enhancing the language of the manuscript.

      We have carefully checked the entire manuscript and revised it with the help of Grammarly, and we believe that its grammar, formatting, and readability have been greatly improved.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1

      (1) Figure 1B - some clarification needs to be added about this figure in the text. It is unclear what the main point is that the authors would like to convey.

      We wish to emphasize that some species that retain RIG-I, such as zebrafish, have also experienced RIG-I loss events, but underwent whole-genome duplication before the loss and thus preserved one copy of RIG-I. This indicates that RIG-I loss is common in vertebrates and does not occur randomly. We have elaborated on this point in the Results and Discussion.

      (2) Figure 1C is not very informative other than showing Mm MDA5 and LGP2 side by side. It would be more useful to show a comparison of human RIG-I/MDA5 alongside Mm and Gg MDA5. Are there any conserved/shared key residues between hRIG-I/hMDA5 and mmMDA5?

      Homologous proteins often share similar structures and functions. We have added the domain organization of human RIG-I to this figure (Figure 1F). Comparing human RIG-I with M. miiuy MDA5 and LGP2 shows that M. miiuy MDA5 has a domain architecture similar to that of human RIG-I, making it the most likely candidate to compensate for the missing RIG-I. By contrast, M. miiuy LGP2 lacks the CARD domain, which is crucial for signal transduction, so we focused on M. miiuy MDA5. In addition, we collected MDA5 and RIG-I protein sequences from various vertebrates to look for key residues that might have evolved to enable 5'ppp-RNA recognition by M. miiuy MDA5; unfortunately, no candidate residues were identified in this comparison.

      Figure 2

      (1) Figure 2B - It would be important to demonstrate MDA5-Flag expression by immunoblot and compare MDA5-Flag overexpression to endogenous MDA5 expression using the anti-MDA5 antibody from panel 2A. If IF is used, more cells need to be visible in the field.

      After transfecting the MDA5 plasmid into MKC cells, we detected MDA5 with the same anti-MDA5 antibody used for endogenous MDA5. MDA5 protein levels increased significantly, indicating that the antibody specifically recognizes M. miiuy MDA5. We have retained the original immunofluorescence images to better show the subcellular localization of MDA5.

      (2) Figure 2C - The 1:1 stoichiometry of MDA5:MAVS (in the absence of any stimulus) is quite surprising. How does the interaction between MDA5 and MAVS change upon stimulation with an RNA ligand (SCRV, poly(I:C))?

      We do not believe that the actual stoichiometry of MDA5 and MAVS is 1:1. The apparent ratio of proteins in a Co-IP depends on many experimental factors. First, the MDA5 plasmid used in this study carries a 3×Flag tag, whereas MAVS carries only a 1×Myc tag, which makes detection of MDA5-Flag more sensitive. In addition, Co-IP results are affected by multiple factors, such as the type of antibody and the number of recoveries, making it difficult to estimate the actual ratio of MDA5 to MAVS. For these reasons, and because measuring the strength of the interaction between MDA5 and MAVS after infection lies outside the focus of this study, we did not pursue this point further.

      (3) Figure 2D - The interaction between MDA5 and STING is a very interesting finding but is not elaborated on in the paper (even though the interaction between MDA5 and STING is mentioned in the abstract). The manuscript would be strengthened if the interaction between MDA5 and STING is further investigated. For example, does the IFN response that is reported in panels 2E to 2H require the presence of STING? Does mmMDA5 signal via STING in response to a DNA ligand?

      We appreciate the referee's suggestion to study the interplay between MDA5 and STING. We found that co-expression of STING and MDA5 enhances the MDA5-mediated IFN-1 response during SCRV infection, whereas knocking down STING attenuates the MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the MDA5-mediated immune response to RNA viruses. We recognize the importance of the cGAS-STING pathway in sensing exogenous DNA, and whether MDA5 signals in response to DNA ligands is an interesting and meaningful question. However, it lies outside the scope of this article, so we did not explore it further.

      (4) Figures 2F and 2H - the authors demonstrate that SCRV induces a type I IFN response in an MDA5-dependent manner. While SCRV is a single-stranded negative-sense RNA virus that contains 5'ppp-RNA, it cannot be excluded that MDA5 is activated here in response to a double-stranded RNA intermediate of viral origin or even a host-derived RNA whose expression or modification is altered during infection. To demonstrate in an unambiguous manner that MDA5 senses 5'ppp-RNA, it is crucial to use the in vitro synthesized 5'ppp-RNA (and its dephosphorylated derivative as a control) from Fig. 4 in these experiments.

      We transfected in vitro-synthesized 5’ppp-SCRV and 5’ppp-VSV RNAs (and their dephosphorylated derivatives) into MKC cells and DF-1 cells, respectively. The 5’ppp-RNAs significantly promoted the formation of IRF3 dimers, whereas their dephosphorylated derivatives did not (Figure 4C and 4J). In addition, we extracted viral RNA from SCRV and VSV and dephosphorylated it with calf intestinal alkaline phosphatase (CIAP). When these RNAs were transfected into MKC and DF-1 cells, the immune response induced by untreated viral RNA was much stronger than that induced by the dephosphorylated RNA (Figure 1C-1E). These results indicate that the immune response activated by SCRV and VSV indeed depends on their 5’-triphosphate structure. Finally, IRF3 dimerization and IFN induction triggered by SCRV RNA were inhibited by si-MDA5 (Figure 2P and 2Q), further demonstrating the involvement of MDA5 in the immune response to 5’ppp-RNA ligands.

      (5) In mice and humans, MDA5 is known to collaborate with LGP2 to jointly induce an IFN response. Does M.miiuy express LGP2? If so, it would be informative to include a siRNA targeting LGP2 in the experiments in panel F. In mammals, LGP2 potentiates the response via MDA5 while it may inhibit RIG-I activation.

      M. miiuy does express LGP2. We constructed an LGP2 plasmid and synthesized si-LGP2 to investigate the effect of LGP2 on MDA5-mediated immune responses (Figure 1G-1I). The results showed that LGP2 enhances the MDA5-mediated IFN response during SCRV infection, similar to its role in mammals.

      (6) Minor comment - Is the poly(I:C) used in this figure high or low molecular weight poly(I:C)? HMW poly(I:C) preferentially stimulates MDA5, while LMW poly(I:C) preferentially stimulates RIG-I.

      We used poly(I:C)-HMW as a positive control for activating MDA5. We have modified the relevant information in Figure 2 and its legend.

      Figure 3

      (1) Figure 3F/G - The normalization in this Figure is difficult to interpret. It would be better to split Figure 3G into 4 separate graphs and include the mock-infected cells alongside the infected samples (as done in Figure 2).

      To better demonstrate the function of the RD domain of M. miiuy MDA5, we have revised the experimental design (Figure 3F). We measured the induction of antiviral factors by overexpressed MDA5 and MDA5-△RD under poly(I:C)-HMW stimulation. This shows that the RD domain of M. miiuy MDA5 has a conserved function in recognizing poly(I:C)-HMW, and it serves as a positive control for the recognition of SCRV by the RD domain.

      Figure 4

      (1) Figure 4B - A number of important controls are missing. Was the immunoprecipitation of RNA successful? This could be shown by running a fraction of the immunoprecipitated material on an RNA gel and/or by showing that the input RNA was depleted after IP. In addition, a control IP (Streptavidin beads without biotinylated RNA) is missing to ensure that MDA5 does not stick non-specifically to the Streptavidin resin.

      We appreciate the referee's suggestions. We reran this experiment with an added control pull-down using non-biotinylated RNA, and the results showed that MDA5 did not bind non-specifically to the beads (Figure 4B). In addition, as suggested, we measured RNA depletion before and after pull-down: biotin-labeled RNA, but not unlabeled RNA, was captured by the beads, indicating that the RNA pull-down was successful. However, we do not consider these data necessary for presenting the final results, so they are not shown in the figure.

      (2) Figure 4B - It is unclear why there is such a large molecular weight difference between endogenous MDA5 and MDA5-Flag (110 kDa versus 130/140 kDa). Why is there less MDA5-Flag retrieved than endogenous MDA5?

      After careful analysis, we believe that the large difference in molecular weight between endogenous MDA5 and MDA5-Flag may have three causes. First, MDA5-Flag carries a 3×Flag tag. Second, as shown in the primer table, MDA5 was cloned between the NotI and XbaI sites of the pcDNA3.1 vector, which lie toward the downstream end of the cloning region. The Flag tag is therefore separated from the MDA5 start codon by vector sequence, which is also translated and increases the molecular weight of the exogenous MDA5 protein. Finally, to facilitate PCR amplification, the MDA5 F-terminal primer includes a small portion of the 3'UTR sequence (excluding the stop codon). Together, these factors may account for the difference in molecular weight. In addition, to include the important controls, we performed a new RNA pull-down experiment (Figure 4B).

      (3) Minor point: Figure 4B - please clarify in the figure whether RNA or protein is immunoprecipitated and via which tags.

      We performed a new RNA pull-down experiment (Figure 4B) and have clearly labeled the pulled-down molecule and the tags used in the figure.

      (4) Figure 4E - the fraction of MDA5 that binds 5'ppp-RNA seems incredibly minor. And why is this experiment done using 5'OH-RNA as a competitor, rather than simply incubating MDA5 and 5'OH-RNA together and demonstrating that these do not form a complex?

      The proportion of MDA5 bound to 5’ppp-RNA is influenced by many conditions, including the concentration and purity of the probe and of the purified protein. The ratio of RNA probe to MDA5 protein in the EMSA can also strongly affect the result. Therefore, the actual binding affinity between MDA5 and RNA cannot be determined accurately from this assay. In our EMSA design, both the cold probe (5’ppp-RNA) and the mutated cold probes (5’OH-RNA and 5’pppGG-RNA) are crucial for demonstrating specific binding between MDA5 and 5’ppp-RNA, because they exclude false positives caused by factors such as biotin present in the purified MDA5 protein itself.

      (5) Figure 4B/4C/4F - These experiments would be strengthened by including an MDA5 mutant that cannot bind to RNA. These mutants are well-described in mammals. If these residues are conserved, it is straightforward to generate this mutant.

      As shown in Figure 3, M. miiuy MDA5 has an RD domain that can recognize SCRV. We constructed MDA5-△RD mutant plasmids with 6×His tags and purified the proteins for EMSA (Figure 4E). The results further show that MDA5, but not MDA5-△RD, binds 5’ppp-SCRV (Figure 4G), confirming the crucial role of the RD domain in recognizing 5'ppp-RNA.

      (6) Minor point: Figure 4E: please clarify in which lanes MDA5 has been added.

      Thank you for the referee's suggestion. We have synthesized new 5'ppp-RNA probes (5’ppp-SCRV and its dephosphorylated derivative), rerun this experiment, and indicated the relevant information in the figure (Figure 4F).

      Figure 5

      (1) Figure 5C - As MDA5 is an interferon-stimulated gene (as shown in panel G/H/I)) the increased MDA5 expression could simply explain the increase in the amount of m6A-MDA5 that is immunoprecipitated after infection. Could this figure be improved by doing a fold change between input vs m6A-IP OR uninfected vs SCRV-infected conditions? This would reveal whether the modification of MDA5 with m6A is really increased after infection.

      As shown in Figure 5F, our data indicate that the proportion of m6A-modified MDA5 does increase after SCRV infection, and that this is not solely due to the increased expression of MDA5 itself.

      (2) Figure. 5E/F - The y-axis is unclear: relative MDA5 m6A levels. Relative to what? Input? Mock infected?

      For Figure 5E/F, we first normalized the m6A-IP signal to the corresponding input and then set the control group (the IgG group in Figure 5E and the Mock group in Figure 5F) to 1. We have also given the y-axes clearer labels (Figure 5E and 5F).
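      The two-step normalization described above can be sketched as follows. This is a minimal illustration only, assuming each group is summarized by one positive, linear signal value (for example an RNA amount already converted from qPCR Ct values, not a raw Ct); the function name and numbers are hypothetical and not taken from the manuscript. Because each m6A-IP signal is first divided by its own input, an increase in total MDA5 transcripts alone does not change the result, which is the distinction raised in comment (1) on Figure 5.

```python
def relative_m6a_level(ip, input_signal, control_ip, control_input):
    """Hypothetical helper: m6A enrichment (IP/input) of a sample, with the control set to 1.

    Assumes all four arguments are positive, linear signal values
    (not raw Ct values) for the same transcript, e.g. MDA5.
    """
    values = (ip, input_signal, control_ip, control_input)
    if any(v <= 0 for v in values):
        raise ValueError("all signals must be positive")
    # Step 1: normalize each m6A-IP signal to its own input
    sample_enrichment = ip / input_signal
    control_enrichment = control_ip / control_input
    # Step 2: express the sample relative to the control (control == 1)
    return sample_enrichment / control_enrichment


# Hypothetical numbers for illustration only, not data from the manuscript
print(relative_m6a_level(0.5, 1.0, 0.25, 1.0))  # 2.0
# Doubling both IP and input (more transcript, same modified fraction) gives the same value
print(relative_m6a_level(1.0, 2.0, 0.25, 1.0))  # 2.0
```

      Under these assumptions, a value above 1 reflects a higher modified fraction than in the control, rather than simply more MDA5 transcript.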

      (3) General comment - It is not mentioned in the text that MDA5 is an interferon-stimulated gene. This would account for the increase in expression (qPCR) after viral infection or poly(I:C) transfection, hence there is no novelty in this finding. In addition, the authors suggest that MDA5 increases at the protein level (by immunoblot) but the increase on these blots is not convincing (figure 5H/5I).

      We agree that, as an interferon-stimulated gene, MDA5 is commonly upregulated after viral infection. We present these data to validate the m6A-seq transcriptome data and to show that, although m6A modification interferes with MDA5 expression during viral infection, it does not prevent the increase in MDA5 mRNA levels. In addition, we reran the experiment, and the results showed that MDA5 protein expression is indeed specifically induced by SCRV and poly(I:C)-HMW.

      Figure 6

      (1) Figure 6E - What was the MOI of the virus used in this experiment? It is not mentioned in the figure legend.

      The MOI was 5; we have added this information to the figure legend.

      Figure 7

      (1) Figure 7J - This graphic is somewhat misleading and should be altered to better reflect the conclusions that are drawn in the manuscript. The graphic suggests that MAVS and STING interact, but this is not demonstrated in the paper. In addition, the paper does not demonstrate whether MAVS or STING (or both) are needed downstream of MDA5 to relay signalling. Finally, please draw an arrow from type I IFNs to increased expression of MDA5 to illustrate that MDA5 is an ISG.

      Thank you for the referee's suggestion. We have revised the schematic to match the conclusions of the manuscript more accurately (Figure 7J). First, we have separated STING from MAVS. Second, an arrow now indicates that MDA5 is an IFN-stimulated gene. Finally, we have added experiments demonstrating the importance of MITA (STING) in MDA5-activated IFN signaling. In addition, the binding of MAVS to MDA5 to promote signal transduction is highly conserved and is well documented even in fish lacking RIG-I (10.1016/j.dci.2021.104235). Therefore, in Figure 7J we continue to depict MAVS bound to MDA5 as a downstream signal transducer of MDA5.

      Discussion

      (1) There is very little discussion about METTL and YTHDF proteins in the discussion despite the fact that the last 2 figures are entirely devoted to these proteins.

      Following the referee's suggestion, we have added content on METTL and YTHDF proteins to the Discussion. In addition, the basic mechanisms and functions of METTL and YTHDF proteins are now briefly described in the Introduction.

      Reviewer #3 (Recommendations For The Authors):

      Please refer to the specific suggestions and recommendations. They include proposals for experimental additions, improved methodologies, and suggestions to resolve writing-related concerns.

      Major concerns

      (1) I suggest changing the article title to "Functional Replacement of RIG-I with MDA5 in Fish Miiuy Croaker", or a similar title, to make it more focused and closely aligned with the content of the article.

      Following the reviewer's recommendations, we have revised the title to emphasize that our primary research subject is a teleost fish that lacks RIG-I. In addition, we have changed “5’ppp-RNA” to “5’ppp-RNA virus” to emphasize the interaction between the virus and the receptor. We believe that the revised title better reflects the content of the article.

      (2) Due to the inherent limitations in genome sequencing, assembly, and annotation for the Miiuy croaker, comprehensive annotation of immune-related genes remains incomplete. To address this critical gap, it is recommended that authors establish experimental protocols, such as Fluorescence In Situ Hybridization (FISH), to confirm the absence of RIG-I in the Miiuy croaker. They should simultaneously employ MDA5 probes as a positive control for validation purposes.

      A high-quality, chromosome-level genome assembly is available for miiuy croaker (doi: 10.1016/j.aaf.2021.06.001). In addition, studies have shown that RIG-I is absent in the order Perciformes (doi: 10.1016/j.fsirep.2021.100012); miiuy croaker belongs to this order and has likewise lost the RIG-I gene. Therefore, we do not intend to use FISH to confirm this.

      (3) Similarly, it is recommended that the authors first provide evidence of the presence of 5'ppp at the 5' terminus of the genome RNA of SCRV, as demonstrated in the study by Goubau et al. (doi: 10.1038/nature13590, Supplementary figure 1). This evidence is crucial before drawing conclusions about the compensatory role of MDA5 in recognizing 5'ppp RNA viruses, using SCRV as the viral model.

      As suggested by the referee, we extracted RNA from SCRV particles and assessed whether its stimulatory activity depends on the 5’-phosphate. Calf intestinal alkaline phosphatase (CIAP) treatment substantially reduced the stimulatory activity of SCRV RNA in M. miiuy MKC cells (Figure 1C and 1E). Similar results were obtained by transfecting RNA isolated from VSV into G. gallus DF-1 cells (Figure 1D). This evidence indicates that SCRV and VSV RNAs carry 5’-triphosphate features and that birds and fish lacking RIG-I possess other receptors that can recognize 5’ppp-RNA.

      (4) The 62-nucleotide (nt) 5'ppp-RNA utilized in this study was obtained from Vesicular Stomatitis Virus (VSV). In order to provide direct evidence, it is necessary to include a 62-nt 5'ppp-RNA that is directly derived from SCRV itself.

      We adopted this suggestion and synthesized a 67-nucleotide 5’ppp-SCRV RNA probe. We found that 5’ppp-SCRV activates dimerization of IRF3 and binds to MDA5 of M. miiuy in a 5’-triphosphate-dependent manner (Figure 4A-4F).

      (5) Given that RNAs with uncapped diphosphate (PP) groups at the 5′ end also activate RIG-I, similar to RNAs with 5′-PPP moieties, and the 5′-terminal nucleotide must remain unmethylated at its 2′-O position to allow RNA recognition by RIG-I, it is necessary for the authors to conduct additional experiments to supplement and validate these two distinguishing features of RIG-I in RNA recognition. This will provide more reliable evidence for the replacement of RIG-I by MDA5 in RNA recognition.

      Thank you for the reviewer's professional suggestions. We agree that testing the binding of MDA5 to 5’pp-RNA and 2′-O-methylated RNA could further demonstrate the substitute function of MDA5. However, we believe that experiments with 5’ppp-RNA and its dephosphorylated derivatives sufficiently demonstrate that M. miiuy and G. gallus MDA5 have evolved to recognize the 5’-triphosphate structure, like human RIG-I. We therefore do not plan additional experiments on this point.

      (6) In section 2.3, the authors assert that Miiuy croaker recognizes SCRV through its RD domain. This claim is supported by their data showing that cells overexpressed with the MDA5 ΔRD mutant lost the ability to inhibit SCRV replication. As a result, the authors draw the conclusion that "these findings provide evidence that MDA5 may recognize 5'-triphosphate-dependent RNA (5'ppp-RNA) through its RD domain." However, to strengthen their argument, the authors should first demonstrate that during SCRV infection, MDA5-mediated antiviral immune response is indeed initiated by recognizing the 5'ppp part of the SCRV RNA, rather than the double-strand part (which can exist in ssRNA virus) of the viral RNA, as this is naturally a ligand for MDA5. Additionally, the authors should treat the isolated SCRV RNA with CIP to remove the phosphate group and examine the binding of MDA5 with SCRV RNA before and after treatment. They should also transfect CIP-treated or untreated SCRV RNA into MDA5 knockdown and wild-type MKC cells to investigate the induction of antiviral signaling and levels of viral replication. Finally, the authors should verify the binding ability of the mutants with isolated SCRV RNA, with or without CIP treatment, to determine which domain of MDA5 is responsible for SCRV 5'ppp-RNA recognition.

      We understand the reviewer's concern that MDA5 may recognize SCRV by binding to double-stranded regions of the viral RNA. As suggested, we extracted SCRV RNA and dephosphorylated it with calf intestinal alkaline phosphatase (CIAP). We then transfected the untreated and dephosphorylated RNAs into MDA5-knockdown and wild-type MKC cells and measured IRF3 dimerization and the IFN response. The results indicate that SCRV RNA activates immunity in a triphosphate-dependent manner and that MDA5 knockdown prevents this immune activation (Figure 1C and 1E, Figure 2P and 2Q). Finally, we synthesized a 5'ppp-SCRV RNA probe and showed that MDA5 binds 5'ppp-SCRV through its RD domain (Figure 4E-4G). We believe that these results better demonstrate that MDA5 recognizes 5’ppp-RNA through its RD domain and address the reviewer's concerns.

      (7) Similarly, merely presenting Co-IP data demonstrating the interaction between Miiuy croaker MDA5 and STING in overexpressed EPC cells does not justify the claim that "in vertebrates lacking RIG-I, MDA5 can utilize STING to facilitate signal transduction in the antiviral response". This is because interactions observed through overexpression may not accurately reflect the events occurring during viral infection or their actual antiviral functions. To provide more robust evidence, it is essential to conduct functional experiments after STING knockout (or at least knockdown). Furthermore, it is important to note that Miiuy Croaker alone cannot adequately represent all "vertebrates lacking RIG-I".

      We found that co-expression of STING and MDA5 enhances the MDA5-mediated IFN-1 response during SCRV infection, whereas knocking down STING attenuates the MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the MDA5-mediated immune response to RNA viruses. In addition, RIG-I loss is common in vertebrates, and STING from RIG-I-deficient species such as chickens (doi: 10.4049/jimmunol.1500638) and tree shrews (doi: 10.1073/pnas.1604939113) also binds MDA5, suggesting that STING can play a crucial role in MDA5 signaling in species lacking RIG-I. We have added this point to the Discussion and described our observations in more cautious language.

      (8) In the manuscript, a series of experiments were conducted using an antibody (Beyotime Cat# AF7164) against endogenous MDA5. The corresponding immunogen for this MDA5 antibody is a recombinant fusion protein containing amino acids 1-205 of human IFIH1/MDA5 (NP_071451.2). However, the amino acid sequences of IFIH1/MDA5 differ substantially between humans and Miiuy croaker, which could introduce errors in the results. Therefore, it is essential to employ antibodies specifically designed for targeting Miiuy croaker's own MDA5 in the experiments.

      As shown in Figure 2B, the antibody used to detect endogenous MDA5 also detects MDA5 overexpressed from the plasmid, suggesting that it specifically recognizes M. miiuy MDA5 protein.

      (9) It is recommended to investigate the phosphorylation of IRF3 in order to confirm the downstream signaling pathway during viral infection when MDA5 is knocked down or overexpressed.

      Because phospho-specific antibodies against fish IRF3 are not available, we used IRF3 dimerization assays to assess downstream signaling (Figure 1C and 1D, Figure 2P, Figure 4C and 4J).

      (10) The use of poly I:C as a mimic for dsRNA to investigate MDA5's recognition of 5'ppp-RNA in hosts lacking RIG-I, as well as the examination of the regulatory role of MDA5 m6A methylation upon activation by 5'ppp-RNA, may be inappropriate. Poly I:C does not possess 5'ppp, and while it has been identified as a ligand for MDA5 in various studies, MDA5 cannot serve as a substitute for RIG-I in recognizing poly (I:C). Therefore, the authors should utilize 5'ppp-dsRNA as the mimic and include the corresponding 5'ppp-dsRNA control without a 5'triphosphate as the negative control (both available from InvivoGen). This approach will specifically elucidate the mechanisms involved when MDA5 functions similarly to RIG-I in the recognition of 5'ppp-RNA.

      In our study, we used poly(I:C)-HMW, a known dsRNA mimic that is preferentially recognized by MDA5 rather than RIG-I, as a positive control for MDA5 activation. Our aim was to show that, like poly(I:C)-HMW (the positive control), SCRV also promotes MDA5-mediated IFN immunity, further indicating the important role of MDA5 during 5’ppp-RNA virus invasion. We have clearly labeled the type of poly(I:C) in the figures and legends to avoid misunderstanding.

      (11) In Figure 2, Figure 3, and Figure 6, the appearance of virus plaques is not readily apparent, and it is necessary to replace these images with clearer photographs. It appears that MKC or MPC cells are not appropriate for conducting plaque assays. To accurately assess viral proliferation, the authors should measure key indicators throughout the process, such as the production of positive-strand RNAs (+RNAs), replication intermediates (RF), and transcription of subgenomic RNAs. This approach is preferable to solely measuring the M and G protein genes from the virus genome as positive results can still be observed in contaminated cells.

      We agree with the reviewer that the plaque images in Figure 2K and Figure 3D were not sufficiently clear, and we have replaced them with clearer images (Figure 2J and Figure 3D). We consider that the remaining images clearly show SCRV proliferation, so we did not replace them. In addition, the primers we use measure +RNA, so SCRV replication can be evaluated accurately without interference from virus contamination. Because the two primer pairs lie within the SCRV-M and SCRV-G protein genes, we label them SCRV-M and SCRV-G to distinguish them. To avoid misunderstanding, we have revised the Y-axis labels in the figures (Figure 2I and 2K, Figure 3E, Figure 6E and 6O).

      (12) There is a substantial disparity in the molecular size of M. miiuy MDA5 between endogenous and exogenously expressed proteins, as shown in Figure 2A and 2C-D. Please provide clarification.

      Please refer to the response to Reviewer 2's question regarding Figure 4B above.

      (13) The manuscript incorporates the evolutionary perspective, but lacks specific evolutionary analysis. Thus, it is essential to include relevant analysis to comprehend the evolutionary dynamics and positive selection on MDA5 and LGP2 in the absence of RIG-I in Miiuy croaker. This can be achieved through theoretical calculations using appropriate algorithms, such as the branch models and branch-site models based on the maximum-likelihood method implemented in the phylogenetic analysis by maximum likelihood (PAML) package.

      We did analyze the molecular evolution of MDA5 and LGP2. Unfortunately, even when restricting the analysis to fish MDA5/LGP2 CDS sequences, we found that the MDA5/LGP2 gene-tree topologies were largely consistent with the species tree. Consequently, species with and without RIG-I did not form separate clusters in the gene trees, making it extremely difficult to attribute molecular evolution of MDA5/LGP2 to RIG-I deficiency. We therefore did not pursue this analysis.

      (14) If the narrative regarding m6A methylation goes beyond the activation of MDA5 through recognition of 5'ppp-RNA and represents a regulatory mechanism for all MDA5 activation events, it is not relevant to the theme of "An arms race under RIG-I loss: 5'ppp-RNA and its alternative recognition receptor MDA5." Therefore, all investigations in this paper should focus solely on events when MDA5 recognizes 5'ppp-RNA. Any data associated with the broader regulatory mechanisms and m6A methylation of MDA5 should be excluded from this manuscript and instead be included in a separate study dedicated to exploring this specific topic.

      Our study focuses on RNA viruses rather than solely on the interaction between 5'ppp-RNA and host viral receptors, and our previous title did not express this accurately. We therefore made two main changes to the title: first, we restricted it to the study species M. miiuy, although some studies on the functional substitution of MDA5 for RIG-I have involved birds; second, we changed “5’ppp-RNA” to “5’ppp-RNA virus”. We believe that the revised title better reflects the content of our study.

      (15) The running title appears to be hastily done.

      We modified it to “MDA5 recognizes 5’ppp-RNA virus in species lacking RIG-I”.

      (16) There are many descriptions that are not strongly related to the main theme of the article in the introduction section, making it lengthy and fragmented. Please focus on the research background of RIG-I and MDA5, including their structures, functions, and regulatory mechanisms, as well as the research progress on the compensatory effect of MDA5 in the absence of RIG-I and its evolutionary adaptation mechanism in other species.

      As suggested by the reviewers, we have removed less relevant content from the Introduction and added recent findings on the compensatory role of MDA5 in the evolutionary adaptation of tree shrews, which lack RIG-I.

      (17) Lines 149-156 in the "Results" section include content that resembles an "Introduction." It is important to avoid duplicating information in the Results section. Therefore, the authors are encouraged to revise this paragraph to ensure conciseness in the article.

      We have streamlined this section to enhance the article's conciseness and clarity.

      (18) In the "Results" section, at line 177, the authors assert, "As depicted in Figure 1F-1H," which should be corrected to Figure 2F-2H. Furthermore, the y-axis of the two figures on the right-hand side of Figure 2H represents the ISG15 genes. At line 182, "as demonstrated in Figure 1I-1L," should be revised as "as illustrated in Figure 2I-2L". The authors demonstrated a lack of attention to detail.

      We thank the reviewer for pointing out these errors, and we have made the necessary corrections.

      (19) In lines 197-198, the authors stated that "MDA5-ΔRD showed an inability to interact with SCRV." However, Figure 3D did not reveal any significant difference, thus it is advisable to repeat this experiment at least once.

      We have replaced this plaque image with a new one (Figure 3D).

      (20) In lines 200-201 of the "2.3 RD domain is required for MDA5 to recognize SCRV" section, the authors report that the expression of antiviral genes was induced by the overexpression of both MDA5 and MDA5-ΔRD, even in the absence of infection (Figure 3F). Why does the expression of antiviral genes increase in the absence of viral RNA stimulation? Please provide a reasonable explanation.

      In the absence of viral infection, overexpressed viral receptor proteins may still transmit spurious signals that affect immunity. We speculate that, because MDA5 and MDA5-ΔRD retain the CARD domain, they can still induce the expression of antiviral factors without a ligand, although this induction is much weaker than that during viral infection. However, to better demonstrate the function of the RD domain of M. miiuy MDA5, we changed the experimental design (Figure 3F): we measured the induction of antiviral factors by overexpressed MDA5 and MDA5-ΔRD under poly(I:C)-HMW stimulation. This shows that the RD domain has a conserved function in recognizing poly(I:C)-HMW in M. miiuy, which serves as a positive control for the recognition of SCRV by the RD domain of MDA5.

      (21) Please provide the GenBank accession number of M. miiuy MDA5.

      The GenBank accession number of M. miiuy MDA5 has been added to section 4.5 (Plasmid construction).

      (22) The content of lines 228-233 in the "Results" section bears resemblance to that of the "Introduction." To ensure the avoidance of information duplication, it is recommended to remove this paragraph from the results section.

      This section has been streamlined.

      (23) The bands of mmiMDA5 in the 5'ppp-RNA and dsRNA lanes in Figure 4B are weak and almost unobservable. Please replace them with clear images.

      We have rerun this experiment and replaced the images (Figure 4B).

      (24) In Figure 5G and at line 253, there are only results presented for the SCRV infection group, while no results are shown for the control group. This raises the question of why the control group results are missing. It is necessary to provide a reasonable explanation or correction for this issue.

      The "0 h" time point of SCRV infection serves as the control group, and we have replaced the panel with a clearer image (Figure 5G).

      (25) In Figure 7C, it would be necessary to include the western blot result of YTHDF protein expression in order to verify the efficiency of YTHDF siRNA.

      We attempted to detect endogenous YTHDF proteins using available commercial antibodies. Unfortunately, only the YTHDF2 antibody specifically recognizes endogenous YTHDF2 in M. miiuy. The knockdown efficiency of si-YTHDF2 has previously been validated with this YTHDF2 antibody (doi: 10.4049/jimmunol.2200618).

      (26) In line 422 of the "4.3 Cell culture and treatment" section, the paragraph raises a question regarding the nature of Miiuy croaker kidney cells (MKCs) and spleen cells (MPCs) - whether they are cell lines or freshly isolated cells (or primary cultures) derived from kidney and spleen tissues. If these cells are indeed cell lines, it is requested to provide detailed information about the sources and properties of the cells (such as whether they are epithelial cells or other mixed cell types) and the generations of propagation. Alternatively, if the cells were freshly isolated or primary cultures obtained from fish, the method for cell isolation should be provided. The source and stability of cells are extremely important for ensuring the repeatability and reliability of experimental outcomes.

      M. miiuy kidney cells (MKCs) and spleen cells (MPCs) are cell lines derived from the kidney and spleen tissues of M. miiuy and were used between passages 20 and 40. These details have been added to section 4.3.

      (27) There are many inaccurate descriptions in the text, which employ concepts that are too broad. These descriptions need to be narrowed down to specific species or objects. Here are a few examples, along with the necessary revisions. Other similar instances should also be revised accordingly. For instance, in line 119, "fish MDA5" should be changed to "Miiuy croaker MDA5." Similarly, in line 166, "fish MDA5-mediated signaling pathway" should be changed to "Miiuy croaker MDA5-mediated signaling pathway." In line 174, "fish MDA5" should be revised to "Miiuy croaker MDA5." Additionally, in line 185, "antiviral responses of teleost" should be changed to "antiviral responses of Miiuy croaker." In line 197, "interact with SCRV" should be revised to "interact with 5'ppp-RNA of SCRV." In line 337, "loss of RIG-I in the vertebrate" should be modified to "loss of RIG-I in Miiuy croaker and chicken." Similarly, in line 338, "MDA5 of fish" should be changed to "MDA5 of Miiuy croaker." Lastly, in line 348, "RIG-I deficient vertebrates" should be revised to "RIG-I deficient Miichthys miiuy and Gallus gallus."

      We thank the reviewer for these suggestions. We have revised these inaccurate descriptions and checked the entire manuscript for similar overly broad statements.

      (28) Finally, it should be noted that a similar discovery has already been reported in tree shrews (Ling Xu, et al., Proc Natl Acad Sci., 2016, 113(39):10950-10955). This article shares similarities with that research report, therefore it is necessary to discuss in detail the relationship between the two in the discussion and compare and analyze the evolutionary patterns of MDA5 from it.

      As suggested by the reviewer, we have compared the similarities and differences between these two reports in the Discussion and analyzed the evolutionary dynamics of MDA5 in these vertebrates lacking RIG-I.

      Minor concerns:

      We thank the reviewer for the meticulous examination of our manuscript, and we have made revisions according to the following suggestions.

      (1) At line 120, the sentence "SCRV(one 5'ppp-RNA virus)" should have a space between "SCRV" and "(one 5'ppp-RNA virus)". Please make this correction.

      Corrected.

      (2) At lines 147-148, the sentence "However, the downstream gene of TOPORSa is missing a RIG-I" is not accurate and needs modification.

      We have modified this sentence.

      (3) At line 184, "findings indicate" should be corrected to "findings indicated".

      Corrected.

      (4) At line 189, "a 5'ppp-RNA virus" should be deleted and the text seems redundant.

      Deleted.

      (5) At line 198, "replication. (Figure 3C-3E)", please remove the punctuation between "replication" and "(Figure 3C-3E)".

      Corrected.

      (6) At line 416 in "Materials and methods" section, "4.2 Sample and challenge" should be corrected to "4.2 Fish and challenge".

      Corrected.

      (7) At line 419, the authors state that "The experimental procedure for SCRV infection was performed as described", please briefly describe the SCRV infection method and the infectious dose.

      Based on the reviewer's suggestions, we have added relevant descriptions of SCRV infection in section 4.2.

      (8) There are several formatting issues in the "Materials and Methods" section. For instance, in line 424, there is no space between the number and letter in "100 μg/ml" and "26 ℃" should be corrected to "26℃". Additionally, in line 430, "Cells" should be corrected to "cells".

      Corrected.

      (9) At line 446, "50 ng/ul" and "100 mU/ul" should be corrected to "50 ng/μl" and "100 mU/μl".

      Corrected.

      (10) At line 459, "primers 1)" should be corrected to "primers".

      Corrected.

      (11) At lines 461-464, the description "For protein purification, MDA5 plasmids with 6× His tag was constructed based on pcDNA3" seems to be no direct logical connection between protein purification and the plasmid construction. Please make the necessary corrections.

      Corrected.

      (12) At line 548, "cytoplasmic" should be corrected to "Cytoplasmic".

      Corrected.

      (13) At line 549, "5× 10⁷" should be corrected to "5 × 10⁷".

      Corrected.

      (14) At line 557, "MgCl2" should be corrected to "MgCl₂".

      Corrected.

      (15) At line 558, "6 %" should be corrected to "6%".

      Corrected.

      (16) At line 565, "50μg" should be corrected to "50 μg".

      Corrected.

      (17) At line 571, "300±50 bp." should be corrected to "300 ± 50 bp."

      Corrected.

      (18) At lines 592-593, the sentence "After several incubations, the m6A level was quantified colorimetrically at a wavelength of 450 nm" does not read smoothly, please improve it.

      Revised.

      (19) At line 786, "MDA5 recognize" should be corrected to "MDA5 recognized".

      Corrected.

      (20) At lines 788 and 798, "Pulldown" should be corrected to "Pull-down".

      Corrected.

      (21) At lines 790 and 796, "bluestaining" should be corrected to "blue staining".

      Deleted.

      (22) At line 825, "SCRV and infection" should be corrected to "SCRV infection".

      Corrected.

      (23) At lines 826-827, "SCRV (H) and poly(I:C) (I) infection" should be corrected to "SCRV infection (H) and poly(I:C) stimulation (I)".

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the comments and suggestions from the Editor and Reviewers on our manuscript submitted to eLife. We have addressed all the comments, and we think these modifications bring clarity to our message and will be helpful to your readership. Here we include an outline of the corrections made, as well as a detailed response to each of the reviewers’ comments.

      Outline of corrections made in response to the Editor’s and Reviewers’ suggestions:

      ·        The title of the manuscript has been changed to reflect a more conservative conclusion.

      ·        Changes in the main manuscript text were made to enhance clarity, including the use of genetic terminology and gene naming.

      ·        Specific responses to some comments from the reviewers are included in this document. We combined some comments that would be better addressed together.

      Accompanying this letter is an updated version of our manuscript with tracked changes enabled. Again, we are thankful for the comments and suggestions we received, and we hope this revised version of our manuscript will be accompanied by an updated assessment and public reviews and a final eLife Version of Record.

      Response to the public review and minor recommendations.

      From Reviewer #1:

      The major inference of the work is that SIV infection of gorillas drove the observed diversity in gorilla CD4. This is supported by the majority of SNPs being localized to the CD4 D1, which directly interacts with the envelope, and the demonstrated functional consequences of that diversity for viral entry. However, SIVgor (to the best of my knowledge) only infects Western lowland gorillas (Gorilla gorilla gorilla), and one Gorilla gorilla diehli and three Gorilla beringei graueri individuals were included in the haplotype and allele frequency analyses. The presence of these haplotypes or the presence of similar allele frequencies in Eastern lowland and mountain gorillas would impact this conclusion. It would be helpful for the authors to clarify this point.

      From Reviewer #1 (minor comment):

      Which subspecies of gorilla are the nsSNPs coming from? Two of the included subspecies (Gorilla gorilla diehli [n = 1]; Gorilla beringei graueri [n = 3]) are not extant reservoirs of SIV and, to my knowledge, are not thought to have been, so it's important to point out where the diversity is coming from if the authors are asserting that SIVgor drove this population-level diversity in gorilla CD4.

      We initially included genomic data from all available gorilla individuals to maximize our sensitivity for identifying allelic variants. Although current evidence indicates that eastern gorillas are not infected with SIV, our results show that all the allelic variants identified differ in susceptibility to the HIV-1 and SIVcpz strains tested. The allelic variants we identified in this genomic data set match those identified by Russell et al. (doi.org/10.1073/pnas.2025914118), including those found in eastern gorillas, and recapitulate the finding that these variants differ in susceptibility to lentiviral entry, similar to the variants of western populations. Whether eastern gorillas were exposed to lentiviruses in the past remains unknown.

      From Reviewer #1:

      The authors appear to use a somewhat atypical approach to assess intra-population selection to compensate for relatively small numbers of NHP sequences (Fig. 6). However, they do not cite precedence for the robustness of the approach or the practice of grouping sequences from multiple species for the endemic vs other comparison. They also state in the methods that some genes encoded in the locus were removed from the analysis "because they have previously been shown to directly interact with a viral protein." This seems to undercut the analysis and prevents alternative explanations for the observed diversity in CD4 (e.g., passenger mutations from selection at a neighboring locus).

      Given the nature of our samples, to detect any influence of natural selection acting on CD4, we chose to compare patterns of molecular evolution at CD4 with those at its neighboring loci. Comparisons of molecular evolution signatures across genomic regions are the basis of methods to detect positive selection (e.g., Sabeti DOI: 10.1038/nature01140). In our comparison, the neighboring loci serve as the neutral standard for the genomic region in which CD4 resides. Our rationale is that demographic and other neutral influences on the number and frequency of polymorphic sites in a region would affect all loci in that region equally. Because these neighboring loci are our neutral benchmark, we excluded, before the analysis, other genes in this genomic region that interact with viruses: these loci may be evolving under positive selection and would reduce the power of our comparison. None of the excluded loci is a direct neighbor of CD4. This, together with the average recombination rate of the CD4 genomic region in humans, makes it less likely that the pattern observed at CD4 results from selection acting at a neighboring locus. In addition, the classic population genetic method to detect positive selection, the McDonald-Kreitman test (McDonald DOI: 10.1038/351652a0), was originally presented with polymorphism data combined across species. We assume that any effect of combining variability across species on levels of diversity would affect all loci included in the study equally, not just CD4.
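      To make the neutral-benchmark comparison concrete, a minimal sketch is shown below. It assumes the comparison can be summarized as a 2×2 table of nonsynonymous versus synonymous polymorphisms in CD4 versus the pooled neighboring loci, tested with a two-sided Fisher's exact test, in the spirit of the McDonald-Kreitman contingency table. This is an illustration only, not the analysis performed in the manuscript; the function names and the example counts are hypothetical placeholders, not data from this study.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, n):
    """Probability that the top-left cell equals x, given fixed table margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # A small relative tolerance guards against floating-point ties.
    p = sum(px for px in probs if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def cd4_vs_neighbors(cd4_nonsyn, cd4_syn, nbr_nonsyn, nbr_syn):
    """Hypothetical helper. Rows: CD4, pooled neighboring loci;
    columns: nonsynonymous SNPs, synonymous SNPs."""
    return fisher_exact_two_sided([[cd4_nonsyn, cd4_syn], [nbr_nonsyn, nbr_syn]])


if __name__ == "__main__":
    # Sanity check on a classic textbook table: 34/70, about 0.4857.
    print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))
    # Placeholder counts for illustration only (not study data).
    print(cd4_vs_neighbors(12, 3, 20, 25))
```

      The point of the design is that demographic effects shift both rows of the table together, so only an excess of nonsynonymous variation specific to CD4 moves the test statistic.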

      From Reviewer #1:

      Data in Figure 5 is graphed as % infected cells instead of virus titer (TDU/mL). It's unclear why this is the case, and prevents a comparison to data in Figure 2 and Figure 4.

      From Reviewer #1 (minor comment):

      Figure 5: the data presentation is now shown as % infected cells instead of viral titer. This makes it difficult to compare data from Figure 5 to other figures. Can the authors please either justify this change, display data consistently or provide matched data displays as a Supplemental Figure?

      For the experiments presented in Figures 2 and 4, we used different volumes of pseudovirus, which allowed us to identify the linear range of infection. Then, based on the number of cells plated per experimental replicate, we calculated a virus titer. In follow-up experiments (Fig. 5), we used fixed volumes of virus that would infect ~10-20% of control (wild-type; wt) CD4-expressing cells. Comparisons were then made between wt and mutated CD4s, and these data are best presented in their raw form as percent of cells infected. Although this change in method prevents direct comparison between the figures, we focused on the differences between experimental conditions within each panel.
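      The conversion from percent infected cells to a titer can be sketched as follows. This assumes the common convention that, within the linear range, each infected (GFP-positive) cell corresponds to one transducing unit (TDU). The function name, the 30% cutoff used to flag likely saturation, and the example numbers are illustrative assumptions, not the authors' protocol.

```python
def titer_tdu_per_ml(percent_infected, cells_plated, virus_volume_ul, max_percent=30.0):
    """Estimate a titer in transducing units (TDU) per mL.

    Assumes one infected cell = one TDU, which only holds in the linear
    range where few cells receive more than one virion. max_percent is a
    hypothetical cutoff for that linear range.
    """
    if not 0 <= percent_infected <= max_percent:
        raise ValueError("percent infected is outside the assumed linear range")
    if cells_plated <= 0 or virus_volume_ul <= 0:
        raise ValueError("cells_plated and virus_volume_ul must be positive")
    infected_cells = percent_infected / 100 * cells_plated
    return infected_cells / (virus_volume_ul / 1000)  # convert uL to mL


if __name__ == "__main__":
    # Hypothetical example: 10% of 50,000 plated cells infected by 10 uL of virus.
    titer = titer_tdu_per_ml(10, 50_000, 10)
    print(f"{titer:,.0f} TDU/mL")  # 500,000 TDU/mL
```

      With a fixed virus volume and the same number of plated cells, the titer is directly proportional to percent infected, so within a single panel the two readouts carry the same relative information.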

      From Reviewer #1:

      The lack of pseudotyping with SIVgor envelope is a surprising omission from this study, that would help to contextualize the findings.

      From Reviewer #2 (minor comment):

      The inclusion of HIV-1 but not SIVgor strains in Figures 2D/E is somewhat conspicuous since chimpanzee alleles certainly differ in susceptibility to SIVcpz (and SIVgor) strains per Russell et al. 2021. The authors should either test some SIVgor infections, cite published data on at least extant human/chimpanzee/gorilla CD4 susceptibility to SIVgor, or address why they did not include it.

      We agree that host susceptibility to SIVgor strains would have been an interesting question to explore. However, we opted to focus this study on the transmission of SIVcpz strains into gorilla populations. We note that we cloned SIVgor envelope genes from some strains into our expression system, but we were unable to recover infectious pseudoviruses using an HIV-1ΔEnv-GFP backbone. This suggests that HIV-1 may be incompatible with incorporating SIVgor Env into virus particles. Recently, Russell et al. (DOI: 10.1073/pnas.2025914118) generated SIVgor Env-pseudotyped virions using a different backbone (SIVcpzΔEnv-GFP) that was unavailable to us at the time of this study.

      From Reviewer #1:

      Similarly, building gorilla CD4 haplotype SNPs onto the hominin ancestor (as opposed to extant human CD4) may provide additional insights that are meaningful toward understanding the evolutionary trajectory of gorilla CD4.

      We decided to use extant human CD4 as the backbone to test the effects of the individual amino acid variants found in the gorilla population, because the human protein is highly susceptible to all the HIV-1 and SIV strains tested, and the expected phenotype is a loss of function. Since domain 1 (D1) of human and ancestral CD4 is almost identical (except for one change that is fixed in gorillas), and both showed similar susceptibility to lentivirus entry, we expect that the phenotypes would be the same if the gorilla SNPs were built into the ancestral CD4 backbone.

      From Reviewer #2:

      To bolster the argument that lentiviruses are indeed the causative driver of this diversification, which seems likely from a logical perspective but is difficult to prove, Warren et al. pursue two novel lines of evidence. First, the authors reconstruct ancestral CD4 genes that predate lentiviral infection of hominid populations. They then demonstrate that resistance to lentiviral infection is a derived trait in chimpanzees and gorillas, which have been co-evolving with endemic lentiviruses, but not in humans, which only recently acquired HIV. Nevertheless, the derived resistance could be stochastic or due to drift. This argument would be strengthened by demonstrating that bonobo and orangutan CD4, which also do not have endemic lentiviruses, resemble the ancestral and human susceptibility to great-ape-infecting lentiviruses.

      From Reviewer #2 (minor comment):

      The data presented in Figure 2, showing that chimp and gorilla (but not human) CD4 resistance to lentiviral infection is a derived trait, is very intriguing for suggesting that endemic lentiviruses are the causative driver of CD4 evolution. Nevertheless, this could be stochastic or due to genetic drift. Given the later emphasis on several other non-endemically infected species, the authors should at the very least include the sequences for bonobo and orangutan CD4 in the presented alignment (Fig 2B). Ideally, they would also test these orthologs to demonstrate that they are not resistant to lentiviruses infecting great apes (SIVcpz / HIV-1 / SIVgor). If they have also derived resistance, this would suggest a possible other evolutionary driver or genetic drift.

      Based on our analysis of polymorphic sites using available population data from apes, we strongly believe that the resistant polymorphisms in CD4 did not accumulate stochastically. The frequency and accumulation of these changes correlate strongly with the function of CD4 as a receptor for lentivirus entry. We agree that experimentally testing bonobo and orangutan CD4 would strengthen our conclusions; however, based on our genomic analyses, we decided to focus on the species expected to show the greatest variability in susceptibility to the lentiviruses tested, namely gorillas and chimpanzees.

      From Reviewer #2:

      Warren et al. provide a population genetic argument that only endemically infected primates exhibit diversifying selection, again arguing for endemic lentiviruses being the evolutionary driver. The authors compare SNP occurrence in CD4 to neighboring genes, demonstrating that non-synonymous SNP frequency is only elevated in endemically infected species. Moreover, these amino-acid-coding changes are significantly concentrated in the CD4 domain that binds the lentiviral envelope. This is a creative analysis to overcome the problem of very small sample sizes, with very few great ape individuals sequenced. The additional small number of species compared (2-3 in each group) also limits the power of the analysis; the authors could consider expanding their analysis to Old World Monkey species that do or do not have endemic lentiviruses, as well as great apes.

      The scope of this project was to evaluate the differential phenotype of the accumulated polymorphisms found in the ape branch of the primates. Although evaluating the accumulation of polymorphisms in a broader range of primates would generate interesting observations, this would likely require increasing the total number of primate species to include sampling along the speciation tree, many of which lack population level data.

      From Reviewer #1 (minor comment):

      Ancestral reconstruction methods and associated data tables should be included to indicate statistical support for assigned codons. A comment on ambiguity at relevant positions is needed. Similarly, given the polymorphic nature of gorilla and chimpanzee CD4, how confident are the authors in their ancestral reconstructions based on a single representative genome per species? Does this change when you include the broader panel of gorilla sequences? Is the ancestral reconstruction robust to other methods besides PAML?

      We used the PAML software package to reconstruct the ancestral hominin and hominid CD4 sequences because it is a standard, well-recognized method for this purpose. For this analysis, we used the set of primate sequences selected for the positive selection analyses (see Methods), namely the longest isoform for each available species that best aligned with human CD4. We felt that the best way to perform the ancestral state reconstruction was to use only these curated sequences rather than population-level sequences, thereby avoiding potential biases introduced by different numbers of variants per species.

      From Reviewer #1 (minor comment):

      Page 10: "It seems that allele 2, which doesn't have this glycan, would be at a fitness disadvantage. In support of this, allele 2 is one of the least frequent alleles in the gorilla population that we surveyed (Figure 3B)." - this inference depends on the gorilla species that encode allele 2 and allele frequencies. There are statistical tests to address this inference.

      Population genetic statistics that test for skews in sample allele frequencies are not appropriate here because of the nature of the samples in this study. However, the reviewer is correct that our inference about allele frequency depends on the gorilla species in which this allele is found. Allele 2 is found in the Gorilla beringei graueri subspecies included in this study. We have data for only three individuals (six alleles) from this subspecies, compared with 51 individuals (102 alleles) from Gorilla gorilla gorilla. Therefore, genetic subdivision between the gorilla subspecies could also produce the low frequency of allele 2 observed in our sample.

      From Reviewer #1 (minor comment):

      Page 11: "These results imply that the resistance to SIVcpz found in gorilla individuals is not dependent on single amino acids, but rather the cumulative effect of multiple SNPs." Would it be more relevant (or relevant in other ways) to test this statement by putting those mutations into the hominid ancestor? Testing individual residues in the context of human CD4 may be subject to epistasis or several other factors.

      We agree that introducing several of the resistance-associated SNPs together into the susceptible human background would have strengthened our hypothesis, as each of these amino acid changes is associated with increased resistance to at least one of the lentiviruses tested. However, this would significantly increase the number of CD4 variants to test, and we feel that this approach is beyond the scope of this manuscript.

      From Reviewer #1 (minor comment):

      Figure 6: If you perform this analysis on chimpanzee CD4 alone do you get the same result? Just gorillas? If you remove eastern/mountain gorillas? The very small numbers of non-human non-SIV-reservoir great apes may preclude a strong conclusion.

      We agree that our study is limited by the small number of available sequences from individuals of the studied species. Removing a whole species or subspecies would greatly reduce statistical power. If all chimpanzees or all gorillas (or a subspecies) were removed, the remaining endemic species would still show accumulation of SNPs in the D1 region of CD4, although with lower statistical significance.

      From Reviewer #2 (minor comment):

      Related to Figure 2: It would strengthen the argument that resistance is a derived trait if the authors mapped the causative mutations from gorilla CD4 onto the ancestral hominin CD4. However, this experiment is not particularly critical, merely a suggestion.

      We appreciate this suggestion. We decided to use the human CD4 backbone as it is widely susceptible to lentiviral entry. The hominid and hominin ancestral sequences are almost identical to the human sequence in domain 1, except for a fixed mutation shared with the gorilla CD4. We expect that the SNPs observed in the gorilla population would also reduce susceptibility to lentivirus entry in the ancestral CD4 reconstructions.

      From Reviewer #2 (minor comment):

      Related to Figure 3B: It is difficult to make much of the allele frequency for 8 alleles in 32 individuals. Can the authors collate this with allele frequency for the referenced 100 individuals from Russell et al. 2021, to give a better sense of population frequency? This may allow the authors to better correlate allele frequency with SIVcpz resistance patterns in Figure 4, strengthening their argument that more resistant alleles should be over-represented in the population.

      At the time of our analysis, the data from Russell et al. (DOI: 10.1073/pnas.2025914118) were not available for collation or comparison. When these data became available, we immediately compared them with our results and confirmed that the alleles we found were also detected in the samples used in that study.

      From Reviewer #2 (minor comment):

      Related to Figure 6: As written, several methodological details should be clarified. How were human genomes selected to limit the sample size to 50?

      We selected a total of 50 human individuals to match the sample size of the largest group in Fig 6B (chimpanzee, n=50). We randomly selected 10 individuals from each of the 5 superpopulations [Africans (AFR), Admixed Americans (AMR), East Asians (EAS), Europeans (EUR) and South Asians (SAS)] defined by the 1000 Genomes Project.
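
      The selection described above is a stratified random draw: a fixed number of individuals sampled without replacement from each superpopulation. The sketch below is only an illustration, assuming the sample IDs for each superpopulation have already been read from the 1000 Genomes sample metadata; the IDs, the seed and the stratified_sample helper are hypothetical and not part of the study's pipeline.

```python
import random


def stratified_sample(samples_by_group, per_group, seed=None):
    """Draw `per_group` IDs without replacement from each group (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    selected = {}
    for group, ids in samples_by_group.items():
        if len(ids) < per_group:
            raise ValueError(f"group {group} has only {len(ids)} samples")
        selected[group] = rng.sample(ids, per_group)
    return selected


# Hypothetical sample IDs standing in for the real 1000 Genomes panel.
SUPERPOPS = ["AFR", "AMR", "EAS", "EUR", "SAS"]
panel = {pop: [f"{pop}_{i:03d}" for i in range(100)] for pop in SUPERPOPS}

chosen = stratified_sample(panel, per_group=10, seed=1)
all_ids = [s for ids in chosen.values() for s in ids]
print(len(all_ids))  # 50
```

      Sampling equally from every superpopulation keeps any single population from dominating the human reference group, which a simple random draw of 50 individuals from the whole panel would not guarantee.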

      From Reviewer #2 (minor comment):

      Related to Figure 6: What comparison is being reported for the Mann-Whitney U test (CD4 vs. which gene)? Are the means shown in A an average of 2 (endemic) or 3 (non-endemic) species - if so, the authors should show the individual data points to give a clearer depiction of the data spread. In addition, it is not clear that a statistical test with sample sizes of 2 is meaningful, since Mann Whitney typically assumes n > 5. To strengthen this statistical argument, it may be necessary to include additional species that have (a) multiple genomes (or at least this locus) sequenced, and (b) have or lack lentiviral sequences. This may necessitate expanding the analysis to include Old World Monkeys (e.g. Rhesus Macaque Genome Project).

      In Figure 6, we use the Mann-Whitney U test to compare variation between CD4 and the neighboring loci. The average and SEM are calculated over two endemic and four non-endemic species (the two orangutan datasets come from two distinct species, whereas the gorilla datasets represent subspecies). It is true that our sample size is small for any statistical testing. For the Mann-Whitney U test, n > 5 in each group is generally preferred, so the endemically infected comparison is problematic because we only have two data points (chimpanzee and gorilla) for the CD4 group. For the uninfected species, CD4 has four data points.
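
      To make the small-sample concern concrete, the sketch below computes the smallest two-sided exact p-value that a Mann-Whitney U test can return for given group sizes, assuming no ties. It enumerates the exact null distribution of U and takes the most extreme outcome (complete separation of the two groups). The group sizes in the usage lines are illustrative only; the response does not state how many neighboring-locus values enter each comparison.

```python
from itertools import combinations
from math import comb


def exact_mwu_min_p(n1, n2):
    """Smallest two-sided exact p-value of a Mann-Whitney U test with group
    sizes n1 and n2, assuming no ties (complete separation of the groups)."""
    n = n1 + n2
    # Null distribution of U for group 1: every set of n1 ranks out of n is equally likely.
    u_null = [sum(ranks) - n1 * (n1 + 1) // 2
              for ranks in combinations(range(1, n + 1), n1)]
    assert len(u_null) == comb(n, n1)
    # Complete separation gives the most extreme statistic, U = 0 (or n1 * n2 by symmetry).
    tail = sum(1 for u in u_null if u == 0)
    return min(1.0, 2 * tail / len(u_null))


for sizes in [(2, 4), (2, 10), (5, 5)]:
    print(sizes, round(exact_mwu_min_p(*sizes), 4))
```

      With 2 versus 4 observations, the smallest attainable two-sided p-value is 2/15 (about 0.13), so no arrangement of the data can reach p < 0.05 under an exact test. Whether a comparison with n = 2 can reach significance at all depends on the size of the other group.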

      From Reviewer #1 (minor comment):

      Page 6. "This suggests that the ancestral versions of CD4 in apes were susceptible to primate lentivirus entry" - The data show that tested virus pseudotyped with SIV/HIV envs can engage ancestral CD4 in the context of a canine cell line expressing human CCR5, but not necessarily that this interaction was sufficient for the process of entry per se, especially in the context of a gorilla (or hominid) cell. Some additional context would be useful for a broad readership.

      From Reviewer #1 (minor comment):

      Page 6: "but that selective pressures exerted by SIVs in the chimpanzee and gorilla lineages have led to the retention of mutations that confer resistance to primate lentivirus infection. This has not happened in humans where selective pressure by HIV-1 is too new" - this cannot be concluded from the data in Figure 1. It would be more appropriate as a Discussion point.

      From Reviewer #1 (minor comment):

      Page 14: "Natural tolerance is often required before a virus can establish itself long term in a host reservoir, and thus understanding it is key to understanding virus reservoirs in nature" - please provide a reference. This is one among several theories of long-term host-virus evolution dynamics/outcomes, and further discussion may benefit the broad readership of eLife.

      From Reviewer #1 (minor comment):

      Page 15: "There is a surprising outcome of virus-driven host evolution in that the divergence and diversity of these host genes ultimately comes at a detriment to the very viruses that drove this evolution." - it is not clear to this reviewer why this is surprising.

      From Reviewer #2 (minor comment):

      Related to Figure 5A: The authors suggest that the gorilla glycosylation site provides resistance to SIVcpz, based on TAN1.910, but in fact the glycosylated allele is no more resistant than the un-glycosylated allele to most SIVcpz strains (in Figure 4). The authors should acknowledge this more clearly in the text.

      From Reviewer #2 (minor comment):

      The title of this article (that infection "has driven selection") is somewhat overstated - though it seems very likely that lentiviruses are driving CD4 diversification, this is difficult to prove. The arguments presented here rely on very few data points: modern chimp and gorilla compared to ancestral CD4, and a population genetic analysis relying on 2 or 3 species with 10-50 individuals each. The authors should either bolster these arguments (see the above suggestions) and/or soften the claim in the title.

      We have modified the main text of the manuscript to clarify the points raised above.

    1. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decisions, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control. 

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we plan to perform additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity. 

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to obtain robust neural sequences for the trial-by-trial analysis and the correct versus incorrect trial analysis. 

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors. 

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggested a possible use of this neural mechanism to time the action of the rats.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions. 

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading a neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). 
      While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue, which casts doubt on most of the conclusions of the study, there are also several major statistical issues. 

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task mainly probes the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or too long. So it will be unclear whether the variation in nosepoke duration reflects motivational changes rather than changes in time estimation. That this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data fail to support the authors' claim about what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca2+ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check whether water is available). 

      We would like to respond to the reviewer’s comments 1, 2 and 4 together since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). A lot of the discussion goes beyond the scope of this study and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborated task design, we think it is still questionable to clearly dissociate the neural substrate of “timing” from “motor”. In the end, it may as well be that as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response in the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: 1, duration cells may represent fidgeting or orofacial movements and 2, duration cells may represent motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the behavior data at all the limbs and visible body parts of the rat during nose poke and analyze its periodicity among different trials, although the orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the error of time decoded from the duration cells' activity against the actual duration of the trials. If the sequential activity of duration cells only represented motivation, the errors should be distributed evenly across different trial durations, or be linearly modulated by trial duration. The unimodal distribution we observed (Figure 4G; see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity carries information related to time. The fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence that the scaling factor is actively used for time estimation. To further test the relationship to motivation, we will measure, for each trial, the interval between exiting the nose poke and the start of licking the water reward as an independent measure of motivation. We will analyze and report in the revision whether this measure correlates with the nose-poking durations in our data.

      Author response image 1.

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, the reviewer would agree that this activity correlates with the animal's nose-poking durations, and a previous study has shown that PFC silencing disrupted the mouse's timing behavior (PMID: 24367075). The main surprising finding of the paper is that these duration cells differ from the start and end cells in their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues as to whether they receive inputs from thirst- or reward-related brain regions. This may help partially resolve the "time" vs. "motor" debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the recordings reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose poke.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for each rat, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, although we are not entirely sure what the reviewer is specifically asking for. There is undoubtedly variance among individual animals, and one of the core purposes of statistical comparison is to weigh the group difference against the variance due to sampling. It appears that the reviewer would like us to conduct our analyses for each rat individually. We will conduct and report the analysis for individual rats in Figure 1C, Figure 2C, G, K, and Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should state upfront in the main text (Results section) the temporal resolution allowed by their Ca2+ indicator and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task. 

      We thank the reviewer for the suggestion. We originally mentioned the caveats of calcium imaging in the interpretation of our results, and we will add more text on this point during our revision. In terms of behavioral dynamics (the start and end of the nose poke in this case), we think calcium imaging provides sufficient temporal resolution. However, finer dynamics, such as the reproducibility of the sequential activity or the precise representation of individual cells along the scaled duration, may benefit from improved temporal resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis of the correlation between neural trajectory dynamics in MPC and timing behavior is reported. The same paper reports the stability of neural sequences across task parameters. 

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals for each analysis is also shown in Figure 1D.

      (4) Why do the middle and right panels of Figure 2E show duration cells? 

      Figure 2E was intended to show examples of duration cells' activity. We included different example cells that peak at different points in the scaled duration. We believe these multiple examples give readers a straightforward impression of these cells' activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further? 

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC. 

      We thank the reviewer for the question. In our experience, mice with a lens implanted in mPFC did not differ observably from mice without surgery in terms of task acquisition and the distribution of nose-poke durations. Although we cannot rule out effects on other cognitive processes, the mice appeared intact within the scope of our task. We will provide these behavioral data during our revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 40-42: The sentence "The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies as well as individual differences in cognitive function, and is regulated by genes" is a misstatement. Regional variations of structure-function coupling do not really reflect differences in cognitive function among individuals, but inter-subject variations do.

      Thank you for your comment. We have made revisions to the sentence to correct its misstatement. Please see lines 40-43: “The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies[1, 6-9] and is regulated by genes[6, 8], as well as its individual differences relates to cognitive function[8, 9].”

      (2) In Figure 1, the graph showing the relation between intensity and cortical depth needs explanation.

      Thank you for your comment. We have added necessary explanation, please see lines 133-134: “The MPC was used to map similarity networks of intracortical microstructure (voxel intensity sampled in different cortical depth) for each cortical node.”
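
      For readers unfamiliar with MPC, the sketch below illustrates one common formulation: each node's intensity profile across cortical depths is correlated with every other node's profile while partialling out the mean profile across all nodes. This is an assumption about the general technique, not a description of the authors' exact pipeline (which may include further steps, such as transforming the correlations), and the toy profiles are hypothetical.

```python
from statistics import fmean


def residualize(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


def mpc_matrix(profiles):
    """profiles[i][d] = intensity of node i at cortical depth d. Returns the
    node-by-node partial correlation of profiles, controlling for the mean profile."""
    n_depths = len(profiles[0])
    mean_profile = [fmean(p[d] for p in profiles) for d in range(n_depths)]
    # Correlating residuals after regressing out the mean profile = partial correlation.
    resid = [residualize(p, mean_profile) for p in profiles]
    return [[pearson(ri, rj) for rj in resid] for ri in resid]


# Hypothetical intensity profiles for four nodes sampled at six depths.
profiles = [
    [10, 9, 7, 6, 4, 3],
    [12, 10, 9, 7, 6, 4],
    [8, 8, 7, 5, 5, 2],
    [11, 8, 8, 6, 3, 3],
]
M = mpc_matrix(profiles)
```

      Each row of M can then be treated as a node's microstructural similarity profile, which is the kind of network the quoted sentence refers to.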

      (3) Line 167: Change "increased" to "increase".

      We have corrected it, please see lines 173-174: “…networks significantly increased with age and exhibited greater increase.”

      (4) Line 195: Remove "were".

      We have corrected it, please see line 204: “…default mode networks significantly contributed to the prediction…”

      (5) Lines 233-240, Reproducibility analyses: Comparisons of parcellation templates were not made with respect to gene weights. Is there any particular reason?

      Thank you for your comment. We have quantified the gene weights based on HCPMMP using the same procedures. We identified a correlation (r = 0.25, p < 0.001) between the gene weights in HCPMMP and BNA. Given that this is a relatively weak correlation, we need to clarify the following points.

      Based on HCPMMP, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions[1]. Excluding 4 cortical regions that had an insufficient number of assigned samples may contribute to the relatively weak correlation of gene associations between templates. Moreover, the effect of template resolution on human connectome-transcriptome associations is still unclear.

      In brain connectome analysis, the choice of parcellation template can influence the subsequent findings to some extent. A methodological study[2] reported correlations of about 0.4-0.6 for white matter connectivity and 0.2-0.4 for white matter nodal properties between two templates (see Figures 4 and 5 in [2]). Because the age-related coupling changes are a downstream measure, computed from multimodal connectomes and then correlated with gene expression profiles, they may likewise be influenced by the choice of template. 

      We have added the gene weight results obtained from HCPMMP to explicitly clarify the dependence on the parcellation template.

      Please see lines 251-252: “The gene weights of HCPMMP was consistent with that of BNA (r = 0.25, p < 0.001).”

      Author response image 1.

      The consistency of gene weights between HCPMMP and BNA.

      Please see lines 601-604: “Finally, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions based on HCPMMP and obtained the gene weights by PLS analysis. We performed Pearson's correlation analyses to assess the consistency of gene weights between HCPMMP and BNA.”
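
      The consistency check amounts to correlating two vectors of gene weights indexed by the same genes. A minimal sketch, assuming each template's PLS gene weights are stored as a mapping from gene symbol to weight (the dictionaries, gene names and values below are hypothetical), aligns the shared genes before computing Pearson's r:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def weight_consistency(weights_a, weights_b):
    """Number of shared genes and Pearson r between two {gene: weight} mappings."""
    shared = sorted(set(weights_a) & set(weights_b))  # align by gene, not by position
    x = [weights_a[g] for g in shared]
    y = [weights_b[g] for g in shared]
    return len(shared), pearson_r(x, y)


# Hypothetical weights; the real vectors would hold the PLS weights for 10,027 genes.
bna = {"GENE1": 0.8, "GENE2": -0.3, "GENE3": 0.1, "GENE4": 0.5}
hcpmmp = {"GENE1": 0.6, "GENE2": -0.1, "GENE3": 0.2, "GENE5": 0.9}
n_shared, r = weight_consistency(bna, hcpmmp)
```

      Aligning by gene symbol guards against silently correlating mismatched genes if the two templates' gene lists differ in order or content.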

      Reviewer #2 (Recommendations For The Authors):

      Your paper is interesting to read and I found your efforts to evaluate the robustness of the results of different parcellation strategies and tractography methods very valuable. The work is globally easy to navigate and well written with informative good-quality figures, although I think some additional clarifications will be useful to improve readability. My suggestions and questions are detailed below (I aimed to group them by topic which did not always succeed so apologies if the comments are difficult to navigate, but I hope they will be useful for reflection and to incorporate in your work).

      * L34: 'developmental disorder'

      ** As far as I understand, the subjects in HCP-D are mostly healthy (L87). Thus, while your study provides interesting insights into typical brain development, I wonder if references to 'disorder' might be premature. In the future, it would be interesting to extend your approach to the atypical populations. In any case, it would be extremely helpful and appreciated if you included a figure visualising the distribution of behavioural scores within your population and in relationship to age at scan for your subjects (and to include a more detailed description of the assessment in the methods section) given that large part of your paper focuses on their prediction using coupling inputs (especially given a large drop of predictive performance after age correction). Such figures would allow the reader to better understand the cognitive variability within your data, but also potential age relationships, and generally give a better overview of your cohort.

      We agree with your comment that references to 'disorder' are premature. We have made revisions in the abstract and conclusion. 

      Please see lines 33-34: “This study offers insight into the maturational principles of SC-FC coupling in typical development.”

      Please see lines 395-396: “Further investigations are needed to fully explore the clinical implications of SC-FC coupling for a range of developmental disorders.”

      In addition, we have included a more detailed description of the cognitive scores in the methods section and provided a figure visualizing their distributions and their relationship to age. Please see lines 407-413: “Cognitive scores. We included 11 cognitive scores which were assessed with the National Institutes of Health (NIH) Toolbox Cognition Battery (https://www.healthmeasures.net/explore-measurement-systems/nih-toolbox), including episodic memory, executive function/cognitive flexibility, executive function/inhibition, language/reading decoding, processing speed, language/vocabulary comprehension, working memory, fluid intelligence composite score, crystal intelligence composite score, early child intelligence composite score and total intelligence composite score. Distributions of these cognitive scores and their relationship with age are illustrated in Figure S12.”

      Author response image 2.

      Cognitive scores and age distributions of scans.

      * SC-FC coupling

      ** L162: 'Regarding functional subnetworks, SC-FC coupling increased disproportionately with age (Figure 3C)'.

      *** As far as I understand, in Figure 3C, the points are the correlation with age for a given ROI within the subnetwork. Is this correct? If yes, I am not sure how this shows a disproportionate increase in coupling. It seems that there is great variability of SC-FC correlation with age across regions within subnetworks, more so than the differences between networks. This would suggest that the coupling with age is regionally dependent rather than network-dependent? Maybe you could clarify?

      Yes, each point in Figure 3C is the correlation with age for a given ROI within the subnetwork. We have revised the description; please see lines 168-174: “Age correlation coefficients distributed within functional subnetworks were shown in Figure 3C. Regarding mean SC-FC coupling within functional subnetworks, the somatomotor (β_age = 2.39E-03, F = 4.73, p = 3.10E-06, r = 0.25, p = 1.67E-07, Figure 3E), dorsal attention (β_age = 1.40E-03, F = 4.63, p = 4.86E-06, r = 0.24, p = 2.91E-07, Figure 3F), frontoparietal (β_age = 2.11E-03, F = 6.46, p = 2.80E-10, r = 0.33, p = 1.64E-12, Figure 3I) and default mode (β_age = 9.71E-04, F = 2.90, p = 3.94E-03, r = 0.15, p = 1.19E-03, Figure 3J) networks significantly increased with age and exhibited greater increase.” In addition, we agree that the age-related change in coupling is more likely region-dependent than network-dependent, and we have added this point; please see lines 329-332: “We also found the SC-FC coupling with age across regions within subnetworks has more variability than the differences between networks, suggesting that the coupling with age is more likely region-dependent than network-dependent.” This is why our subsequent analyses focused on regional coupling.

      *** Additionally, we see from Figure 3C that regions within networks change very differently with age. Given this variability (especially in the subnetworks where you show both positive and negative correlations with age for specific ROIs, i.e. all of them), does it make sense to show mean coupling over regions within the subnetworks, which erases the differences in coupling-age relationships across regions (Figures 3D-J)?

      Because SC-FC coupling is often examined and interpreted at the network level, we consider it useful to show the age correlation of mean coupling at the subnetwork scale, although this averaging removes variability at the regional scale. Together, the results at the two scales confirm that, in this age group, age-related changes in coupling are mainly increases.

      *** Also, I think it would be interesting to show correlation coefficients across all regions, not only the significant ones (3B). Is there a spatially related tendency of increases/decreases (rather than a 'network' relationship)? Would it be interesting to show a similar figure to Figure S7 instead of only the significant regions?

      Following your suggestion, we have added a map of the correlation coefficients across all regions to Figure 3B, and likewise to Figures S3-S6.

      Author response image 3.

      Age-related changes in SC-FC coupling. (A) Increases in whole-brain coupling with age. (B) Correlation of age with SC-FC coupling across all regions and significant regions (p<0.05, FDR corrected). (C) Comparisons of age-related changes in SC-FC coupling among functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict 1.5× IQR from the first or third quartile. (D-J) Correlation of age with SC-FC coupling across the VIS, SM, DA, VA, LIM, FP and DM. VIS, visual network; SM, somatomotor network; DA, dorsal attention network; VA, ventral attention network; LIM, limbic network; FP, frontoparietal network; DM, default mode network.

      *** For the quantification of MPC.

      **** L421: you reconstructed 14 cortical surfaces from the wm to the pial surface. If we take the maximum thickness of the cortex to be 4.5mm (Fischl & Dale, 2000), the sampling is finer than the resolution of your anatomical images (0.8mm). Could you expand on the interest of sampling such a high number of surfaces, given that the resolution is not sufficient to provide additional information?

      The surfaces were reconstructed with state-of-the-art equivolumetric techniques[3], which provide a simplified recapitulation of cellular changes across the putative laminar structure of the cortex. Using a 100-μm resolution, Merker-stained 3D histological reconstruction of an entire post-mortem human brain (BigBrain: https://bigbrain.loris.ca/main.php) as a reference, a methodological study[4] systematically evaluated MPC stability with four to 30 intracortical surfaces at an anatomical image resolution of 0.7 mm and selected 14 surfaces as the most stable solution. Importantly, that study showed that the in vivo approach can serve as a lower-resolution yet biologically meaningful extension of the histological work[4].
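
      To make the equivolumetric principle concrete, here is a minimal sketch under a simplifying assumption: within a small patch of cortex, the cross-sectional area varies linearly from A_in at the white-matter surface to A_out at the pial surface. Requiring a fraction α of the patch volume to lie below a surface at fractional depth ρ (measured from the white surface) gives ρ = (√(α·A_out² + (1−α)·A_in²) − A_in) / (A_out − A_in), which reduces to ρ = α (equidistant sampling) when A_in = A_out. The function names are hypothetical, and this illustrates the principle only; it is not the implementation of the toolboxes behind [3, 4].

```python
import numpy as np

def equivolumetric_depth(alpha, area_inner, area_outer):
    """Fractional depth rho (0 = white surface, 1 = pial surface) of the surface
    below which a fraction `alpha` of the local cortical volume lies, assuming
    the cross-sectional area changes linearly from area_inner to area_outer."""
    alpha = np.asarray(alpha, dtype=float)
    if np.isclose(area_inner, area_outer):
        return alpha                       # no area change: equivolumetric == equidistant
    root = np.sqrt(alpha * area_outer**2 + (1.0 - alpha) * area_inner**2)
    return (root - area_inner) / (area_outer - area_inner)

def fraction_of_volume(rho, area_inner, area_outer):
    """Inverse check under the same linear-area model: volume fraction below depth rho."""
    rho = np.asarray(rho, dtype=float)
    below = area_inner * rho + 0.5 * (area_outer - area_inner) * rho**2
    return below / (0.5 * (area_inner + area_outer))

# 14 volume fractions from 0 to 1 (including both boundary surfaces is an assumption).
alphas = np.linspace(0.0, 1.0, 14)
depths = equivolumetric_depth(alphas, area_inner=1.0, area_outer=3.0)
```

      For A_out = 3·A_in, ρ(0.5) = (√5 − 1)/2 ≈ 0.618: the mid-volume surface lies beyond mid-thickness, because more of the volume sits on the larger pial side.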

      **** L424: did you aggregate intensities over regions using mean/median or other statistics? It might be useful to specify.

      Thank you for your careful comment. We have revised the description in lines 446-447: “We averaged the intensity profiles of vertices over 210 cortical regions according to the BNA”.

      **** L426: personal curiosity: why did you decide to remove the negative correlations of the intensity profiles from the MPC? Although this is common practice in functional analyses (where the interpretation of negative values is debated), within the context of cortical correlations the negative values might be interesting and informative about microstructural relationships across regions (if you want to remove negative signs, it might be worth taking their absolute values instead).

      We agree that the interpretation of negative correlations in MPC is debated. Because MPC is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study that proposed the approach [4]. As you note, negative correlations might be informative, and we will continue to explore what they may reveal about microstructural relationships.
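
      The choice discussed here can be illustrated with a minimal sketch, assuming (as in the approach cited as [4]) that MPC is the partial correlation between regional intensity profiles controlling for the cortex-wide mean profile. The `negative` argument contrasts the strategy adopted here (setting negative values to zero) with the reviewer's suggestion of taking absolute values. The function and argument names are hypothetical, and any further transformation applied in [4] is omitted.

```python
import numpy as np

def microstructural_profile_covariance(profiles, negative="zero"):
    """profiles: (n_regions, n_depths) intensity profiles sampled on intracortical
    surfaces. Returns the (n_regions, n_regions) partial-correlation matrix between
    regional profiles, controlling for the cortex-wide mean profile.

    negative: "zero" sets negative values to 0, "abs" takes absolute values,
              "keep" leaves them unchanged.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean_profile = profiles.mean(axis=0)
    design = np.column_stack([np.ones_like(mean_profile), mean_profile])
    # Regress the mean profile (with an intercept) out of every regional profile;
    # correlating the residuals gives the partial correlation.
    beta, *_ = np.linalg.lstsq(design, profiles.T, rcond=None)
    residuals = profiles.T - design @ beta        # (n_depths, n_regions)
    mpc = np.corrcoef(residuals, rowvar=False)
    if negative == "zero":
        mpc = np.where(mpc < 0, 0.0, mpc)
    elif negative == "abs":
        mpc = np.abs(mpc)
    elif negative != "keep":
        raise ValueError(f"unknown option: {negative!r}")
    return mpc

# Hypothetical profiles: 6 regions sampled at 14 depths.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(6, 14))
mpc_zeroed = microstructural_profile_covariance(profiles, negative="zero")
mpc_abs = microstructural_profile_covariance(profiles, negative="abs")
```

      Because the mean profile is the average of the regional profiles, the residual profiles sum to zero across regions, so some partial correlations are necessarily negative. How they are handled therefore always affects the resulting matrix.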

      **** L465: could you please expand on the notion of self-connections? It is not completely evident what this refers to.

      We have revised the description in lines 493-494: “N_c is the number of connection (N_c = 245 for BNA)”.

      **** Paragraph starting on L467: did you evaluate the multicollinearities between communication models? It is possibly rather high (especially for the same models with similar parameters (listed on L440-444)). Such dependence between variables might affect the estimates of feature importance (given the predictive models only care to minimize error, highly correlated features can be selected as a strong predictor while the impact of other features with similarly strong relationships with the target is minimized thus impacting the identification of reliable 'predictors').

      We agree. The covariance structure (multicollinearity) among the communication models is likely to lead to unreliable predictor weights. We therefore applied Haufe's inversion transform[5], which addresses this issue by computing the covariance between the predicted FC and each communication model in the training set (see [5] for details). We have further clarified this in the manuscript; please see lines 497-499: “And covariance structure among the predictors may lead to unreliable predictor weights. Thus, we applied Haufe's inversion transform[38] to address these issues and identify reliable communication mechanisms.”
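
      A minimal sketch of the Haufe transform for a single-output linear model may help. The raw regression weights w are replaced by the activation pattern a_j = cov(x_j, ŷ), where ŷ is the model prediction (here, predicted FC) and x_j the j-th predictor (here, one communication model), computed on the training data. For a model with an intercept this equals Σ_X w, so correlated predictors share credit instead of competing for it. The simulated predictors and function names are assumptions for illustration. The original method [5] also rescales by the inverse covariance of the prediction, which for a single output is a global factor and is omitted here.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, weights)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def haufe_patterns(X, y_hat):
    """Covariance between each predictor column of X and the prediction y_hat."""
    Xc = X - X.mean(axis=0)
    yc = y_hat - y_hat.mean()
    return Xc.T @ yc / (len(X) - 1)

# Hypothetical example: two nearly collinear predictors and one irrelevant one.
rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # almost a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + 0.5 * rng.normal(size=n)

b0, w = fit_linear(X, y)
patterns = haufe_patterns(X, b0 + X @ w)
```

      Here w[0] and w[1] are poorly determined (their sum is close to 1, but how it is split between them is driven by noise), whereas patterns[0] and patterns[1] are nearly equal and patterns[2] stays close to zero.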

      **** L474: I am not completely familiar with spin tests, but to my understanding this is a spatial permutation test. I am not sure how this applies to the evaluation of the robustness of feature weight estimates per region (if this was performed per region); it would be useful to provide a bit more detail to make it clearer.

      As suggested, we have added details; please see lines 503-507: “Next, we generated 1,000 FC permutations through a spin test[86] for each nodal prediction in each subject and obtained random distributions of model weights. These weights were averaged over the group and were investigated the enrichment of the highest weights per region to assess whether the number of highest weights across communication models was significantly larger than that in a random discovery.”
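
      The two ingredients named in this response can be sketched as follows, under simplifying assumptions. First, a spin permutation rotates region centroids on a sphere by a random rotation and reassigns each region the value of its nearest rotated neighbour (a parcel-level simplification of the spin test[86]; the mapping is not guaranteed to be one-to-one). Second, an enrichment test counts, for each communication model, the regions where it has the highest weight and compares that count with counts from spun data. All function names are hypothetical, and refitting the regional models on spun FC is not shown.

```python
import numpy as np

def random_rotation(rng):
    """Random 3-D rotation drawn uniformly (QR decomposition of a Gaussian matrix)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # fix column signs so q is uniformly distributed
    if np.linalg.det(q) < 0:           # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def spin_permutation(coords, rotation):
    """For each region (row of coords, points on a unit sphere), return the index
    of the rotated region that lands closest to it. Not guaranteed one-to-one."""
    rotated = coords @ rotation.T
    dist = np.linalg.norm(coords[:, None, :] - rotated[None, :, :], axis=-1)
    return dist.argmin(axis=1)

def count_highest(weights):
    """weights: (n_regions, n_models). Number of regions where each model ranks first."""
    return np.bincount(weights.argmax(axis=1), minlength=weights.shape[1])

def empirical_p(observed, null):
    """One-sided p-value of observed counts against null counts of shape (n_spins, ...)."""
    null = np.asarray(null)
    return (1 + np.sum(null >= observed, axis=0)) / (1 + null.shape[0])

# Hypothetical example: spin a smooth regional map defined on random sphere centroids.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
coords /= np.linalg.norm(coords, axis=1, keepdims=True)
regional_map = coords[:, 2]
spun_maps = [regional_map[spin_permutation(coords, random_rotation(rng))]
             for _ in range(100)]
```

      In use, `count_highest` would be applied once to the group-averaged observed weights and once per spun dataset (e.g. 1,000 times), and `empirical_p` would compare each model's observed count with its null counts.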

      **** L477: 'significant communication models were used to represent WMC...', but in L103 you mention you select 3 models: communicability, mean first passage, and flow graphs. Do you want to say that only 3 models were 'significant' and these were exactly the same across all regions (and data splits/ parcellation strategies/ tractography methods)? In the methods, you describe a lot of analysis and testing, but it is not completely clear how you arrived at the final 3; it would be beneficial to clarify. Also, were the final 3 selected on the whole dataset first, with the pipeline of SC-FC coupling/age assessment/behaviour predictions then run for every split (WD, S1, S2) for both parcellation schemes and tractography methods, or did you end up with different sets each time? It would be good to make the pipeline and design choices, including the validation part, clearer (a figure extending Figure 1 with all the steps would be very useful for understanding the design choices and how they relate to the different validation runs).

      Thank you for your comment. In all reproducibility analyses, we used the same 3 models, which were selected in the main pipeline (probabilistic tractography and BNA parcellation). Following your suggestion, we produced a figure extending Figure 1 that includes the model-selection pipeline. For the description, please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).”

      Author response image 4.

      Pipeline of model selection and reproducibility analyses.

      **** Might the imbalance of features between structural connectivity and MPC affect the revealed SC-FC relationships (3 vs 1)? Why did you decide on this ratio rather than for example best WM structural descriptor + MPC?

      We understand your concern. The WMC communication models capture diverse geometric, topological and dynamic factors. To characterize WMC as fully as possible, we selected, from the 27 candidate models and after controlling for their covariance structure, the three communication models that significantly predicted FC. This does introduce a potential feature imbalance relative to MPC. However, our results still support the conclusion that coupling models incorporating microarchitectural properties yield more accurate predictions of FC from SC[6, 7]; the relevant experiments are shown in Figure S2 below. Using only the single best WM structural descriptor might lose some of the communication properties of WMC.

      **** L515: were intracranial volume and in-scanner head motion related to behavioural measures? These variables likely impact the inputs; do you expect them to influence the outcome assessments? Or is there a mistake on L518, and you actually corrected the input features rather than the behaviour measures?

      In-scanner head motion and intracranial volume are related to some age-adjusted behavioural measures, as shown in the following table. Our regression of covariates from the cognitive measures follows two previous cognitive prediction studies [8, 9]. Please see lines 549-554: “Prior to applying the nested fivefold cross-validation framework to each behaviour measure, we regressed out covariates including sex, intracranial volume, and in-scanner head motion from the behaviour measure[59, 69]. Specifically, we estimated the regression coefficients of the covariates using the training set and applied them to the testing set. This regression procedure was repeated for each fold.”

      Author response table 1.
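
      The fold-wise covariate regression described above can be sketched as follows. Covariate effects (with an intercept) are estimated on the training fold only and then removed from both folds, so the test fold does not influence the correction. The simple k-fold splitter, the function names and the simulated covariates (standing in for sex, intracranial volume and head motion) are assumptions for illustration; the inner loop of the nested cross-validation is not shown.

```python
import numpy as np

def regress_out_covariates(y_train, cov_train, y_test, cov_test):
    """Estimate covariate effects on the training fold and remove them from both folds."""
    design_train = np.column_stack([np.ones(len(y_train)), cov_train])
    design_test = np.column_stack([np.ones(len(y_test)), cov_test])
    beta, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    return y_train - design_train @ beta, y_test - design_test @ beta

def kfold_splits(n, k, rng):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    folds = np.array_split(rng.permutation(n), k)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        yield train_idx, test_idx

# Hypothetical data: one behavioural score and three covariates per subject.
rng = np.random.default_rng(0)
n_subjects = 100
covariates = rng.normal(size=(n_subjects, 3))     # stand-ins for sex, ICV, motion
score = 1.0 + covariates @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n_subjects)

residualized = []
for train_idx, test_idx in kfold_splits(n_subjects, 5, rng):
    y_tr, y_te = regress_out_covariates(score[train_idx], covariates[train_idx],
                                        score[test_idx], covariates[test_idx])
    # The coupling-based predictor would be trained on y_tr and evaluated on y_te.
    residualized.append((y_tr, y_te))
```

      Fitting the covariate model inside each fold, rather than once on all subjects, avoids leaking test-set information into the corrected scores.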

      ** Additionally, in the paper, you propose that incorporating cortical microstructural (myelin-related) descriptors with white-matter connectivity to explain FC provides 'a more comprehensive perspective for characterizing the development of SC-FC coupling' (L60). This combination of cortical and white-matter structure is indeed interesting; however, the benefits of incorporating different descriptors could be studied further. For example, one could compare white-matter connectivity only (assessed through the selected communication models) ~ FC vs (white matter + MPC) ~ FC vs MPC ~ FC. Which descriptors better explain FC? Are the 'coupling trends' similar (or the same)? If yes, what is the additional benefit of using the more complex combination? This would also add strength to your statement at L317: 'These discrepancies likely arise from differences in coupling methods, highlighting the complementarity of our methods with existing findings'. Yes, discrepancies might be explained by the use of different SC inputs. However, it is difficult to see how discrepancies highlight complementarity - does MPC (and its combination with wm) provide additional information beyond wm structure alone?

      Following your suggestion, we have added analyses using only the myelin-related predictor or only WM connectivity to predict FC, and compared the results across models. Please see lines 519-521: “In addition, we have constructed the models using only MPC or SCs to predict FC, respectively. Spearman’s correlation was used to assess the consistency between spatial patterns based on different models.”

      Please see lines 128-130: “In addition, the coupling pattern based on other models (using only MPC or only SCs to predict FC) and the comparison between the models were shown in Figure S2A-C.” Please see lines 178-179: “The age-related patterns of SC-FC coupling based other coupling models were shown in Figure S2D-F.”

      Although the coupling patterns were spatially consistent across models, combining MPC with SC connectivity predicted FC better than models based on MPC or SC alone. For age-related changes in coupling, the differences between the models were further amplified. We agree that complementarity cannot be explicitly quantified in this way, and we have revised the description accordingly; please see line 329: “These discrepancies likely arise from differences in coupling methods.”

      Author response image 5.

      Comparison results between different models. Spatial pattern of mean SC-FC coupling based on MPC ~ FC (A), SCs ~ FC (B), and MPC + SCs ~ FC (C). Correlation of age with SC-FC coupling across cortex based on MPC ~ FC (D), SCs ~ FC (E), and MPC + SCs ~ FC (F).
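
      The model comparison can be sketched as follows, assuming that regional coupling is the adjusted R² of an ordinary least-squares fit of a region's FC profile on its structural predictors (a common choice in SC-FC coupling work). In the adjusted-R² formula, n would then be the number of connections per region, which we take to correspond to the N_c of lines 493-494; this correspondence is our assumption. Spearman's correlation between regional maps is computed from ranks, assuming no ties. Function names and the example inputs are hypothetical.

```python
import numpy as np

def adjusted_r2(y, predictors):
    """Adjusted R^2 of an OLS fit (with intercept) of y on the predictor columns."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, p = len(y), X.shape[1] - 1
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def ranks(a):
    """Ranks 0..n-1; assumes no ties (reasonable for continuous coupling values)."""
    a = np.asarray(a)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

def spearman(a, b):
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def coupling_map(fc, predictor_mats):
    """Regional coupling: for region i, regress FC[i, j != i] on row i of each
    predictor matrix (j != i). predictor_mats: list of (n, n) matrices, e.g. [MPC],
    [SC_1, SC_2, SC_3] or [MPC, SC_1, SC_2, SC_3]."""
    n = fc.shape[0]
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i            # exclude the self-connection (assumed)
        preds = np.column_stack([m[i, mask] for m in predictor_mats])
        out[i] = adjusted_r2(fc[i, mask], preds)
    return out
```

      With maps computed for the MPC-only, SC-only and combined models, `spearman(map_a, map_b)` gives the spatial consistency between two models, and comparing the maps region by region shows where the combined model explains more FC.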

      ** For the interpretation of results: L31 'SC-FC coupling is positively associated with genes in oligodendrocyte-related pathways and negatively associated with astrocyte-related gene'; L124: positive myelin content with SC-FC coupling...and similarly on L81, L219, L299, L342, and L490:

      *** You use a T1/T2 ratio, which is (in large part) a measure of myelin, to estimate the coupling between SC and FC. The evaluation of SC-FC coupling against myelin described in Figure 2E is therefore possibly biased by the choice of this feature. Similarly, the reported positive associations between oligodendrocyte-related pathways and SC-FC coupling in your work could in part result from a bias introduced by the 'myelin descriptor' (conversely, picking up the oligodendrocyte-related genes is a nice corroboration of the T1/T2 ratio being a myelin descriptor). However, if you used a different descriptor of the cortical microstructure, you might find different expression patterns associated with SC-FC coupling (for example, using the neurite density index might pick up neuron-related genes?). As mentioned in my previous suggestions, I think it would be of interest to first use only the white-matter structural connectivity features to assess coupling to FC and assess gene expression in the cortical regions to see whether the same genes are related, and subsequently to incorporate MPC, in order to dissociate a potential bias from using a myelin measure from the genetic findings.

      Thank you for your insightful comments. In this paper, however, the core of our coupling measure is the prediction of functional connections from multimodal structural connections, which may yield more information than a single modality. We agree that examining the genes associated with SCs and with MPC separately could lead to interesting discoveries, and we will continue to explore this in the future.

      ** Generally, I find it difficult to understand the interpretation of SC-FC coupling measures and would be interested to hear your thinking about this. As you mention on L290-294, how well SC predicts FC depends on which input features are used for the coupling assessment (more complex communication models, incorporating additional microstructural information etc 'yield more accurate predictions of FC' L291) - thus, calculated coupling can be interpreted as a measure of how well a particular set of input features explain FC (different sets will explain FC more or less well) ~ coupling is related to a measure of 'missing' information on the SC-FC relationship which is not contained within the particular set of structural descriptors - with this approach, the goal might be to determine the set that best, i.e. completely, explains FC to understand the link between structure and function. When you use the coupling measures for comparisons with age, cognition prediction etc, the 'status' of the SC-FC changes, it is no longer the amount of FC explained by the given SC descriptor set, but it's considered a descriptor in itself (rather than an effect of feature selection / SC-FC information overlap) - how do you interpret/argue for this shift of use?

      Thank you for your comment. In this paper, we derive SC-FC coupling from the set of structural features that best explains FC, so that coupling essentially measures the direct correspondence between structure and function. Relating coupling to age and cognition therefore amounts to studying how this direct structure-function correspondence varies with age and with cognition.

      ** In a similar vein to the above comment, I am interested to hear what you think: on L305 you mention that 'perfect SC-FC coupling may be unlikely'. Would this reasoning suggest that functional activity takes place through other means than (and is therefore somehow independent of) biological (structural) substrates? For now, I think one can only say that we have imperfect descriptors of the structure, so there is always information missing to explain function; this, however, does not mean that SC and FC are not perfectly coupled (only that we look at insufficient structural descriptors - limitations of what imaging can assess, what we measure etc). This is in line with L305, where you mention that 'Moreover, our results suggested that regional preferential contributions across different SCs lead to variations in the underlying communication process'. This suggests that different areas might locally use different communication models, which are not reflected in the SC-FC coupling measures employed, not that the 'coupling' is lower or higher (or that coupling is not perfect). This is also a change in approach relative to L293: 'This configuration effectively releases the association cortex from strong structural constraints' - the 'release' might only hold in light of the particular structural descriptors you use - is it conceivable that a different communication model would be more appropriate (and show high coupling) in these areas?

      Thank you for your insightful comments. We have changed the description; please see lines 315-317: “SC-FC coupling is dynamic and changes throughout the lifespan[7], particularly during adolescence[6,9], suggesting that perfect SC-FC coupling may require sufficient structural descriptors.”

      * Cognitive predictions:

      ** From a practical stand-point, do you think SC-FC coupling is a better (more accurate) indicator of cognitive outcomes (for example for future prediction studies) than each modality alone (which is practically easier to obtain and process)? It would be useful to check the behavioural outcome predictions for each modality separately (as suggested above for coupling estimates). In case SC-FC coupling does not outperform each modality separately, what is the benefit of using their coupling? Similarly, it would be useful to compare to using only cortical myelin for the prediction (which you showed to increase in importance for the coupling). In the case of myelin->coupling-> intelligence, if you are able to predict outcomes with the same performance from myelin without the need for coupling measures, what is the benefit of coupling?

      From the point of view of predictive performance, we do not believe that SC-FC coupling is a better indicator than a single modality (voxel, network or other indicator). Our aim was to assess whether SC-FC coupling is related to individual differences in cognitive performance, rather than to prove its predictive power over other measures. As you suggest, separating the modalities and comparing their power to predict cognition is a very interesting perspective, and we will continue to explore this issue in future studies.

      ** The statement on L187 'suggesting that increased SC-FC coupling during development is associated with higher intelligence' might not be completely appropriate before age corrections (especially given the large drop in performance that suggests confounding effects of age).

      As suggested, we have removed this statement.

      ** L188: it might be useful to report the range of R across the outer cross-validation folds as from Figure 4A it is not completely clear that the predictive performance is above the random (0) threshold. (For the sake of clarity, on L180 it might be useful for the reader if you directly report that other outcomes were not above the random threshold).

      As suggested, we have added the range of R and revised the description; please see lines 195-198: “Furthermore, even after controlling for age, SC-FC coupling remained a significant predictor of general intelligence better than at chance (Pearson’s r = 0.11±0.04, p = 0.01, FDR corrected, Figure 4A). For fluid intelligence and crystal intelligence, the predictive performances of SC-FC coupling were not better than at chance (Figure 4A).”

      In a similar vein, in the text, you report Pearson's R for the predictive results but Figure 4A shows predictive accuracy - accuracy is a different (categorical) metric. It would be good to homogenise to clarify predictive results.

      We have made the corresponding changes in Figure 4.

      Author response image 6.

      Encoding individual differences in intelligence using regional SC-FC coupling. (A) Predictive accuracy of fluid, crystallized, and general intelligence composite scores. (B) Regional distribution of predictive weight. (C) Predictive contribution of functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict the 1.5× IQR from the first or third quartile.

      * Methods and QC:

      -Parcellations

      ** It would be useful to mention briefly how the BNA was applied to the data and if any quality checks were performed for the resulting parcellations, especially for the youngest subjects which might be most dissimilar to the population used to derive the atlas (healthy adults HCP subjects) ~ question of parcellation quality.

      We have added the description, please see lines 434-436: “The BNA[31] was projected on native space according to the official scripts (http://www.brainnetome.org/resource/) and the native BNA was checked by visual inspection.” 

      ** Additionally, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate. It might be useful to mention the above as limitations (which apply to most studies with similar focus).

      We have added your comment to the methodological issues, please see lines 378-379: “Third, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate.”

      - Tractography

      ** L432: it might be useful to name the method you used (probtrackx).

      We have added this name to the description, please see lines 455-456: “probabilistic tractography (probtrackx)[78, 79] was implemented in the FDT toolbox …”

      ** L434: 'dividing the total fibres number in source region' - dividing by what?

      We have revised the description, please see line 458: “dividing by the total fibres number in source region.”

      ** L436: 'connections in subcortical areas were removed' - why did you trace connections to subcortical areas in the first place if you then removed them (to match with cortical MPC areas I suspect)? Or do you mean there were spurious streamlines through subcortical regions that you filtered?

      There were two reasons: first, to match the cortical regions used for MPC; second, as stated in the methodological issues, accurately resolving the connections of small structures within subcortical regions with whole-brain diffusion imaging and tractography techniques is challenging[10, 11].

      ** Following on the above, did you use any exclusion masks during the tracing? In general, more information about quality checks for the tractography would be useful. For example, L437: did you do any quality evaluations based on the removed spurious streamlines? For example, were there any trends between spurious streamlines and the age of the subject? Distance between regions/size of the regions?

      We did not use any exclusion masks. We visually inspected the tractography results for quality, but did not assess the relationship between spurious streamlines and age, inter-regional distance or region size.

      ** L439: 'weighted probabilistic network' - this was weighted by the filtered connectivity densities or something else?

      The probabilistic network is weighted by the filtered connectivity densities.

      ** I appreciate the short description of the communication models in Text S1, it is very useful.

      Thank you for your comment.

      ** In addition to limitations mentioned in L368 - during reconstruction, have you noticed problems resolving short inter-hemispheric connections?

      We had not considered this issue and have now added it to the limitations; please see lines 383-384: “In addition, the reconstruction of short connections between hemispheres is a notable challenge.”

      - Functional analysis:

      ** There is a difference in acquisition times between participants below and above 8 years (21 vs 26 min), does the different length of acquisition affect the quality of the processed data?

      We applied relatively strict quality control to ensure the quality of the processed data.

      ** L446 'regressed out nuisance variables' - it would be informative to describe in more detail what you used to perform this.

      We have provided more detail about the regression of nuisance variables, please see lines 476-477: “The nuisance variables were removed from time series based on general linear model.”

      ** L450-452: it would be useful to add the number of excluded participants to get an intuition for the overall quality of the functional data. Have you checked if the quality is associated with the age of the participant (which might be related to motion etc). Adding a distribution of remaining frames across participants (vs age) would be useful to see in the supplementary methods to better understand the data you are using.

      We have added information on participant exclusion during data processing, as well as the distributions of motion and remaining frames and their correlations with age. Please see lines 481-485: “Quality control. The exclusion of participants in the whole multimodal data processing pipeline was depicted in Figure S13. In the context of fMRI data, we computed Pearson’s correlation between motion and age, as well as between the number of remaining frames and age, for the included participants aged 5 to 22 years and 8 to 22 years, respectively. These correlations were presented in Figure S14.”

      Author response image 7.

      Exclusion of participants in the whole multimodal data processing pipeline.  

      Author response image 8.

      Figure S14. Correlations between motion and age and number of remaining frames and age.

      ** L454: 'Pearson's correlation's... ' In contrast to MPC you did not remove negative correlations in the functional matrices. Why this choice?

      Whether negative functional connections should be removed has long been a controversial issue. Previous studies of SC-FC coupling[12-14] have widely retained negative connections, and we followed this practice to retain more information. For MPC, by contrast, because it is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study that proposed the approach [4].

      - Gene expression:

      ** L635, you focus on the left cortex, is this common? Do you expect the gene expression to be fully symmetric (given reported functional hemispheric asymmetries)? It might be good to expand on the reasoning.

      An important consideration regarding sample assignment is that only two of the six brains were sampled in both hemispheres, while the other four were sampled only in the left hemisphere. This sparse sampling should be carefully considered when combining data across donors[1]. We have added this to the description; please see lines 569-571: “Restricting analyses to the left hemisphere will minimize variability across regions (and hemispheres) in terms of the number of samples available[40].”

      ** Paragraph of L537: you use evolution of coupling with age (correlation) and compare to gene expression with adults (cohort of Allen Human Brain Atlas - no temporal evolution to the gene expressions) and on L369 you mention that 'relative spatial patterns of gene expressions remain stable after birth'. Of course this is not a place to question previous studies, but would you really expect the gene expression associated with the temporary processes to remain stable throughout the development? For example, myelination would follow different spatiotemporal gradient across brain regions, is it reasonable to expect that the expression patterns remain the same? How do you then interpret a changing measure of coupling (correlation with age) with a gene expression assessed statically?

      We agree that spatial expression patterns are expected to vary across developmental periods. We have revised the previous description; please see lines 383-386: “Fifth, it is important to acknowledge that changes in gene expression levels during development may introduce bias in the results.”

      - Reproducibility analyses:

      ** Paragraph L576: are we to understand that you performed the entire pipeline 3 times (WD, S1, S2) for both parcellations schemes and tractography methods (~12 times) including the selection of communication models and you always got the same best three communication models and gene expression etc? Or did you make some design choices (i.e. selection of communication models) only on a specific set-up and transfer to other settings?

      The communication models were chosen at the outset, as we have clarified in the manuscript; please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).” For the reproducibility analyses (parcellation, tractography, and split-half validation), we held all other settings fixed and assessed the impact of one factor at a time.

      ** Paragraph of L241: I really appreciate you evaluated the robustness of your results to different tractography strategies. It is reassuring to see the similarity in results for the two approaches. Did you notice any age-related effects on tractography quality for the two methods given the wide age range (did you check?)?

      In our study, tractography quality was checked by visual inspection. Assessing tractography quality with quantitative tools in future studies could answer this question objectively.

      ** Additionally, I wonder how much of that overlap is driven by the changes in MPC which is the same between the two methods... especially given its high weight in the SC-FC coupling you reported earlier in the paper. It might be informative to directly compare the connectivity matrices derived from the two tractography methods. Generally, as mentioned in the previous comments, I think it would be interesting to assess coupling using different input settings (with WM structural and MPC separate and then combined).

      In response to your previous comment, we have examined the coupling patterns, coupling differences, coupling-age correlations, and spatial correlations between the patterns derived from the different models, as shown in Figure S2. Please see our response to that comment for details.

      ** L251 - I also wonder if the random splitting is best adapted to validation in your case given you study relationships with age. Would it make more sense to make stratified splits to ensure a 'similar age coverage' across splits?

      In our study, we adopted random splitting, repeated 1,000 times, to minimize bias due to data partitioning. The stratified splitting you mention is a reasonable alternative; keeping the age distribution balanced across halves would likely yield higher validation similarity than our approach. However, the similarity obtained with our method is already sufficient to support the generalizability of our findings.
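      The difference between the two splitting schemes can be illustrated with a small simulation. This is a sketch on synthetic ages (uniform between 5 and 22 years), not the study's data; the helper names and the choice of five age-quantile bins are hypothetical. It repeats each split 1,000 times and records how far the mean ages of the two halves drift apart.

```python
import numpy as np


def random_split(n, rng):
    """Randomly assign n participants to two halves."""
    idx = rng.permutation(n)
    return idx[: n // 2], idx[n // 2 :]


def age_stratified_split(ages, rng, n_bins=5):
    """Split participants into halves with matched age coverage.

    Participants are grouped into age-quantile bins (hypothetical choice of
    n_bins), and each bin is shuffled and divided between the two halves.
    """
    edges = np.quantile(ages, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(ages, edges[1:-1])   # bin labels 0 .. n_bins-1
    half_a, half_b = [], []
    for b in range(n_bins):
        members = rng.permutation(np.flatnonzero(bins == b))
        half_a.extend(members[: len(members) // 2])
        half_b.extend(members[len(members) // 2 :])
    return np.array(half_a), np.array(half_b)


rng = np.random.default_rng(42)
ages = rng.uniform(5, 22, size=400)         # synthetic ages, illustration only
gap_random, gap_strat = [], []
for _ in range(1000):
    a, b = random_split(len(ages), rng)
    gap_random.append(abs(ages[a].mean() - ages[b].mean()))
    a, b = age_stratified_split(ages, rng)
    gap_strat.append(abs(ages[a].mean() - ages[b].mean()))
print(np.mean(gap_random), np.mean(gap_strat))
```

      In a full validation, each half would then be used to estimate a coupling-age map, and the two maps would be correlated; stratification mainly reduces the age imbalance that can lower that similarity.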

      Minor comments

      L42: 'is regulated by genes'

      ** Coupling (if it has a functional role and is regulated at all) possibly results from a complex interplay of different factors in addition to genes, for example, learning/environment; it might be more cautious to use 'regulated in part by genes' or similar.

      We have corrected it, please see line 42.

      L43 (and also L377): 'development of SC-FC coupling'

      ** I know this is very nitpicky and depends on your opinion about the nature of SC-FC coupling, but 'development of SC-FC coupling' gives an impression of something maturing that has a role 'in itself' (for example development of eye from neuroepithelium to mature organ etc.). For now, I am not sure it is fully certain that SC-FC coupling is more than a byproduct of the comparison between SC and FC, using 'changes in SC-FC coupling with development' might be more apt.

      We have corrected it, please see lines 43-44.

      L261 'SC-FC coupling was stronger ... [] ... and followed fundamental properties of cortical organization.' vs L168 'No significant correlations were found between developmental changes in SC-FC coupling and the fundamental properties of cortical organization'.

      **Which one is it? I think in the first you refer to mean coupling over all infants and in the second about correlation with age. How do you interpret the difference?

      Between the ages of 5 and 22 years, we found that the mean SC-FC coupling pattern is already similar to that of adults and is consistent with the fundamental properties of cortical organization. However, developmental changes in SC-FC coupling are heterogeneous and sequential: they do not change by the same magnitude in line with the mean coupling pattern.

      L277: 'temporal and spatial complexity'

      ** Additionally, communication models have different assumptions about the flow within the structural network and will have different biological plausibility (they will be more or less 'realistic').

      Here, 'temporal and spatial complexity' refers to complexity from a computational point of view.

      L283: 'We excluded a centralized model (shortest paths), which was not biologically plausible' ** But in Text S1 and Table S1 you specify the shortest paths models. Does this mean you computed them but did not incorporate them in the final coupling computations even if they were predictive?

      ** Generally, I find the selection of the final 3 communication models confusing. It would be very useful if you could clarify this further, for example in the methods section.

      We used all twenty-seven communication models (including shortest paths) to predict FC at the node level for each participant, and then identified the three communication models that significantly predicted FC. The shortest-path model was excluded because it did not meet the significance criterion. We have added further methodological details to this section; please see lines 503-507.

      L332 'As we observed increasing coupling in these [frontoparietal network and default mode network] networks, this may have contributed to the improvements in general intelligence, highlighting the flexible and integrated role of these networks' vs L293 'SC-FC coupling in association areas, which have lower structural connectivity, was lower than that in sensory areas. This configuration effectively releases the association cortex from strong structural constraints imposed by early activity cascades, promoting higher cognitive functions that transcend simple sensori-motor exchanges'

      ** I am not sure I follow the reasoning. Could you expand on why it would be the decoupling promoting the cognitive function in one case (association areas generally), but on the reverse the increased coupling in frontoparietal promoting the cognition in the other (specifically frontoparietal)?

      We have tried to clarify this point. For general intelligence, increased coupling in the frontoparietal network could allow more effective information integration, enabling efficient collaboration between different cognitive processes.

      * Formatting errors etc.

      L52: maybe rephrase?

      We have rephrased it; please see lines 51-53: “The T1- to T2-weighted (T1w/T2w) ratio of MRI has been proposed as a means of quantifying microstructure profile covariance (MPC), which reflects a simplified recapitulation in cellular changes across intracortical laminar structure[6, 12-15].”

      L68: specialization1,[20].

      We have corrected it.

      L167: 'networks significantly increased with age and exhibited greater increased' - needs rephrasing.

      We have corrected it.

      L194: 'networks were significantly predicted the general intelligence' - needs rephrasing.

      We have corrected it, please see lines 204-205: “we found that the weights of frontoparietal and default mode networks significantly contributed to the prediction of the general intelligence.”
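      One common way to read network-level weights from a regularized linear model is the activation-pattern transform of Haufe et al. [5]. The sketch below assumes, as a hypothetical setup, a ridge model predicting general intelligence from seven network-level coupling values; the synthetic data, the `alpha` value, and the feature layout are illustrative and are not taken from the manuscript.

```python
import numpy as np
from sklearn.linear_model import Ridge


def haufe_activation(X, model):
    """Haufe et al. (2014) activation pattern for a single-output linear model.

    a = Cov(X) w / Var(X w): the covariance of each feature with the model's
    prediction, which is easier to interpret than the raw weights w.
    """
    Xc = X - X.mean(axis=0)
    y_hat = Xc @ model.coef_
    return Xc.T @ y_hat / (len(X) - 1) / y_hat.var(ddof=1)


# Synthetic data: 300 participants x 7 hypothetical network coupling features,
# with the outcome driven by features 5 and 6 only.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 7))
g = X[:, 5] + X[:, 6] + 0.5 * rng.standard_normal(300)
model = Ridge(alpha=1.0).fit(X, g)
a = haufe_activation(X, model)
```

      Features whose activation values are large contribute most to the prediction; in practice their significance would be assessed with permutation or bootstrap testing, which is not shown here.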

      L447: 'and temporal bandpass filtering' - there is a verb missing.

      We have corrected it, please see line 471: “executed temporal bandpass filtering.”

      L448: 'greater than 0.15' - unit missing.

      We have corrected it, please see line 472: “greater than 0.15 mm”.
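      If the 0.15 mm threshold applies to framewise displacement (FD) as defined by Power et al. (our assumption; the metric is not named here), censoring can be sketched as follows. FD sums the absolute frame-to-frame changes in the six realignment parameters, converting rotations to millimetres on a 50 mm sphere. The column order, the radius, and the helper names are assumptions. The per-participant mean FD and the number of retained frames are the kinds of quantities one would correlate with age in the quality-control analysis.

```python
import numpy as np


def framewise_displacement(params, radius_mm=50.0):
    """Power-style FD from a (frames, 6) array of realignment parameters.

    Columns are assumed to be 3 translations (mm) followed by 3 rotations
    (radians); rotations are converted to arc length on a sphere of radius_mm.
    """
    deltas = np.abs(np.diff(params, axis=0))
    deltas[:, 3:] *= radius_mm
    fd = deltas.sum(axis=1)
    return np.concatenate([[0.0], fd])   # the first frame has no predecessor


def censor_mask(fd, threshold_mm=0.15):
    """True for frames that are retained (FD at or below the threshold)."""
    return fd <= threshold_mm


# Toy example: a 0.3 mm translation at frame 3 that is undone at frame 4.
params = np.zeros((5, 6))
params[3, 0] = 0.3
fd = framewise_displacement(params)
keep = censor_mask(fd)
print(fd, keep.sum())
```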

      L452: 'After censoring, regression of nuisance variables, and temporal bandpass filtering,' - no need to repeat the steps as you mentioned them 3 sentences earlier.

      We have removed it.

      L458-459: sorry I find this description slightly confusing. What do you mean by 'modal'? Connectional -> connectivity profile. The whole thing could be simplified, if I understand correctly your vector of independent variables is a set of wm and microstructural 'connectivity' of the given node... if this is not the case, please make it clearer.

      We have corrected it, please see line 488: “where 𝒔𝑖 is the 𝑖th SC profiles, 𝑛 is the number of SC profiles”.
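      The node-wise model can be sketched as an ordinary least-squares regression of node i's FC profile on its n SC profiles (for example, communication-model matrices and MPC), with adjusted R² taken as that node's coupling. This follows the common multilinear formulation of regional SC-FC coupling. Dropping the self-connection and using adjusted R² are our assumptions about details not stated here, and the function name is hypothetical.

```python
import numpy as np


def nodal_coupling(fc, sc_list):
    """Adjusted R^2 of each node's FC profile regressed on its SC profiles.

    fc      : (N, N) functional connectivity matrix
    sc_list : list of n (N, N) structural predictor matrices (the s_i profiles)
    """
    N = fc.shape[0]
    n = len(sc_list)
    coupling = np.empty(N)
    for i in range(N):
        others = np.arange(N) != i                      # drop the self-connection
        y = fc[i, others]
        X = np.column_stack([np.ones(others.sum())] + [s[i, others] for s in sc_list])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        m = len(y)
        coupling[i] = 1 - (1 - r2) * (m - 1) / (m - n - 1)  # adjusted R^2
    return coupling


# Synthetic check: FC that is an exact linear function of one SC matrix.
rng = np.random.default_rng(1)
sc1 = rng.random((20, 20)); sc1 = (sc1 + sc1.T) / 2
sc2 = rng.random((20, 20)); sc2 = (sc2 + sc2.T) / 2
coupling = nodal_coupling(2.0 * sc1 + 1.0, [sc1, sc2])
```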

      L479-480: 'values and system-specific of coupling'.

      We have corrected it.

      L500: 'regular' - regularisation.

      We have changed it to “regularization”.

      L567: Do you mean that in contrast to probabilistic with FSL you use deterministic methods within Camino? For L570, you introduce communication models through 'such as': did you fit all models like before? If not, it might be clearer to just list the ones you estimated rather than introduce through 'such as'.

      We have changed the description to avoid ambiguity, please see lines 608-609: “We then calculated the communication properties of the WMC including communicability, mean first passage times of random walkers, and flow graphs (timescales=1).”
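      Two of the listed communication properties have compact matrix definitions, sketched below for a weighted, undirected WMC matrix with no isolated nodes (an assumption). Communicability is taken as the matrix exponential of the strength-normalized adjacency. Mean first passage times use the fundamental-matrix identity m_ij = (z_jj - z_ij) / pi_j for a random walk with transition matrix P = D^-1 A. Flow graphs are not covered. The normalization choice and function names are illustrative, not the manuscript's code.

```python
import numpy as np
from scipy.linalg import expm


def communicability(adj):
    """Weighted communicability: expm of the strength-normalized adjacency."""
    strength = adj.sum(axis=1)
    d = np.diag(1.0 / np.sqrt(strength))
    return expm(d @ adj @ d)


def mean_first_passage_time(adj):
    """Expected number of steps for a random walker starting at i to first reach j."""
    P = adj / adj.sum(axis=1, keepdims=True)
    # For an undirected walk the stationary distribution is proportional to strength.
    pi = adj.sum(axis=1) / adj.sum()
    n = len(pi)
    Z = np.linalg.inv(np.eye(n) - P + np.tile(pi, (n, 1)))   # fundamental matrix
    return (np.diag(Z)[None, :] - Z) / pi[None, :]


# Three-node path graph 0 - 1 - 2.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
G = communicability(path)
M = mean_first_passage_time(path)
```

      On the path graph, a walker starting at node 0 needs on average 4 steps to reach node 2 (it always steps to node 1 first, then reaches node 2 or returns to node 0 with equal probability), which M reproduces.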

      Citation [12], it is unusual to include competing interests in the citation, moreover, Dr. Bullmore mentioned is not in the authors' list - this is most likely an error with citation import, it would be good to double-check.

      We have corrected it.

      L590-591: Python scripts used to perform PLS regression can be found at https://scikit-learn.org/. The link leads to general documentation for sklearn.

      We have corrected it; please see lines 627-630: “Python scripts used to perform PLS regression can be found at https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.”

      P26 and 27 - there are two related sections: Data and code availability and Code availability - it might be worth merging into one section if possible.

      We have corrected it, please see lines 623-633.

      References

      (1) Arnatkeviciute A, Fulcher BD, Fornito A. A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage. 2019;189:353-67. Epub 2019/01/17. doi: 10.1016/j.neuroimage.2019.01.011. PubMed PMID: 30648605.

      (2) Zhong S, He Y, Gong G. Convergence and divergence across construction methods for human brain white matter networks: an assessment based on individual differences. Hum Brain Mapp. 2015;36(5):1995-2013. Epub 2015/02/03. doi: 10.1002/hbm.22751. PubMed PMID: 25641208; PubMed Central PMCID: PMCPMC6869604.

      (3) Waehnert MD, Dinse J, Weiss M, Streicher MN, Waehnert P, Geyer S, et al. Anatomically motivated modeling of cortical laminae. Neuroimage. 2014;93 Pt 2:210-20. Epub 2013/04/23. doi: 10.1016/j.neuroimage.2013.03.078. PubMed PMID: 23603284.

      (4) Paquola C, Vos De Wael R, Wagstyl K, Bethlehem RAI, Hong SJ, Seidlitz J, et al. Microstructural and functional gradients are increasingly dissociated in transmodal cortices. PLoS Biol. 2019;17(5):e3000284. Epub 2019/05/21. doi: 10.1371/journal.pbio.3000284. PubMed PMID: 31107870.

      (5) Haufe S, Meinecke F, Gorgen K, Dahne S, Haynes JD, Blankertz B, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014;87:96-110. Epub 2013/11/19. doi: 10.1016/j.neuroimage.2013.10.067. PubMed PMID: 24239590.

      (6) Demirtas M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, et al. Hierarchical Heterogeneity across Human Cortex Shapes Large-Scale Neural Dynamics. Neuron. 2019;101(6):1181-94 e13. Epub 2019/02/13. doi: 10.1016/j.neuron.2019.01.017. PubMed PMID: 30744986; PubMed Central PMCID: PMCPMC6447428.

      (7) Deco G, Kringelbach ML, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, et al. Dynamical consequences of regional heterogeneity in the brain's transcriptional landscape. Sci Adv. 2021;7(29). Epub 2021/07/16. doi: 10.1126/sciadv.abf4752. PubMed PMID: 34261652; PubMed Central PMCID: PMCPMC8279501.

      (8) Chen J, Tam A, Kebets V, Orban C, Ooi LQR, Asplund CL, et al. Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nat Commun. 2022;13(1):2217. Epub 2022/04/27. doi: 10.1038/s41467-022-29766-8. PubMed PMID: 35468875; PubMed Central PMCID: PMCPMC9038754.

      (9) Li J, Bzdok D, Chen J, Tam A, Ooi LQR, Holmes AJ, et al. Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Sci Adv. 2022;8(11):eabj1812. Epub 2022/03/17. doi: 10.1126/sciadv.abj1812. PubMed PMID: 35294251; PubMed Central PMCID: PMCPMC8926333.

      (10) Thomas C, Ye FQ, Irfanoglu MO, Modi P, Saleem KS, Leopold DA, et al. Anatomical accuracy of brain connections derived from diffusion MRI tractography is inherently limited. Proc Natl Acad Sci U S A. 2014;111(46):16574-9. Epub 2014/11/05. doi: 10.1073/pnas.1405672111. PubMed PMID: 25368179; PubMed Central PMCID: PMCPMC4246325.

      (11) Reveley C, Seth AK, Pierpaoli C, Silva AC, Yu D, Saunders RC, et al. Superficial white matter fiber systems impede detection of long-range cortical connections in diffusion MR tractography. Proc Natl Acad Sci U S A. 2015;112(21):E2820-8. Epub 2015/05/13. doi: 10.1073/pnas.1418198112. PubMed PMID: 25964365; PubMed Central PMCID: PMCPMC4450402.

      (12) Gu Z, Jamison KW, Sabuncu MR, Kuceyeski A. Heritability and interindividual variability of regional structure-function coupling. Nat Commun. 2021;12(1):4894. Epub 2021/08/14. doi: 10.1038/s41467-021-25184-4. PubMed PMID: 34385454; PubMed Central PMCID: PMCPMC8361191.

      (13) Liu ZQ, Vazquez-Rodriguez B, Spreng RN, Bernhardt BC, Betzel RF, Misic B. Time-resolved structure-function coupling in brain networks. Commun Biol. 2022;5(1):532. Epub 2022/06/03. doi: 10.1038/s42003-022-03466-x. PubMed PMID: 35654886; PubMed Central PMCID: PMCPMC9163085.

      (14) Zamani Esfahlani F, Faskowitz J, Slack J, Misic B, Betzel RF. Local structure-function relationships in human brain networks across the lifespan. Nat Commun. 2022;13(1):2053. Epub 2022/04/21. doi: 10.1038/s41467-022-29770-y. PubMed PMID: 35440659; PubMed Central PMCID: PMCPMC9018911.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank both reviewers for their reviews of our work and suggestions for improvement. Changes to the manuscript are captured with the Track Changes feature, and our point-by-point responses are included below in bold/italic text.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary Bell et al. overexpress Prom1 or Ttyh1 and test their effect on EV formation from cell lines. They find that Ttyh1 expression leads to an increase in small EVs as well as tubulated EVs, while Prom1 expression leads to a milder increase in small EVs. EV induction by Prom1 is dependent on cholesterol and the authors show that Prom1 makes the cholesterol in EVs more resistant to detergent. The authors show no connection between Ttyh1 EV induction and cholesterol, although they claim it is important. They also show that a disease mutation in Prom1 decreases Prom1 trafficking to the plasma membrane and increases cholesterol resistance to detergent in EVs. The authors also find that the disease mutation decreases the size of the Prom1-induced EVs.

      Major Comments

      Results - line 99-106 - The EV isolation protocol would remove large EVs like the Prom1+ midbody remnants. It is important to explicitly specify that this study focused on small EVs.

      We agree with the reviewers and appreciate the suggestion to make this distinction. We have clarified the Results text (lines 104-105) to specify that our method specifically reconstitutes and isolates small EVs.

      Statistics - The t tests appear to have been performed without correction for multiple comparisons (Figure 2C-D, Fig. 4D). Given that >10 comparisons were made, this can alter the biological significance of p<0.05 (1 incorrect in 20 comparisons). Please reanalyze with a more appropriate statistical test for multiple comparisons (i.e. ANOVA) or apply a correction to the t test values (i.e. Bonferroni).

      We agree with the reviewers that multiple test correction is appropriate for these figures. We have applied Bonferroni correction to the t-tests in Figs 2C, 2D, and 4D by adjusting our significance thresholds (alpha), and included additional text in the figure legend to indicate how and why the correction was performed.
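      Adjusting the significance threshold in this way amounts to testing each of m comparisons at alpha/m, which is equivalent to multiplying each p-value by m. The sketch below uses two-sample t-tests from SciPy. The sample labels and values are hypothetical and do not come from the blots in the manuscript.

```python
import numpy as np
from scipy import stats


def bonferroni_ttests(control, mutants, alpha=0.05):
    """Two-sample t-tests of each mutant against control at a Bonferroni-adjusted alpha."""
    m = len(mutants)
    adjusted_alpha = alpha / m
    results = {}
    for name, values in mutants.items():
        _, p = stats.ttest_ind(control, values)
        results[name] = (p, p < adjusted_alpha)   # (raw p-value, significant after correction)
    return adjusted_alpha, results


# Hypothetical normalized measurements, for illustration only.
control = np.array([1.0, 1.05, 0.95, 1.02])
mutants = {
    "mutant_strong": np.array([0.2, 0.25, 0.22, 0.18]),
    "mutant_similar": np.array([1.0, 0.98, 1.03, 0.99]),
}
adjusted_alpha, results = bonferroni_ttests(control, mutants)
print(adjusted_alpha, results)
```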

      The DLS data does not appear to give any insight into EV size (unlike the EM data) and could be removed from the whole manuscript (or moved to supplemental). The authors should also remove any conclusions based on the DLS data.

      We appreciate the reviewers raising this point and agree that the DLS is less informative than our other measurements of EV size and morphology. We have moved all DLS figure panels where EV size is characterized by another method to the Supplement.

      Discussion - line 382-383 "Because Prom1 EVs arise directly from blebbing of the plasma membrane23, this finding suggests that Prom1 and Ttyh1 traffic to similar regions of the plasma membrane." The authors have not examined where Prom1 or Ttyh1 localize in the plasma membrane and can not draw this conclusion. That both proteins promote plasma membrane budding would only suggest that both proteins localize to the plasma membrane, not subregions of the plasma membrane. However, the authors have not demonstrated that Ttyh1 specifically induces plasma membrane budding. The different size of Ttyh1 EVs could be due to different biogenesis mechanisms (i.e. derived from intracellular organelles instead of the plasma membrane), making this statement an over-interpretation on both parts.

      This is a fair point. We have removed this sentence from the Discussion (lines 402-403) as the reviewer requests.

      Discussion - line 398-400 "Membrane cholesterol is necessary for Prom1-mediated remodeling20,21 and is present at similar levels in purified Prom1 and Ttyh1 EVs (Fig 5E), indicating that it is undoubtedly important for EV formation by both proteins." & line 415-417 "We find that conservative mutations in several of these adjacent aromatic residues impair EV formation by Prom1, but do not mimic the stable cholesterol binding of W795R (Figs 2C, 4D). " The authors' data suggest that cholesterol is not important for Ttyh1 to induce EV formation. The authors show that cholesterol depletion does not alter Ttyh1 EV production. Similarly, they find separable effects on cholesterol binding and EV formation with Prom1 mutants, which suggest that there is more to Prom1-mediated EV formation than cholesterol. That cholesterol is present at similar levels can reflect that overexpression of these proteins does not alter the amount of cholesterol in the EV source membrane (i.e. plasma membrane). Also, wouldn't molecular crowding of a membrane protein be predicted to influence how easy it is to extract lipids?

      We thank the reviewer for highlighting this imprecisely phrased sentence. We only meant to indicate that cholesterol is present in both sets of EVs and contributes globally to membrane fluidity. We have removed this sentence from the Discussion (lines 419-421) to avoid over-interpretation or confusion.

      The reviewer is also correct to point out that molecular crowding could alter how extractable lipids are from EVs. We have included additional explanatory text in the Discussion (lines 421-426) addressing this point.

      Discussion - line 431-433 "Our findings suggest that the dynamic interaction of Prom1 with cholesterol may promote efficient maturation and trafficking of Prom1 between the endomembrane system and the plasma membrane." The authors did not investigate whether depleting cholesterol improved Prom1(W795R) trafficking to the plasma membrane, making this inference untested. Soften interpretation or test experimentally.

      We appreciate the reviewer raising this point. We have altered the text in this paragraph (lines 449-459) to soften our interpretation of these results, as suggested by the reviewer.

      Minor Comments Abstract - "the EVs produced are biophysically similar" The authors don't perform any typical biophysical characterization (beyond size and perhaps density), so do they mean physically similar? Given the Prom1 and Ttyh1 EVs can have different shapes and are significantly different sizes, this statement feels misleading.

      We thank the reviewer for pointing out the ambiguity around this word. We agree that "physically similar" is a more precise and accurate term, and have revised all instances of this language in the manuscript.

      Intro - line 59-60 - "Large Prom1 EVs (500-700 nm in diameter) appear to form from bulk release of membrane from the cell midbody" Midbody remnants are well defined (if variously named, i.e. flemmingsome) large EVs derived from the spindle midbody, intercellular bridge, and cytokinetic ring. I'm not sure what the authors are trying to express by "bulk release of membrane". Midbody remnants are also a site of membrane tubulation.

      The reviewer is correct to point out that midbody remnant release is a well defined process. We originally included this statement to avoid indicating that we are studying the only known class of Prominin EVs, but now recognize that including it creates more confusion than it alleviates. To improve clarity, and consistent with the changes referenced above emphasizing that we are specifically studying small EVs, we have removed this reference to the larger class of EVs from the Introduction (lines 61-63).

      The effect on total numbers of EVs is buried in the y-axes of the EM graphs, making it difficult to distinguish where a higher n of images was examined vs. where there is an increase in EVs. This is especially hard to interpret given the high difference in n values.

      The reviewers raise a valid critique of these figure panels. To improve clarity, we have adjusted the y-axes to represent the fraction of EVs rather than the absolute value of EVs, and listed the n values in figure legends.

      Fig. 2C - Missing WT error bars

      We appreciate the reviewer's concern for the WT error bars in these figures. The measurements underlying these plots are derived from quantification of Western blots. Because the blots have a limited number of lanes, the WT sample was run as a normalization control on each of several sets of blots. By employing this approach, we could make quantitative comparisons within each blot without needing to make direct comparisons between blots, eliminating confounding variables such as blotting times, positions of blots on rotary shakers, developer incubation time, exposure times, etc. Because WT lanes were used for normalization, each "WT" blot condition has its own set of error bars that was used for t-test comparison with the samples that share a blot. For this purely technical reason, we can represent the data either normalized against WT values or with three separate WT measurements for each plot. In the interest of clarity and transparency, we elected to report the values normalized to WT and to include all raw blot images in Supplementary Fig. S4. We understand that we could have made this more transparent, so to clarify this decision for readers, we now explicitly reference the raw blot images in both the Results text (line 185) and the Figure 2 legend.
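      The within-blot normalization described above can be sketched as dividing every band by the WT band on the same blot. Each WT value then becomes exactly 1, which is why a WT-normalized plot carries no WT error bar. The sample names and intensities below are hypothetical densitometry values, used only for illustration.

```python
def normalize_to_wt(blots):
    """Express each band as a fraction of the WT band on the same blot.

    blots : list of dicts, one per blot, mapping sample name -> band intensity
    Returns a dict mapping sample name -> list of WT-normalized values pooled across blots.
    """
    normalized = {}
    for blot in blots:
        wt = blot["WT"]
        for sample, intensity in blot.items():
            normalized.setdefault(sample, []).append(intensity / wt)
    return normalized


# Two hypothetical blots with different overall exposure.
blots = [
    {"WT": 200.0, "mut1": 100.0},
    {"WT": 400.0, "mut1": 200.0},
]
normalized = normalize_to_wt(blots)
print(normalized)
```

      Despite the twofold difference in raw intensity between the two blots, the normalized mutant values agree, which is the point of comparing only within a blot.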

      Fig. 3H, 5C - Why not show raw numbers on the y-axes of the inset graphs like the main graph? Also, if it is only showing a subset of roundness ranges, then the x-axis should not go to 1 (i.e. axis range 0-0.8 would be clearer). I had a hard time figuring out what these insets were trying to show me, so please think about presenting this data more clearly (and larger).

      For clarity, we have moved the inset graphs to separate panels alongside the main panel and implemented the requested changes to the axes (see Figs. 3G, 5B).

      Discussion - line 377 - "Though we do not claim that Ttyh1 endogenously induces EV formation" This statement could be misinterpreted to say that you do not think endogenous Ttyh1 regulates EV formation. Rephrase as "although we have not examined whether..."

      We thank the reviewer for pointing out this unclear sentence and have applied the requested change (line 397).

      Discussion - line 400-402 "Our results do not indicate that Ttyh1 does not bind cholesterol, merely that it does not form an interaction that is sufficiently kinetically stable to be co-immunoprecipitated." The phrasing here is confusing with multiple "not". It is better to leave things open than to say what you have not shown. Rephrase suggestion: "Although Ttyh1 was not able to form a kinetically stable interaction for co-immunoprecipitation, it remains to be determined whether Ttyh1 is able to bind cholesterol."

      We thank the reviewer for their suggestion and have modified the sentence to avoid double-negative phrasing (lines 422-426).

      Movies - I'm not sure what the two videos add. It's difficult to convince myself that I see plasma membrane labeling in either movie, especially in comparison to the over-exposed WGA staining. Also, why are there ~5 sec of empty movie at the end of each?

      We appreciate the reviewer's feedback and have removed the movies from the manuscript.

      Reviewer #1 (Significance (Required)):

      The data is interesting and well presented, but overinterpreted in the discussion. The data on Ttyh1 expression inducing EVs is novel, but limited to overexpression studies. This study will be of interest to the EV, membrane curvature, and Prom1/Ttyh1 fields. My expertise is in basic research on membrane trafficking (including EV formation) and lipids.

      We thank the reviewer for their favorable review and helpful suggestions.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors investigated the role of Prom1 and Ttyh1 proteins in EV formation. They showed that both proteins can induce EV formation, although the mechanisms by which they do so might differ slightly. Ttyh1 binding to cholesterol is not as pronounced as that of Prom1. Surprisingly, cholesterol binding efficiency inversely correlates with EV formation. Also, EVs induced by Ttyh1 and Prom1 are structurally different.

      My suggestions to improve the manuscript are below.

      • Figure 2E is not very convincing. As the authors mentioned, the signal is too low to have a concrete conclusion. The line scans somehow show that WT is more membrane-localized than mutant, but colocalization of Prom1 and WGA seems very similar in both cases. Is it certain that the addition of fluorophore did not change the trafficking? Does endogenous Prom-1 staining look like this? Also, why is WGA staining brighter in the mutant sample: is this usual variation or biologically important?

      We understand the reviewer's concern about low signal, but respectfully disagree that the signal is too low to draw a meaningful conclusion. The only point we conclusively make in Fig. 2E is that WT Prom1 is more efficiently trafficked to the plasma membrane than W795R Prom1. We feel that this effect is sufficiently well evidenced by the line scan analysis in Supp. Fig. S5, where Prom1 peaks are cleanly visible for WT but not for W795R protein.

      We observe somewhat variable WGA staining in our experiments, and the differences we show in this figure panel are representative of typical staining variation. We do not draw any biological conclusions from the level of WGA present, only from its localization. Because both the plasma membrane and late endosomes are WGA+, we suspect that the W795R Prom1 is failing to traffic from endosomes to the plasma membrane. However, given the limitations of our fluorescence assay, we have removed from our discussion of this experiment any claim beyond the change in plasma membrane trafficking efficiency.

      We cannot conclude whether the mStayGold fluorophore alters trafficking of Prom1 to the plasma membrane. In response to the reviewer's comment, we attempted to use immunofluorescence to measure membrane localization of untagged Prom1 with the AC133-1 antibody. Unfortunately, we were unable to optimize this protocol to achieve sufficient membrane staining for quantification. We have softened our interpretation of Fig. 2E in the Results and Discussion (lines 203-204, 450) to acknowledge that the effects we observe are only measured with fluorophore-tagged Prom1.

      • I also recommend showing the localization of Ttyh1 on cells.

      We appreciate the reviewer's suggestion here, and it is an experiment we considered. One of the challenges we faced in this assay was quantitatively measuring fluorescent signal along cell-boundary plasma membranes without saturating signal from the very bright WGA+ endosomes. Because Ttyh1 globally expresses at higher levels than Prom1 (see Figs. 3C, 3I), direct comparison of membrane-localized Prom1 and Ttyh1 is technically challenging in these cells. However, Ttyh membrane localization has been widely reported in other papers (Matthews et al., J. Neurochem, 2007; Jung et al., J. Neurosci., 2017; Sukalskaia et al., Nat. Commun., 2021; Melvin et al., Comm. Biol., 2022) that we now explicitly mention and cite for reader clarity in both the Introduction and Results (lines 69-71, 224-225).

      • A graph directly showing cholesterol binding vs EV formation efficiency would be very useful.

      We agree with the reviewer that this would be an interesting and useful addition to the paper. We now include this panel in the revised manuscript as Fig. 4F.

      • "Prominin and Tweety homology proteins are homologous and functionally analogous" involves speculation and authors should clearly mention this. Revealing that they are both contributing to EV formation does not make them definitely functionally analogous.

      We agree with the reviewer that this sentence is indeed ambiguous and somewhat speculative. We have revised the section heading to "Prominin and Tweety homology proteins are homologous proteins that both promote EV formation" (lines 461-462) to indicate the specific analogous function we observe.

      Reviewer #2 (Significance (Required)):

      Overall, it is a useful addition to the field of cell biology, particularly EV field. EV formation and efficiency are both important topics, and this manuscript might give insights.

      We thank the reviewer for their favorable review and helpful suggestions.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      Bell et al. overexpress Prom1 or Ttyh1 and test their effect on EV formation from cell lines. They find that Ttyh1 expression leads to an increase in small EVs as well as tubulated EVs, while Prom1 expression leads to a milder increase in small EVs. EV induction by Prom1 is dependent on cholesterol and the authors show that Prom1 makes the cholesterol in EVs more resistant to detergent. The authors show no connection between Ttyh1 EV induction and cholesterol, although they claim it is important. They also show that a disease mutation in Prom1 decreases Prom1 trafficking to the plasma membrane and increases cholesterol resistance to detergent in EVs. The authors also find that the disease mutation decreases the size of the Prom1-induced EVs.

      Major Comments

      Results - line 99-106 - The EV isolation protocol would remove large EVs like the Prom1+ midbody remnants. It is important to explicitly specify that this study focused on small EVs.

      Statistics - The t tests appear to have been performed without correction for multiple comparisons (Figure 2C-D, Fig. 4D). Given that >10 comparisons were made, this alters the interpretation of p<0.05 (at that threshold, roughly 1 in 20 comparisons is expected to be a false positive). Please reanalyze with a more appropriate statistical test for multiple comparisons (e.g. ANOVA) or apply a correction to the t test values (e.g. Bonferroni).
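      To illustrate the correction requested above, the sketch below applies Bonferroni and Holm adjustments to a list of p-values in plain Python. The p-values are hypothetical, not values from the manuscript. Holm's step-down method is shown as a commonly used, less conservative alternative to Bonferroni that still controls the family-wise error rate; the ANOVA route the reviewer also mentions is not shown.

```python
"""Sketch: family-wise error correction for multiple t tests (hypothetical p-values)."""


def bonferroni(pvalues, alpha=0.05):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


def holm(pvalues, alpha=0.05):
    """Holm step-down: the p-value at 0-based rank i (smallest first) is multiplied
    by (m - i), and adjusted values are forced to be non-decreasing in rank."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted, [p_adj < alpha for p_adj in adjusted]


if __name__ == "__main__":
    pvals = [0.001, 0.012, 0.015, 0.04, 0.3]  # hypothetical unadjusted t-test p-values
    print("unadjusted:", [p < 0.05 for p in pvals])  # 4 of 5 pass
    print("Bonferroni:", bonferroni(pvals)[1])       # only the first passes
    print("Holm:      ", holm(pvals)[1])             # the first three pass
```

      With five tests, four comparisons pass at an uncorrected p<0.05, but only one survives Bonferroni and three survive Holm, which is the kind of shift in interpretation the reviewer is concerned about.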

      The DLS data does not appear to give any insight into EV size (unlike the EM data) and could be removed from the whole manuscript (or moved to supplemental). The authors should also remove any conclusions based on the DLS data.

      Discussion - line 382-383 "Because Prom1 EVs arise directly from blebbing of the plasma membrane23, this finding suggests that Prom1 and Ttyh1 traffic to similar regions of the plasma membrane." The authors have not examined where Prom1 or Ttyh1 localize in the plasma membrane and cannot draw this conclusion. That both proteins promote plasma membrane budding would only suggest that both proteins localize to the plasma membrane, not subregions of the plasma membrane. However, the authors have not demonstrated that Ttyh1 specifically induces plasma membrane budding. The different size of Ttyh1 EVs could be due to different biogenesis mechanisms (i.e. derived from intracellular organelles instead of the plasma membrane), making this statement an over-interpretation on both parts.

      Discussion - line 398-400 "Membrane cholesterol is necessary for Prom1-mediated remodeling20,21 and is present at similar levels in purified Prom1 and Ttyh1 EVs (Fig 5E), indicating that it is undoubtedly important for EV formation by both proteins." & line 415-417 "We find that conservative mutations in several of these adjacent aromatic residues impair EV formation by Prom1, but do not mimic the stable cholesterol binding of W795R (Figs 2C, 4D). " The authors' data suggest that cholesterol is not important for Ttyh1 to induce EV formation. The authors show that cholesterol depletion does not alter Ttyh1 EV production. Similarly, they find separable effects on cholesterol binding and EV formation with Prom1 mutants, which suggest that there is more to Prom1-mediated EV formation than cholesterol. That cholesterol is present at similar levels can reflect that overexpression of these proteins does not alter the amount of cholesterol in the EV source membrane (i.e. plasma membrane). Also, wouldn't molecular crowding of a membrane protein be predicted to influence how easy it is to extract lipids?

      Discussion - line 431-433 "Our findings suggest that the dynamic interaction of Prom1 with cholesterol may promote efficient maturation and trafficking of Prom1 between the endomembrane system and the plasma membrane." The authors did not investigate whether depleting cholesterol improved Prom1(W795R) trafficking to the plasma membrane, making this inference untested. Soften interpretation or test experimentally.

      Minor Comments

      Abstract - "the EVs produced are biophysically similar" The authors don't perform any typical biophysical characterization (beyond size and perhaps density), so do they mean physically similar? Given the Prom1 and Ttyh1 EVs can have different shapes and are significantly different sizes, this statement feels misleading.

      Intro - line 59-60 - "Large Prom1 EVs (500-700 nm in diameter) appear to form from bulk release of membrane from the cell midbody" Midbody remnants are well defined (if variously named, i.e. flemmingsome) large EVs derived from the spindle midbody, intercellular bridge, and cytokinetic ring. I'm not sure what the authors are trying to express by "bulk release of membrane". Midbody remnants are also a site of membrane tubulation.

      The effect on total numbers of EVs is buried in the y-axes of the EM graphs, making it difficult to distinguish where a higher n of images was examined vs. where there is an increase in EVs. This is especially hard to interpret given the high difference in n values.

      Fig. 2C - Missing WT error bars

      Fig. 3H, 5C - Why not show raw numbers on the y-axes of the inset graphs like the main graph? Also, if it is only showing a subset of roundness ranges, then the x-axis should not go to 1 (i.e. axis range 0-0.8 would be clearer). I had a hard time figuring out what these insets were trying to show me, so please think about presenting this data more clearly (and larger).

      Discussion - line 377 - "Though we do not claim that Ttyh1 endogenously induces EV formation" This statement could be misinterpreted to say that you do not think endogenous Ttyh1 regulates EV formation. Rephrase as "although we have not examined whether..."

      Discussion - line 400-402 "Our results do not indicate that Ttyh1 does not bind cholesterol, merely that it does not form an interaction that is sufficiently kinetically stable to be co-immunoprecipitated." The phrasing here is confusing with multiple "not". It is better to leave things open than to say what you have not shown. Rephrase suggestion: "Although Ttyh1 was not able to form a kinetically stable interaction for co-immunoprecipitation, it remains to be determined whether Ttyh1 is able to bind cholesterol."

      Movies - I'm not sure what the two videos add. It's difficult to convince myself that I see plasma membrane labeling in either movie, especially in comparison to the over-exposed WGA staining. Also, why are there ~5 sec of empty movie at the end of each?

      Significance

      The data are interesting and well presented, but over-interpreted in the discussion. The data on Ttyh1 expression inducing EVs are novel, but limited to overexpression studies. This study will be of interest to the EV, membrane curvature, and Prom1/Ttyh1 fields. My expertise is in basic research on membrane trafficking (including EV formation) and lipids.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors report a mass spectrometry (MS)-based interactomics technique, time-resolved interactome profiling (TRIP), which allows for tracking temporal changes in the interactome of a protein of interest. To show that TRIP can successfully deconvolute interactomes over time, they pulsed thyroid cells with homopropargylglycine (Hpg), immunoprecipitated the Hpg-incorporated thyroglobulin (Tg) and its interacting proteins at different time points, and subjected the samples to tandem mass tag (TMT)-based quantitative MS analysis. The MS results show that WT and variant Tg proteins indeed associate with different proteostasis network factors in a differential manner over the course of time. In addition, they utilized an siRNA-based luciferase fusion assay to evaluate whether silencing each proteostasis network component changes the levels of Tg in both lysate and media. From the combination of the TRIP and siRNA-based assays, they found many hits, including hits implicated in protein degradation, VCP and TEX264, which they validated with multiple experiments.

      I am overall quite positive and think this is an important study. But there are some meaningful points to consider.

      Our Response: We thank Reviewer #1 for their positive outlook on our manuscript and their constructive feedback. We have addressed the comments below.

      Significant comments:

      Reviewer #1, Comment #1: Having only two replicates of the main data (the TRIP-MS experiments) for this paper is problematic, especially since the manuscript is supposed to be demonstrating and validating the new technique. Consistent with this concern, the relative enrichment profiles for some of the results were surprising. For instance, interaction with CCDC47 was tapering off but then at 3 h it suddenly reaches the maximum level of engagement. Is this a real finding or the variability in the method? Impossible to tell with two replicates. Presenting heat maps based on biological duplicates is also very problematic. It masks the error, which is large as can be seen in some of the panels showing individual proteins. In my view, triplicates and a clear understanding of the error in the technique should be required.

      Our Response: The TRIP datasets for WT Tg contains 5 biological replicates, while the A2234D and C1264R Tg contains 6 biological replicates. Two replicates are typically included in a TMTpro 16plex mass spectrometry run, and each analysis consists of 3 MS runs. We apologize that the number of replicates and layout of the MS runs was not clearly explained. Data for individual replicates is found in Dataset EV1, Dataset EV3, and a newly added Table EV3 delineates the sample layout across the TMT channels and MS runs. We clarified the text as follows:

      "Subsequently, two sets of TRIP time course samples (0, 0.5, 1, 1.5, 2, and 3 hr) could be pooled using the 16plex TMTpro and analyzed by LC-MS/MS (Fig 2A). In total, 5 biological replicates were analyzed for WT and 6 biological replicates were analyzed for A2234D and C1264R, respectively (Table EV3)."

      Reviewer #1, Comment #2: The same concern arises for the high-throughput siRNA screen, which was performed only in duplicate for WT and A2234D.

      Our Response: While the initial screen was performed in duplicate for WT and A2234D, which is common for larger screens due to resource constraints, we would like to direct the reviewer to the fact that we followed up on observed hits using thyroid cell lines with many more replicates. Furthermore, most hits came from the C1264R Tg variant, which had three replicates in the initial screen. Hits were also extensively followed-up.

      Reviewer #1, Comment #3: There are issues with some of the immunoprecipitation experiments: In Figure 1C, a negative control for FLAG IP is missing.

      -In Figure 2B, I am curious why the band (Hpg -, chase time 0 h) is so faint for the first WB (IB for FLAG) - is Hpg treatment indeed leading to much more Tg present at 0 h? If so, that is a concern.

      -Also, a negative control must be included (either plain cells or cells expressing fluorescent protein or a different epitope-tagged WT Tg).

      -In this same figure, I am puzzled why the bands for 1.5-3 timepoints in Biotin PD elution, probed for Rhodamine, are very faint especially considering that in Figure 1D, the corresponding bands, which are 4 h after the pulse, look fine. It seems like the IP failed here?

      Our Response: In Fig 2B, we have updated this figure with higher-quality images that are more representative of the results found when performing this experiment. Furthermore, to address the missing negative controls in Fig. 1C, we have added a separate figure (Fig EV2) where (-) FLAG-tagged Tg is included in this panel. We updated the text as follows:

      "Furthermore, the C-terminal FLAG-tag and Hpg labeling are necessary for this two-stage enrichment strategy, and DSP crosslinking is necessary to capture these interactions after stringent wash steps (Fig 1D, Fig EV2)."

      Regarding the Biotin PD rhodamine/TAMRA signal in Fig 2B: The blots in this figure panel represent the time-resolved Tg fractions from cell lysate, corresponding only to intracellular thyroglobulin. The decrease in band intensity for 1.5-3 hr time points is expected due to continued secretion and/or degradation dynamics taking place that decrease the intracellular population of labeled thyroglobulin that is able to be captured. For comparison, please note the C1264R panel (Fig 2C), where the rhodamine/TAMRA signal in the Biotin PD elutions is more stable compared to WT, indicating the cellular retention of C1264R while WT Tg is efficiently secreted and the signal is lost more rapidly. Fig 1D contains samples derived from a 4 hr Hpg pulse (without chase), explaining why the overall fluorescent Tg signal is more intense.

      Suggestion to consider:

      Reviewer #1, Comment #4: This manuscript, supported by the title and abstract, mainly focuses on the presentation of the development and application of TRIP, which is highly significant. The story becomes less coherent and harder to follow as significant amounts of text/figures are dedicated to siRNA-based high throughput screening and follow-up. In addition, although the discovery of TEX264 as one of the hits is very interesting and exciting, TEX264 apparently was not a hit in the TRIP experiment and is therefore rather distracting from the main point of the paper highlighted in the abstract and title. The siRNA-based assay and follow-up studies could be a separate scientific story of their own. Especially considering my concerns on the number of replicates for both the TRIP and siRNA-based assay, it could be beneficial to actually split the manuscript into two and conduct more replicates of the -omic work, which should corroborate the exciting discoveries the authors have made.

      Our Response: We have edited the manuscript to hopefully provide a more cohesive presentation of all data, findings, and conclusions within the paper. Given the generally positive outlook on the manuscript from other reviewers and our responses to significant comments from Reviewer #1, we opted to keep the manuscript as a single piece and address all reviewer comments.

      Minor comments:

      Reviewer #1, Comment #5: Throughout the manuscript, the authors have not defined what FT is; presumably it means FLAG tag.

      Our Response: Reviewer #1 is correct that FT corresponds to FLAG tag. We have now edited the manuscript text to clarify this as follows:

      "Thyroglobulin was chosen as a model secretory client protein, and we generated isogenic Fischer rat thyroid (FRT) cells that stably expressed FLAG-tagged Tg (Tg-FT), including WT or mutant variants (A2234D and C1264R)."

      Reviewer #1, Comment #6: The authors might discuss their rationale for choosing 0-3 hrs for their TRIP studies. That includes any relevant information about the half-life of WT versus variant Tg, whether the Hpg pulse time is short enough to avoid missing key features of the temporal interactome, and discussion of what would happen if the TRIP were performed at prolonged time points (e.g. 6-10 h).

      Our Response: Apologies that we omitted this important point, which is indeed related to the secretion and degradation half-life. We edited the manuscript text to discuss the rationale for 0-3 hr, length of the Hpg pulse and the impact on capturing interactions, and performing TRIP at prolonged time points as follows:

      "Our previous study indicated that ~70% of WT Tg-FT was secreted after 4 hours, while approximately 50% of A2234D and 15% of C1264R was degraded after the same time period (Wright et al, 2021). Therefore, we reasoned that a 3-hr chase period would be enough time to capture the majority of Tg interactions throughout processing, secretion, cellular retention, and degradation, while still being able to capture an appreciable amount of sample for analysis."

      We explain the labeling timeline and limitations further in the discussion:

      "To address this, we utilized a labeling time of 1 hr which allows us to generate a large enough labeled population of Tg-FT for TRIP analysis, but some early interactions are likely missed within the TRIP workflow. In the case of mutant Tg, performing the TRIP analysis for much longer chase periods (6-8 hrs) may provide insightful details to the iterative binding process of PN components that is thought to facilitate protein retention within the secretory pathway."

      Reviewer #1, Comment #7: Lines 68-69: the two citations should probably come one sentence earlier (at least Coscia et al 2020 is a structure paper).

      Our Response: We agree. We have edited the manuscript as follows to correct this:

      "In earlier work, we mapped the interactome of the secreted thyroid prohormone thyroglobulin (Tg) comparing the WT protein to secretion-defective mutations implicated in congenital hypothyroidism (CH) (Wright et al, 2021). Tg is a heavily post-translationally modified, 330 kDa prohormone that is necessary to produce triiodothyronine (T3) and thyroxine (T4) thyroid specific hormones (Citterio et al, 2019; Coscia et al, 2020). Tg biogenesis relies extensively on distinct interactions with the PN to facilitate folding and eventual secretion."

      Reviewer #1, Comment #8: Line 91: "(Figure 1A)" should follow the sentence "To develop the time-resolved..." to help readers better understand the system.

      Our Response: We agree. We have edited the manuscript to add the Fig 1A reference. Furthermore, we redesigned the schematic in Fig 1A to better explain the experimental system (see also Reviewer #2, comment 10).

      "To develop the time-resolved interactome profiling method, we envisioned a two-stage enrichment strategy utilizing epitope-tagged immunoprecipitation coupled with pulsed bioorthogonal unnatural amino acid labeling and functionalization (Fig 1A). Cells can be pulse labeled with homopropargylglycine (Hpg) to synchronize newly synthesized populations of protein. After pulsed labeling with Hpg, samples can then be collected across time points throughout a chase period (Fig 1A, Box 1) (Kiick et al, 2001; Beatty et al, 2006). The Hpg alkyne incorporated into the newly synthesized population of protein can be conjugated to biotin using copper-catalyzed alkyne-azide cycloaddition (CuAAC) (Fig 1A, Box 2). Subsequently, the first stage of the enrichment strategy can take place where the client protein of interest is globally captured and enriched using epitope-tagged immunoprecipitation, followed by elution (Fig 1A, Box 3)."

      Reviewer #1, Comment #9: Line 101: Fisher should be Fischer

      Our Response: Thank you. We have edited the manuscript text to correct this.

      Reviewer #1, Comment #10: Line 131: Should be 1.5 hrs instead of 2 hrs.

      Our Response: We edited this point (see below in comment #11)

      Reviewer #1, Comment #11: Lines 135-136: I do not agree with the claim that HSPA5 profile looked similar for MS and WB. I do not see a peak for HSPA5 at 2 hrs in Figure 2D.

      Our Response: We replaced the mass spectrometry quantification in Fig 2D, E with the scaled, relative enrichments. This provides a more meaningful comparison, as all interactions are scaled in the same way. Unfortunately, it is still difficult to directly compare the Western blot results in Fig. 2B-C to the mass spectrometry quantifications in Fig 2D-E because the WB intensities are not normalized to the Tg bait protein amounts, which change over time. At 2-3hrs time points, little WT Tg is pulled down as most of it is secreted. Therefore, the HSPA5 interactions are no longer detectable by Western blot. On the other hand, MS is much more sensitive in capturing these interactions. We modified the text as follows:

      "For C1264R, interactions with HSPA5 were highly abundant at the 0 hr time point and remained mostly steady throughout the first 1.5 hours (Fig 2C). A similar temporal profile was also observed for HSP90B1. Additionally, interactions with PDIA4 were detectable for C1264R and were found to gradually increase throughout the first 1.5 hr of the chase period, before rapidly declining (Fig 2C). We noticed similar temporal profiles for PDIA4 and HSPA5 to our western blot analysis, when measured via TMTpro LC-MS/MS as further outlined below (Fig 2D-E). In particular, the HSPA5 WT Tg interaction declined within the first hours, yet for C1264R Tg, the HSPA5 interactions remained mostly steady over the 3-hour chase period (Fig 2E)."

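      The difference between raw interactor signal and interaction per unit of bait, raised above for the Western blots, can be shown with a short sketch. It assumes, hypothetically, that each interactor's TMT reporter intensity is divided by the Tg-FT bait intensity at the same time point and the resulting profile is scaled to its maximum. This is one common normalization, not necessarily the exact pipeline used in the manuscript, and the numbers are invented.

```python
"""Sketch: bait-normalized, max-scaled interaction profiles (hypothetical data)."""


def relative_enrichment(interactor, bait):
    """Divide interactor signal by bait signal per time point, then scale to max = 1.

    Bait values must be non-zero.
    """
    ratios = [i / b for i, b in zip(interactor, bait)]
    peak = max(ratios)
    return [r / peak for r in ratios]


if __name__ == "__main__":
    # Hypothetical chase time course in which the bait is secreted (declines)
    bait = [100.0, 80.0, 50.0, 20.0]
    chaperone = [10.0, 8.0, 5.0, 2.0]  # raw signal falls 5-fold...
    print(relative_enrichment(chaperone, bait))  # ...but binding per bait is flat: [1.0, 1.0, 1.0, 1.0]
```

      In this toy case a blot of the raw interactor signal would suggest the interaction is lost over the chase, whereas the bait-normalized profile shows a constant interaction per Tg molecule.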
      Reviewer #1, Comment #12: Line 186: The cited paper Shurtleff et al 2018 is missing in the reference list.

      Our Response: Thank you. We have corrected this in the citation management system and it is now available in the reference list.

      Reviewer #1, Comment #13: Line 188: I disagree with the authors' claim here because, at least for CCDC47, interactions with C1264R seem to come back at the 3 hr time point.

      Our Response: We have removed the discussion of EMC and PAT complex components from the text. The implications of these interactions for Tg biogenesis remain unclear and were therefore a distraction from the discussion of other core proteostasis network components pertinent to Tg processing. Nonetheless, the full dataset - including these interactions - remains available to readers in Appendix Fig S1 for further perusal.

      Reviewer #1, Comment #14: Line 203: I am not sure if P4HA1 can be included in the examples for showing distinct patterns for mutants compared to the WT according to their data in Figure 3H.

      Our Response: We agree. We have edited the text to remove the discussion of prolyl hydroxylation and isomerization family members and elected to discuss the new clustering analysis and the robustness of the TRIP method in more detail. The full TRIP data is nonetheless available to interested readers in Appendix Fig S1.

      Reviewer #1, Comment #15: Line 216: The authors should add citations about the functions of STT3A and STT3B proteins.

      Our Response: We've edited the manuscript text to include a reference to the primary literature for STT3A and STT3B functions, as follows:

      "Previously, we showed that A2234D and C1264R differ in interactions with N glycosylation components, particularly the oligosaccharyltransferase (OST) complex. Efficient A2234D degradation required both STT3A and STT3B isoforms of the OST, which mediate co-translational or post-translational N-glycosylation, respectively (Kelleher et al, 2003; Cherepanova & Gilmore, 2016)."

      Reviewer #1, Comment #16: Lines 248-251, "We found that interactions with these components...": this sentence should refer to Figure 3 - Figure Supplement 3 instead of Figure 3L and S4.

      Our Response: Thank you. This section of the manuscript was significantly rewritten and the figure references updated.

      Reviewer #1, Comment #17: Lines 258-260, "Another striking observation was that the temporal profile of EMC interactions for C1264R correlated with RTN3, PGRMC1, CTSB, and CTSD interactions.": Please provide more evidence to support the potential correlation between different interaction profiles. Or the authors should move this sentence to the discussion section as it sounds speculative. This highlights the issue of only having duplicates, as well.

      Our Response: We agree that this point was highly speculative and we removed discussion of the EMC interactions.

      To further investigate the correlation of interaction profiles across the dataset, we performed unbiased k-means clustering. This led to the identification of 7 and 6 unique clusters of interactors for WT and C1264R Tg-FT, respectively. These data are represented in Fig 3F and Fig EV5. Unique clusters highlight similar temporal interaction profiles for Tg-FT interactors, and provide a quantitative representation of correlative interactions that take place during Tg-FT processing.

      "To assess temporal interaction changes in an unbiased fashion and identify protein groups exhibiting similar behavior, we carried out k-means clustering of the temporal profiles for WT and C1264R. This analysis revealed a large divergence in the interaction profiles. For WT Tg, only one cluster exhibited steadily decreasing interactions (cluster 4), while others increased with time, or showed peaks at intermediate times (Fig 3F, Fig EV5A). On the other hand, C1264R largely exhibited clusters with decreasing interactions over time (Fig 3F, Fig EV5B). Cluster 2 for WT, with bimodal interactions at early and late time points, contains many Hsp70/90 chaperoning components. For C1264R Tg, many Hsp70/90 chaperoning components and disulfide/redox-processing components are instead part of cluster 2', which exhibited an initial rise in interaction strength before plateauing (Fig 3F, Fig EV5A,B). This divergent temporal engagement between WT Tg and the destabilized C1264R mutant is aligned with the patterns observed in the manual grouping (Fig 3B,C), highlighting that the unbiased temporal clustering can reveal broader patterns in the reorganization of the proteostasis dynamics."

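      For readers less familiar with k-means, the sketch below clusters temporal profiles with plain Lloyd's algorithm. It is an illustration under stated assumptions rather than the analysis used in the manuscript: the profiles are invented, each is assumed to be already scaled to a maximum of 1, the number of clusters k is fixed in advance, and seeding uses a deterministic farthest-first rule (library implementations such as scikit-learn's KMeans instead default to k-means++ seeding with several restarts).

```python
"""Sketch: k-means clustering of temporal interaction profiles (hypothetical data)."""


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(profiles, k):
    """Deterministic seeding: start at the first profile, then repeatedly add the
    profile farthest from all centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(profiles, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical max-scaled profiles at 0, 0.5, 1, 1.5, 2, 3 hr
    profiles = {
        "decreasing_A": [1.0, 0.8, 0.6, 0.4, 0.2, 0.1],
        "decreasing_B": [0.9, 0.8, 0.5, 0.4, 0.2, 0.0],
        "increasing_A": [0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
        "increasing_B": [0.0, 0.2, 0.5, 0.8, 0.9, 1.0],
    }
    labels, _ = kmeans(list(profiles.values()), k=2)
    for name, lab in zip(profiles, labels):
        print(name, "-> cluster", lab)
```

      The two decreasing profiles end up in one cluster and the two increasing profiles in the other; in real data, the choice of k and the scaling step both shape which temporal patterns are grouped together.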
      One of the clusters of the C1264R Tg interactions contained autophagy interactors along with glycosylation components. We therefore postulate that this could point to a coordination of these processes. We discuss this new point in the updated manuscript:

      "In the k-means clustered profiles, autophagy interactions largely group together in the same cluster, showing stronger interactions at earlier time points. In the same cluster are glycosylation components (UGGT1 and STT3B, MLEC), further supporting a possible coordination for C1264R Tg between lectin-dependent protein quality control and targeting to autophagy (Fig EV5B,C)."

      Reviewer #1, Comment #18: Line 340: As written, should cite more than one paper

      Our Response: Thank you. We reworded the manuscript to correct this, as follows:

      "The discovery of several protein degradation components as hits for rescuing mutant Tg secretion may suggest that the blockage of degradation pathways can broadly rescue the secretion of A2234D and C1264R mutant Tg, a phenomenon similarly found for destabilized CFTR implicated in the protein folding disease cystic fibrosis (Vij et al, 2006; Pankow et al, 2015; McDonald et al, 2022)."

      Reviewer #1, Comment #19: Line 371: Should be Figure 4 - figure supplement 2

      Our Response: We edited the manuscript to correct this error.

      Reviewer #1, Comment #20: Line 1231: "Zhang et al 2018" needs to be removed

      Our Response: We have removed this citation.

      Reviewer #1, Comment #21: Line 1286: FRTR should be FRT

      Our Response: Thank you. We have corrected this within the text.

      Reviewer #1, Comment #22: Figure 3E: Color used to highlight the three proteins (CCDC47, EMC1, EMC4) should match the color used in Figure 3 - Figure Supplement 3

      Our Response: We have edited Figure 3 to remove the section related to membrane protein biogenesis. This data is still available in Appendix Fig S1 with consistent color coding.

      Reviewer #1, Comment #23: Figure 4A: The bottom figure where lysate signal is inversely proportional to time is misleading because the authors are assessing steady-state level of proteins in this assay.

      Our Response: We agree. We updated the schematic in Fig 4A to better explain the workflow and differentiate the steady-state protein level being measured within the lysate.

      Reviewer #1, Comment #24: Figure 4 - Figure Supplement 1 caption: in (C), (F) should be (B). (K) should be (G) and I am not sure what the authors mean when they refer to (J) in caption of (G).

      Our Response: We have corrected this lettering mistake to match the figure properly. Please note that this figure is now Fig EV6, and it includes some new and reorganized panels.

      Reviewer #1, Comment #25: Figure 5 caption for (C and D): Need to specify the time that the samples were collected (8 hrs), as it seems different from A and B according to the main text.

      Our Response: We have specified the collection time within the caption for these data in Fig 5C and 5D.

      Reviewer #1, Comment #26: Figure 5 - Figure Supplement 1: Data for HERPUD1 and P3H1 should be included.

      Our Response: We have now included data to confirm the knockdown for HERPUD1 and LEPRE1 (P3H1) in Fig EV7F-G.

      Reviewer #1, Comment #27: Figure 5 - Figure Supplement 2B: Please mention in the caption how degradation is defined.

      Our Response: We have updated the Fig EV7H caption to include how "degradation" is defined within these experiments:

      "% Degradation is defined as [1 − (L_n + M_n)] × 100, where L_n is the fraction of Tg-FT detected in the lysate at a given timepoint n, and M_n is the fraction of Tg-FT detected in the media at a given timepoint n."

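      As a worked example of this degradation definition, the sketch below assumes that both fractions are expressed relative to the labeled Tg-FT signal in the lysate at the start of the chase (0 hr), so that any labeled signal missing from both lysate and media is counted as degraded. The signal values are hypothetical.

```python
"""Sketch: percent degradation from pulse-chase signals (hypothetical values)."""


def percent_degradation(lysate_n, media_n, lysate_0):
    """Percent of labeled Tg-FT lost from lysate + media at chase time n.

    Assumes both fractions are normalized to the 0 hr lysate signal.
    """
    frac_lysate = lysate_n / lysate_0  # fraction remaining intracellularly
    frac_media = media_n / lysate_0    # fraction secreted into the media
    return (1.0 - (frac_lysate + frac_media)) * 100.0


if __name__ == "__main__":
    # Hypothetical: 100 units at 0 hr; 30 left in lysate and 50 secreted at time n
    print(round(percent_degradation(30.0, 50.0, 100.0), 6))  # 20.0
```

      A fully secreted or fully retained protein with no loss gives 0% degradation; only signal absent from both compartments contributes.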
      Reviewer #1 (Significance (Required)):

      Reviewer #1, Comment #28: This manuscript is highly significant because the authors (1) designed and validated a new methodology for time-resolved interactomics study, (2) presented the dynamic changes in Tg interactome for WT and variants, and (3) discovered how proteins implicated in degradation pathways (e.g. VCP, TEX264, RTN3) can change the secretion profile of WT and mutant Tg proteins. With TRIP, the authors demonstrated that they could obtain valuable data that were previously not captured from steady-state interactomics studies (Wright et al. 2021; Figure 3M and Figure 3 - Figure supplement 4D-4I). Furthermore, the authors treated cells with VCP inhibitors and performed both 35S pulse-chase analyses and TRIP. These experiments provide valuable information to the field by (1) presenting a new method to rescue Tg secretion defect, and (2) demonstrating a broader applicability of TRIP. If the major comments above can be addressed I believe this is a tremendous contribution to the field.

      Our Response: We thank Reviewer #1 for their review comments and praise for the work presented within this manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Reviewer #2: In the manuscript 'Time-Resolved Interactome Profiling Deconvolutes Secretory Protein Quality Control Dynamics', Wright et al. developed an approach for time-resolved protein-protein interaction mapping, named time-resolved interactome profiling (TRIP), that relies on pulsed unnatural amino acid incorporation, protein cross-linking, sequential affinity purification, and quantitative mass spectrometry. The authors applied the TRIP method to compare the interactions of the WT secreted thyroid prohormone thyroglobulin (Tg) with those of secretion-defective mutants implicated in congenital hypothyroidism. They further employed an RNA interference screening platform (1) to investigate (1) whether interactors identified via TRIP are functionally relevant for Tg protein quality control and (2) which factors can rescue mutant Tg secretion. The screen was initially performed in HEK293 cells, and selected hits with a phenotype in HEK cells were then followed up in Fischer rat thyroid cells. Further functional validation was performed by pharmacologic inhibition of VCP, a hit from the RNAi screen with an effect on Tg lysate abundance and Tg secretion. While the authors present a comprehensive study including identification of protein-protein interactions using proteomics followed up by an RNA interference screen for functional validation, major comments need to be addressed for both the proteomics as well as the functional genomics aspects of the study (see comments below).

      Our Response: We thank Reviewer #2 for their constructive feedback. We address all comments in detail below.

      Major comments:

      Reviewer #2, Comment #1: The authors describe a new method for quantitative, temporal interaction mapping. The protocol involves two enrichment steps as well as several reactions, including cross-linking of the samples and functionalization of the unnatural amino acids. Given all these steps, the authors should rigorously characterize the quantitative reproducibility of the experiment when performed in independent biological replicates. This is important because in the final quantitative MS experiment, the authors use only two biological replicates, which is too few, especially for such an involved sample preparation procedure, which would be expected to show high variability between replicates. Given the low number of replicates and the unknown reproducibility of the quantification for this protocol, it is questionable at this point how reliable the quantification over the time course is.

      Our Response: We apologize that the number of replicates and the robustness of the analysis were not entirely clear in our manuscript. We thank the reviewer for this feedback, as this is an important point to clarify. We have included several additional analyses to further explain the robustness and quantitative reproducibility of our results:

      • We clarified the number of replicates. For quantitative MS experiments, five biological replicates were analyzed for WT Tg-FT, while six biological replicates each were analyzed for A2234D and C1264R Tg-FT, not two as Reviewer #2 presumed. These data are available in Dataset EV1 and Table EV3. The only experiment that includes two biological replicates is the TRIP analysis of C1264R Tg-FT FRT cells treated with ML-240. We have further clarified the number of biological replicates within the manuscript text as follows (see also Reviewer #1, comment 1):

      "Subsequently, two sets of TRIP time course samples (0, 0.5, 1, 1.5, 2, and 3 hr) could be pooled using the 16plex TMTpro and analyzed by LC-MS/MS (Fig 2A). In total, 5 biological replicates were analyzed for WT and 6 biological replicates were analyzed for A2234D and C1264R, respectively (Table EV3)."

      • We displayed the reproducibility of TRIP time profiles for several individual proteins in Fig EV3 and in Fig 3K (VCP). We included shading to indicate the standard error of the mean (SEM) for the individual protein time courses to provide further assessment of the quantitative reproducibility. We updated the text as follows: "To benchmark the TRIP methodology, we chose to monitor a set of well-validated Tg interactors and compare the time-resolved PN interactome changes to our previously published steady-state interactomics dataset (Wright et al, 2021). Previously, we found that CALR, CANX, ERP29 (PDIA9), ERP44, and P4HB interactions with mutants A2234D or C1264R Tg exhibited little to no change when compared to WT under steady state conditions (Fig EV4A). However, in our TRIP dataset we were able to uncover distinct temporal changes in engagement that were previously masked within the steady-state data. Our time-resolved data deconvolutes these aggregate measurements, revealing prolonged CALR, ERP29, and P4HB engagements for both A2234D and C1264R Tg mutants compared to WT (Fig EV4B-F). We found that these measurements for key interactors and PN pathways exhibited robust reproducibility, as exemplified by the standard error of the mean for the TRIP data (Fig EV4B-I, Appendix Figure S1B)."

      • For full transparency, we also include the SEM of all TRIP profiles in the heatmap in Appendix Fig S1B.

      • Furthermore, we included 25-75% quartile ranges for the pathway-aggregated time courses (Fig 3B,C,J,K) and the k-means hierarchical clustering analysis (Fig 3F, Fig EV5). These clustering data in particular allow for the visualization and analysis of temporal protein interactions that are correlated with one another, while the accompanying quartile ranges provide further context for the reproducibility of these measurements and cluster profiles (see Reviewer #1, Comment 17 above for further explanation of the k-means clustering).
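      To illustrate the two reproducibility measures described in these bullets, the Python sketch below computes the SEM of a TRIP measurement across biological replicates and a median with a 25-75% quartile band for a pathway's scaled time profiles. It is a minimal sketch under stated assumptions, not the authors' analysis code: the input layout and the function names `sem` and `pathway_quartile_band` are hypothetical, and summarizing pathway members by their median is an assumption made here for illustration.

```python
import math
from statistics import quantiles, stdev


def sem(replicates):
    """Standard error of the mean across biological replicates (sample SD / sqrt(n))."""
    return stdev(replicates) / math.sqrt(len(replicates))


def pathway_quartile_band(profiles):
    """Median and 25-75% quartile band of a pathway's scaled TRIP profiles.

    profiles: hypothetical layout, dict of protein -> {chase time (hr): scaled enrichment}.
    Returns a dict of chase time -> (25th percentile, median, 75th percentile).
    """
    # Assumes every protein in the pathway was quantified at the same chase times
    times = sorted(next(iter(profiles.values())))
    band = {}
    for t in times:
        values = [profile[t] for profile in profiles.values()]
        # n=4 yields the three quartile cut points; the middle one is the median
        q25, q50, q75 = quantiles(values, n=4, method="inclusive")
        band[t] = (q25, q50, q75)
    return band
```

      The sample standard deviation (n - 1 denominator) and the inclusive quartile method are choices made for this sketch; on Python versions before 3.13, `statistics.quantiles` requires at least two values per time point.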

      Reviewer #2, Comment #2: Compared to the previous dataset published last year, the authors discover an overlap in interactors, but also a huge discrepancy, with 96 previously identified interactors not detected in the current study, but 198 additional interactors identified. How do the authors explain the big differences between these datasets?

      Our Response: We can only speculate here, but this difference in overlapping interactors may stem from several factors, including but not limited to cell line, instrumentation, LC-MS/MS methodology, and sample processing workflows. Our previous dataset was generated using transiently transfected HEK293 cells expressing FLAG-tagged constructs of Tg. The HEK293 cell line is a robust model used throughout many biological investigations, but it is not representative of the native cellular environment in which Tg is expressed. Moreover, transient transfection can lead to high protein expression that may not reflect the native cellular environment and proteome. Here, we used Fischer rat thyroid (FRT) cells engineered to stably express FLAG-tagged constructs of Tg. This cell line should more accurately represent the native cellular environment of Tg, which is exclusively expressed in thyroid tissue. Our previous dataset was collected across two different instruments with similar LC-MS/MS methodology. Here, the dataset was collected on a single instrument after further optimization of the methodology used to acquire the first dataset. In line with this LC-MS/MS method development, the sample processing workflows here are also quite different. Our previous dataset utilized 6plex TMT labeling of samples from global immunoprecipitation of various Tg constructs. Global immunoprecipitation of Tg yields much larger protein sample amounts than the TRIP methodology presented here, which we coupled with 16plex TMTpro labeling. This is also one of the reasons we chose to deploy a booster/carrier channel within our experimental labeling schemes.

      Reviewer #2, Comment #3: For the temporal interaction analysis the authors describe differences in the temporal profiles of selected interactions comparing wt and mutant, however no statistical analysis is performed comparing wt and mutant interaction profiles across the time course. Furthermore the variability between the replicates for the temporal profiles is not shown and some of the temporal profiles appear to be noisy. A more rigorous statistical analysis should be performed including additional biological replicates to evaluate the changes over the time course, especially as the temporal interaction analysis is the novelty of this study.

      Our Response: Please also see our response to Reviewer #2, comment 1 above. We previously presented an analysis of the variability of the TRIP measurements (SEM) (now in Appendix Fig S1B). We have since provided further statistical analysis found in the updated Fig 2B,C,J, which include 25-75% quartile ranges for respective proteostasis network pathways. We also included SEM for the time profiles of individual interactors in Fig EV4.

      To assess the divergence in time profiles in an unbiased way, we added a k-means hierarchical clustering analysis (Fig 3F, Fig. EV5). These clustering data allow for the visualization and analysis of temporal protein interaction profiles that are similar to one another and how groups of interactors shift between different clusters for WT Tg and the C1264R mutant.

      Reviewer #2, Comment #4: To functionally validate interactors derived from the TRIP analysis as well as to identify factors that can rescue mutant Tg secretion the authors developed an RNA interference screen. There are a number of aspects that need to be addressed/clarified for this part of the study.

      Our Response: We have added some clarifying changes to the text and the figure panels associated with the siRNA screening and follow-up experiments on the trafficking and degradation factors that rescue Tg secretion. We have addressed other comments from Reviewers #3 and #4 related to these portions of the paper and hope that Reviewer #2 finds them satisfactory.

      Reviewer #2, Comment #5: While the authors validate the stable cell lines expressing the nanoluciferase tagged Tg and the linearity of luminescence signal in lysate and media carefully, they do not validate their platform in combination with the RNAi knockdown strategy. The authors should select genes as positive controls that are expected to modulate Tg secretion and demonstrate that the knockout of these positive controls indeed results in changes in Tg secretion in their system.

      Our Response: This is an excellent suggestion, and certainly something we would have done had there been prior knowledge of control genes that positively or negatively regulate Tg secretion. The purpose of developing the siRNA screening platform was to investigate and, ideally, discover genes that positively or negatively regulate Tg processing. We have done so to the best of our ability, identifying, for example, NAPA, which positively regulates WT Tg secretion, as seen by the decrease in WT Tg secretion upon treatment with NAPA siRNA. Conversely, we found that VCP may negatively regulate C1264R Tg secretion, as shown by the increase in secretion with VCP siRNA or ML-240 treatment. We included a standard "TOX" siRNA control, which we expected to negatively affect WT Tg secretion, and this was indeed the case. As we stated within the manuscript:

      "This is the first study to broadly investigate the functional implications of Tg interactors and other PQC network components on Tg processing."

      Reviewer #2, Comment #6: For the screen the authors select 167 Tg interactors and PN (Proteostasis network) related factors. This statement is very vague and the authors should clarify which genes were knocked down and which criteria were applied to narrow down the list of interactors and to select PN factors. The authors should therefore provide a supplementary table including all genes included in the screen, their source (were this derived from the initial study by Wright et al, from the current study or compiled from prior knowledge about PN), as well as their results from the screen based on luminescence in media and lysate. It is unclear how many of the selected factors are actually coming from the TRIP analysis.

      Our Response: The list of genes included within the siRNA screen, as well as the results, were previously included and are now included in Appendix Fig S2. We have further provided the information requested by Reviewer #2 within Dataset EV5, indicating whether a gene was included in the siRNA screen due to its identification within our previous proteomics dataset (Wright et al, 2021), the proteomics dataset presented here, or based upon primary literature. We added a comment in the text:

      "Moreover, we were interested in identifying factors whose modulation may act to rescue mutant Tg secretion. HEK293 cells were engineered to stably express nanoluciferase-tagged Tg constructs (Tg-NLuc) and screened against 167 Tg interactors and related PN components (see Dataset EV5 for the list of genes)."

      Reviewer #2, Comment #7: Only a small number of the 167 selected genes show an effect on Tg abundance/secretion. How do the authors explain this result? Would we not expect Tg interactors, especially those from the TRIP method that interact with the newly synthesized protein, to be more enriched for functionally relevant genes?

      Our Response: The proteostasis network contains genes and proteins of high redundancy in structure and function, and many single-gene knockdowns are likely insufficient to have a large impact on Tg abundance or secretion. In fact, these results are in line with what we would have expected when designing these experiments. Our goal here was to identify the key players that control Tg protein quality control.

      We explain the proteostasis network redundancy in the manuscript:

      "The functional implications of protein-protein interactions can be difficult to deduce, especially in the case of PQC mechanisms containing several layers of redundancy across stress response pathways, paralogs, and multiple unique proteins sharing similar functions (Wright & Plate, 2021; Bludau & Aebersold, 2020; Karagöz et al, 2019; Braakman & Hebert, 2013)."

      Reviewer #2, Comment #8: The authors initially performed the screen in HEK293 cells and, as a second step, validated the hits from the HEK293 cells in more relevant Fischer rat thyroid cells. Indeed, they could show that knockdown of NAPA increased WT Tg in lysate and decreased WT Tg secretion. Furthermore, they validated genes that modulate mutant Tg lysate and media abundance. The authors should perform a rescue experiment to demonstrate that the observed phenotype can be reversed through re-introduction of NAPA.

      Our Response: We have now performed the requested NAPA complementation experiments and provided the data within Fig EV7I. Overexpression of a human, siRNA-resistant NAPA construct partially reversed the increase in WT Tg lysate retention. These results further support the identification of NAPA as a pro-trafficking factor for WT Tg. We updated the manuscript text to include these data as follows:

      "To understand if these results were directly attributable to NAPA function, we performed complementation experiments where FRT cells treated with NAPA siRNAs were co-transfected with a human NAPA plasmid. WT Tg lysate abundance decreased when NAPA expression was complemented, confirming that the observed retention phenotype could be attributed to NAPA silencing (Fig EV7I). These results established that NAPA acts as a pro-secretion factor for WT Tg."

      Reviewer #2, Comment #9: One hit from this analysis was the ER-phagy receptor TEX264. Although TEX264 was not identified in the TRIP data, its knockdown selectively increased C1264R secretion, but not that of WT or the other Tg mutant. Subsequent Co-IP data, however, revealed some interaction with the C1264R mutant and, to a lesser extent, with the A2234D mutant. How do the authors explain that TEX264 was missed in the TRIP dataset?

      Our Response: The TRIP samples are of much lower protein abundance compared to globally purified samples used for the Co-IP analysis. While the interaction is seen with the globally purified Co-IP samples, this interaction is likely much more difficult to capture with the low abundance, time-resolved samples that are acquired through the TRIP workflow, especially if this interaction is transient or requires the coordination of other accessory proteins as has been detailed in the literature and discussed within the manuscript presented here:

      "While A2234D and C1264R Tg were preferentially enriched with TEX264 compared to WT, it remains unclear what other accessory proteins may be necessary for the recognition of TEX264 clients (Chino et al, 2019; An et al, 2019). Furthermore, TEX264 function in both protein degradation and DNA damage repair further complicates siRNA-based investigations (Fielden et al, 2022). Further investigation is needed to fully elucidate 1) if Tg degradation takes place via ER-phagy and 2) by which mechanisms this targeting is mediated."

      Minor comments:

      Reviewer #2, Comment #10: The workflow needs to be described more clearly. For example, it should be better explained why the authors selected a two-stage enrichment strategy. I assume that the first step, based on the FLAG affinity tag, purifies the protein of interest and the second step, based on the incorporation and functionalization of the unnatural amino acids, enriches for the newly synthesized fraction at specific time points after protein synthesis? These are critical steps for the method, but the rationales are not well explained, and neither the text nor the figures capture all of these steps clearly, which makes it difficult for the reader to understand the individual steps of the method. Moreover, the structures in the Figure 1 workflow are not clearly labeled, so it is confusing which part represents which protein/molecule.

      Our Response: Thank you for this feedback. We have updated Fig 1 to provide more detail and clarity for readers. Furthermore, we have edited the text to describe the workflow more clearly:

      "To develop the time-resolved interactome profiling method, we envisioned a two-stage enrichment strategy utilizing epitope-tagged immunoprecipitation coupled with pulsed bioorthogonal unnatural amino acid labeling and functionalization (Fig 1A). Cells can be pulse labeled with homopropargylglycine (Hpg) to synchronize newly synthesized populations of protein. After pulsed labeling with Hpg, samples can then be collected across time points throughout a chase period (Fig 1A, Box 1) (Kiick et al, 2001; Beatty et al, 2006). The Hpg alkyne incorporated into the newly synthesized population of protein can be conjugated to biotin using copper-catalyzed alkyne-azide cycloaddition (CuAAC) (Fig 1A, Box 2). Subsequently, the first stage of the enrichment strategy can take place where the client protein of interest is globally captured and enriched using epitope-tagged immunoprecipitation, followed by elution (Fig 1A, Box 3). The second enrichment step can then utilize a biotin-streptavidin pulldown to capture the Hpg pulse-labeled, and CuAAC conjugated population, enriching samples into time-resolved fractions (Fig 1A, Box 4) (Li et al, 2020; Thompson et al, 2019)."

      Reviewer #2, Comment #11: In addition to the general workflow shown in Figure 1, a more detailed workflow showing the experimental steps, such as the sample fractions generated at each step, could be added so that the design of the method is clearer. Also, the styles of the workflows in Figure 1, Figure 2A, and Figure 3A are different. It would be helpful to make them the same style and to make Figure 2A a zoomed-in or more detailed illustration of part of Figure 1.

      Our Response: Thank you for this feedback. In addition to updating Fig 1, we also expanded Fig 2A to more clearly outline the experimental steps in the TRIP workflow. Assuming the term "style" here refers to the color palettes and figure schematics used, these have been updated to ensure they are aesthetically consistent across manuscript figures.

      Reviewer #2, Comment #12: A summary of the proteomics results of the time-course labeling after all enrichment steps, including the total number of identified proteins in the different conditions and controls, would be helpful to provide an overview of the proteomics results.

      Our Response: We have included an updated Dataset EV1 that summarizes the proteomics data, including which runs each protein was identified in, the % of TMT channels quantified, the % of Hpg pulse channels quantified, and the overall number of proteins quantified across runs for each construct.

      Reviewer #2, Comment #13: In Figure 2B, the WB for PDIA4 in the Biotin PD elution is missing. Why was the PDIA4 interaction missing for the time course analysis, but the interaction was captured in the initial test for Wt Tg (Figure 1D). Additionally, in this panel the Rhodamine Probe Gel shows inconsistencies at the time points 1.5 - 3h. Does this mean that the labeling did not work well for these conditions? As we would expect a consistent Rhodamine Probe signal at every time point.

      Our Response: Please also see our response to Reviewer #1, comments 3 & 11. Fig 1D features continuous Hpg labeling for 4 hours to ensure that most intracellular Tg is labeled for this proof-of-concept experiment for the two-stage enrichment strategy. Fig 2B features a shorter 60 minute pulse of Hpg labeling, prior to the full chase period and two-stage enrichment strategy. PDIA4 interactions were detectable throughout Fig 1D because those measurements captured a larger population of labeled Tg, whereas in Fig 2B Tg bait protein amounts were much smaller after the two-stage enrichment procedure to capture the time-synchronized population.

      The Rhodamine/TAMRA Probe Gel in Fig 2B does not show inconsistencies in Tg abundance, but highlights the fact that pulse-labeled WT Tg is being secreted or degraded in FRT cells. As expected, intracellular WT Tg signal decreases over the chase period as secretion and degradation take place, so a constant Rhodamine/TAMRA probe signal would not be expected here. Consistent with this, the C1264R Tg signal remains more stable over the initial time course. This is expected, as the C1264R Tg variant is retained intracellularly, where it undergoes increased interactions with the proteostasis network. We have removed the PDIA4 panel for WT Tg because there was no signal above the detection limit. This is now explained as follows:

      "For WT Tg, interactions with HSPA5 peaked within the first 30 minutes of the chase period and rapidly declined, in line with previous observations, but PDIA4 interactions were not detectable by western blot analysis (Fig 2B) (Menon et al, 2007; Kim & Arvan, 1995)."

      Reviewer #2, Comment #14: In Figure 2, why was there no WB results for the A2234D? In Figure 2D and 2E, at which time point are the changes significant compared to WT?

      Our Response: We did not perform the WB experiments with A2234D. We used WT and C1264R Tg in our proof-of-concept experiments via WB and decided to move forward with analyzing A2234D Tg by LC-MS/MS. Please see our response above to Reviewer #2, comment 3 for information on the statistical analysis.

      Reviewer #2, Comment #15: All figure legends should indicate how many biological replicates were performed for each experiment represented in the figure.

      Our Response: We have updated the figure captions to include this information where applicable.

      Reviewer #2, Comment #16: The heatmaps shown in Figure 3, Figure 3 - Figure Supplement 3, and Figure 7 are in the current form incomprehensible. The heatmaps depict the relative enrichment vs the control sample, which was scaled between 1 and -1. The color coding with 5 different colors from 1 to -1 is very confusing and should be changed to just two colors, one for positive and one for negative relative enrichment. I would also suggest changing the visualization of the heatmap showing the wt and mutants side by side, instead of stacked on top of each other for each individual protein.

      Our Response: Thank you for this feedback, and we apologize for the confusion. We adjusted our data analysis approach by removing previous negative enrichment values. As these served only as "background" within the dataset, they did not carry much meaning. The TRIP enrichment is now scaled from 0 to 1, where a value of 1 represents the time point at which the enrichment is greatest, while 0 represents the background intensity in the (-) Hpg control sample. The associated figures have been updated accordingly, and we feel they are now more comprehensible and aesthetically pleasing.

      We opted to keep the Viridis color scheme in the heatmap to allow for more nuanced differentiation of the enrichment values.

      Reviewer #2, Comment #17: The data analysis method for generating relative enrichment shown in the heatmap is not explained. This should be described in the method section for a better understanding of the data analysis.

      Our Response: We have edited the methods section as follows to better explain the analysis:

      "For time resolved analysis, data were processed in R with custom scripts. Briefly, TMT abundances across chase samples were normalized to Tg TMT abundance as described previously and compared to (-) Hpg samples for enrichment analysis (Wright et al, 2021). For relative enrichment analysis, the means of log2 interaction differences were scaled to values from 0 to 1, where a value of 1 represented the time point at which the enrichment reached the maximum, and 0 represented the background intensity in the (-) Hpg channel. Negative log2 enrichment values were set to 0 as the enrichment fell below the background."
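      The relative enrichment scaling described in this methods text can be sketched in Python as follows. This is a hypothetical illustration, not the authors' custom R scripts: it assumes the abundances have already been normalized to Tg TMT abundance, that each biological replicate has its own (-) Hpg channel, and it uses the made-up function name `scaled_trip_enrichment`.

```python
import math
from statistics import mean


def scaled_trip_enrichment(chase, background):
    """Scale one interactor's TRIP time profile to the range [0, 1].

    chase: hypothetical layout, dict of chase time (hr) -> list of
           Tg-normalized TMT abundances, one per biological replicate.
    background: Tg-normalized (-) Hpg abundances, in the same replicate order.
    """
    # Mean log2 enrichment over the (-) Hpg control at each chase time point
    log2_enrich = {
        t: mean(math.log2(a / b) for a, b in zip(values, background))
        for t, values in chase.items()
    }
    # Enrichment below background is treated as background, i.e. set to 0
    clipped = {t: max(v, 0.0) for t, v in log2_enrich.items()}
    peak = max(clipped.values())
    if peak == 0:
        # Never enriched above background: return a flat profile of zeros
        return {t: 0.0 for t in clipped}
    # 1 marks the time point of maximal enrichment, 0 marks background
    return {t: v / peak for t, v in clipped.items()}
```

      Because each profile is divided by its own maximum, the scaled values compare the timing of engagement between interactors rather than their absolute enrichment.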

      Reviewer #2, Comment #18: There are no legends of flowcharts in Figure 2A and Figure 3A and it is difficult to understand which are the key components in the complex and what are the differences among different periods of labeling.

      Our Response: We have now consolidated Fig 2A and Fig 3A into a single panel found in Fig 2A, which is significantly reorganized to better explain the TRIP workflow. The caption has additionally been updated to highlight key steps within the workflow with numbering to allow readers to follow and visualize the steps more easily. The figure caption now reads as follows:

      "(A) Workflow for TRIP protocol utilizing western blot or mass spectrometric analysis of time-resolved interactomes. (1) Cells are pulse-labeled with Hpg (200μM final concentration) for 1 hr, chased in regular media for specified time points, and cross-linked with DSP (0.5mM) for 10 minutes to capture transient proteostasis network interactions; (2) Lysates are functionalized with a TAMRA-Azide-PEG-Desthiobiotin probe using a copper-catalyzed (CuAAC) click reaction; (3) Lysates undergo the first stage of the enrichment strategy, where Tg-FT is globally captured and enriched using immunoprecipitation; (4) Eluted Tg-FT populations from the global immunoprecipitation undergo biotin-streptavidin pulldown to capture the Hpg pulse-labeled, CuAAC-conjugated population of Tg-FT, enriching samples into time-resolved fractions; (5) Time-resolved fractions may then undergo western blot analysis or (6) quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis with tandem mass tag (TMTpro) multiplexing. The (-) Hpg control channel is used to identify enriched interactors, and a (-) Biotin pulldown channel acts as a booster (or carrier)."

      Reviewer #2, Comment #19: Why did only one of the VCP inhibitors (ML-240) exhibit a phenotype in Tg abundance and secretion, but not the other VCP inhibitors?

      Our Response: Please also see our response to Reviewer #3, comment 2 below. This could be due to a number of reasons, but we added a brief discussion on the mechanisms of action for the inhibitors that may at least partially explain the differences in phenotype seen with the VCP inhibitors. We updated the text as follows:

      "ML-240 and CB-5083 are ATP-competitive inhibitors that preferentially target the D2 domain of VCP subunits, whereas NMS-873 is a non-ATP-competitive allosteric inhibitor which binds at the D1-D2 interface of VCP subunits (Chou et al, 2013, 2014; Anderson et al, 2015; le Moigne et al, 2017; Tang et al, 2019). ML-240 and NMS-873 have been shown to decrease both proteasomal degradation and autophagy, in line with VCP playing a role in both processes (Chou et al, 2013, 2014; Her et al, 2016). Conversely, while CB-5083 is known to decrease proteasomal degradation, it has been shown to increase autophagy (Anderson et al, 2015; le Moigne et al, 2017; Tang et al, 2019)."

      Reviewer #2 (Significance (Required)):

      Reviewer #2, Comment #20: The authors describe a novel and elegant method to map time-resolved protein interactions of newly synthesized proteins, which allows monitoring of proteins regulating protein quality control.

      The authors describe it as a general method; however, they demonstrate its applicability to only one protein and do not systematically evaluate the quantitative nature of their approach by determining quantitative reproducibility, which would be necessary to claim that this is a method with broad applicability.

      Given my expertise in quantitative proteomics, I can mainly comment on the technological aspects of the proteomics part of the manuscript, but do not feel qualified to evaluate the significance of this study in terms of novel biology. Nevertheless, it feels that there is a stronger emphasis on the biology in the current form of the manuscript which will raise interest of scientists with a focus on protein quality control and Tg biology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):


      In this manuscript, the authors describe their efforts to develop a methodology for determining time-resolved protein-protein interactions using quantitative mass spectrometry. With TRIP (time-resolved interactome profiling), they combine pulsed bio-orthogonal unnatural amino acid labelling (homopropargylglycine, Hpg), CuAAC conjugation, and biotin-streptavidin pulldowns to enrich at different timepoints, and time-resolve by combining TMT labelling and LC-MS/MS (Figure 1). This technique is then applied to the maturation of secreted WT and mutant thyroglobulin (Tg-WT, Tg-C1264R, Tg-A2234D), expressed in HEK293 and rat thyroid (FRT) cells and linked to hypothyroidism. There, they identify a collection of ER-resident proteins involved in protein folding/processing (e.g. chaperones, redox, glycans, hydroxylation) as well as degradation (e.g. autophagy, ERAD/proteasomes) (Fig. 2). Here the authors effectively use the pulse-labelled form of TRIP to highlight the different interactions formed with Tg-WT vs. Tg-mutants during biogenesis and secretion (or retention). The analysis found ~200 new interactions compared to previous studies, along with about 40% of those identified previously. Differences in interactions were observed for mutants, which showed extended interaction with chaperones and redox processing pathways. While many interactions appeared as might be expected, the identification of membrane protein processing elements (e.g. EMC, PAT) was puzzling and raised some questions about the specificity of the protocol. Mutants enriched for CANX, CALR, and UGGT, suggesting prolonged association with glyco-processing factors. Interaction of C1264R with the ER-phagy factors CCPG1 and RTN3 was greater than that of WT. The authors note that this interaction correlated with that of EMC1 & 4, but it is not clear why that might be.

      With interactors in hand, the authors complemented the TRIP protocol with siRNA KD of identified factors to investigate any changes to secreted vs intracellular Tg upon loss. KD of NAPA (α-SNAP) and LMAN1 increased WT lysate (intracellular) Tg but not mutants. NAPA KD also reduced Tg-WT secretion. In contrast, KD of NAPA increased A2234D secretion, while KD of LEPRE1 increased C1264R secretion (but not A2234D or WT), suggesting mutants have differential processing paths and requirements. KD of VCP increased secretion of both mutants. Some ER-phagy receptors were found among interactors (e.g. RTN3 in Tg-C1264R only), but often their KD had no impact on secretion (CCPG1, SEC62, FAM134B). NAPA observations were recapitulated in a thyroid-derived cell line (FRT). KD of TEX264 and VCP increased Tg-C1264R secretion, while RTN3 KD in FRTs decreased Tg-C1264R secretion. This was in contrast to data from HEK293s, for reasons that are not clear. Co-IP with TEX264 enriched for all Tg forms, but more so for C1264R and A2234D, motivating the authors to propose selective targeting of Tg to TEX264 and the consideration of ER-phagy as a "major" degradative pathway during Tg processing.

      Given the observations with siRNAs to VCP, the authors next used a selection of VCP inhibitors to ask whether secretion can be rescued upon pharmacological impairment of the AAA ATPase. They observed that ML-240, but interestingly not the more conventionally used CB-5083 or NMS-873, increased secretion of Tg-C1264R but not lysate levels. Inhibitors increased lysate but decreased the secreted fraction for Tg-WT (Fig 7). Finally, the authors used TRIP again in ML-240-treated Tg-C1264R-expressing cells to look for changes to the interactome with treatment, and observed decreased glycan and chaperone interactions (CANX and UGGT1) and decreased interaction with DNAJB11 and C10, similar to WT. There was no apparent change to the UPR, although activation was not directly measured.

      Major comments:

      Reviewer #3, Comment #1: __Are the key conclusions convincing? __The TRIP methodology appears to be quite robust and should be a powerful strategy for this field and others going forward. The main drawback is that the required pulse length will limit the proteins that can be monitored to those with longer half-lives. There were interesting interactions found with Tg and the mutants linked to congenital hypothyroidism, but clear-cut differences were not as obvious, even though strong "trends" appear to be present. The path from identifying interactors in a time-resolved manner to following them up with targeted KD does provide some clarity, which is important.

      Our Response: We thank Reviewer #3 for their time in reviewing our manuscript and providing this positive feedback. We have enhanced our analysis of the TRIP data to more clearly highlight differences in time profiles between WT and mutant variants. Please see our responses to Reviewer #2, comments 1 & 3. We also highlight the limitations of the time resolution in the discussion (see also Reviewer #2, comment 6):

      "To address this, we utilized a labeling time of 1 hr, which allows us to generate a large enough labeled population of Tg-FT for TRIP analysis, but some early interactions are likely missed within the TRIP workflow. In the case of mutant Tg, performing the TRIP analysis for much longer chase periods (6-8 hrs) may provide insight into the iterative binding process of PN components that is thought to facilitate protein retention within the secretory pathway."

      We have addressed all further comments below.

      __Reviewer #3, Comment #2: __Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? The data regarding VCP silencing and pharmacological impairment appear clear but leave some questions outstanding in this reviewer's opinion. The lack of effect with the two highly selective inhibitors suggests that the underlying mechanism for switching the fate of intracellularly retained Tg-C1264R towards secreted forms is not at all clear. ML-240 is an early derivative of DBeQ and reportedly impairs both ERAD and autophagic pathways, similarly to DBeQ. The differences between the VCP inhibitors' mechanisms of action were not discussed, but perhaps should be elaborated upon, particularly regarding how ERAD and ER-phagy pathways might be differentially affected. At the risk of asking for too many additional experiments, this reviewer would just prefer to see this fleshed out in a bit more detail.

      Our response: We agree with Reviewer #3 that the underlying mechanism for switching the fate of intracellularly retained Tg-C1264R towards secreted forms remains unclear. We have added text to further discuss the inhibitors used and the general ways in which the ERAD and ER-phagy pathways can be affected. The added text reads as follows:

      "ML-240 and CB-5083 are ATP-competitive inhibitors that preferentially target the D2 domain of VCP subunits, whereas NMS-873 is a non-ATP-competitive allosteric inhibitor that binds at the D1-D2 interface of VCP subunits (Chou et al, 2013, 2014; Anderson et al, 2015; le Moigne et al, 2017; Tang et al, 2019). ML-240 and NMS-873 have been shown to decrease both proteasomal degradation and autophagy, in line with VCP playing a role in both processes (Chou et al, 2013, 2014; Her et al, 2016). Conversely, while CB-5083 is known to decrease proteasomal degradation, it has been shown to increase autophagy (Anderson et al, 2015; le Moigne et al, 2017; Tang et al, 2019)."

      "Because we discovered that pharmacological VCP inhibition with ML-240 can rescue C1264R Tg secretion yet is detrimental to WT Tg processing, it is unclear whether VCP has distinct functions in WT and mutant Tg PQC. Finally, as ML-240 has been shown to block both the proteasomal and autophagic functions of VCP, it is unclear which of these pathways plays a role in the rescue of C1264R or in the detrimental effect on WT processing (Chou et al, 2013, 2014)."

      __Reviewer #3, Comment #3: __Q1. The degree (if any) of Tg-C1264R aggregation and/or detergent insolubility does not appear to have been considered as a potential source of the increase in secreted material (Figure 4, 5). Do Tg mutants partition into RIPA-insoluble fractions at all? That is, is the total population of synthesized Tg being considered, with a full accounting? Could the authors address this and, if biochemical extraction data (via urea or high SDS) are available, include them to answer this concern.

      Our response: The transient aggregation of Tg has been investigated in some detail previously (Kim et al, 1992, 1993). These transient aggregates can partition into RIPA-insoluble fractions. Of note, they have been shown to consist, at least in part, of mixed disulfide-linked species that require reducing agent for full resolubilization. That said, these aggregates represent a minority of the overall Tg population. In our prior manuscript (Wright et al, 2021), we quantified the RIPA-insoluble fraction found in the pellet (see Supplemental Info Fig. 5). As the majority of Tg remains soluble during processing, it should be captured by our TRIP methodology. That is, we are capturing most of the Tg that is available for analysis, while acknowledging that a smaller population of Tg remains in RIPA-insoluble fractions.

      __Reviewer #3, Comment #4: __Q2. Along the same lines, what does Tg-WT and mutant expression look like by microscopy? Is Tg-WT uniformly distributed while Tg-mutants appear in puncta... more aggregated - perhaps reflecting the increased engagement of chaperones and redox machinery? Changes in the pattern of Tg-C1264R mutant (e.g. w/ VCP KD or inhibition) would add additional support for the authors interpretation of improved secretion. If this data is at hand, including it might be worth consideration.

      Our response: Thank you for this suggestion. The subcellular localization of Tg and any changes upon proteostasis modulation are an ongoing area of follow-up work in our lab. We have preliminary results indicating that the localization of WT and C1264R Tg indeed differs. However, given that this manuscript is already dense in information, we opted to reserve these data for a future manuscript in which we plan to further elucidate the targeting mechanism of mutant Tg to VCP or TEX264. We direct the reviewer to work published by Zhang et al, 2022 (https://doi.org/10.1016/j.jbc.2022.102066), showing a stark difference between WT and mutant Tg in their distribution between intracellular and secreted populations in rat tissue. While almost all WT Tg is found in the follicular lumen (secreted), mutant Tg heavily co-localizes with the ER-resident chaperone BiP. While this paper does not go into detail on differences in subcellular localization, it further highlights the drastic changes in Tg processing and how these manifest as distinct differences in localization within tissue.

      __Reviewer #3, Comment #5: __Q3. Does the level of Tg mutant expression in the FRT clones impact the profiles obtained by TRIP? (Figure 3). This is a question of gauging the relative saturation of QC machinery and how that might impact profiles from TRIP. Were clones expressing at different levels tested? Perhaps a brief discussion of this.

      Our response: We do not expect the level of Tg expression to affect the profiles obtained by TRIP. We were able to identify distinct profiles because we processed the data and normalized it to the relative Tg amount. For example, while WT and A2234D Tg are expressed at similar levels intracellularly, we were able to identify distinct differences in the interaction profiles of the two constructs. When developing FRT clones, we selected those that expressed Tg at similar levels and, therefore, could not directly test whether different expression levels of the same Tg construct alter the observed profiles. Furthermore, Tg can make up 50% of all protein content within thyroid tissue (Di Jeso & Arvan, 2016). As such, thyroid cells are adept at maintaining the balance of QC machinery needed to process Tg. Therefore, we do not anticipate that the amount of Tg expressed in TRIP experiments would have a significant impact on the observed profiles.
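
      The normalization argument can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the helper `normalize_to_bait`, the simple per-channel ratio, and the toy intensities are all hypothetical. It shows that dividing each interactor's TMT reporter intensity by the Tg (bait) intensity in the same channel cancels expression differences that scale bait and interactor proportionally; it assumes co-enrichment scales in proportion to Tg, so a ratio alone would not correct for saturation of QC machinery.

```python
from typing import List, Sequence


def normalize_to_bait(interactor: Sequence[float], bait: Sequence[float]) -> List[float]:
    """Hypothetical helper: divide an interactor's TMT reporter intensities by the
    bait (Tg) intensity in the same channel, giving abundance per unit of enriched Tg."""
    if len(interactor) != len(bait):
        raise ValueError("interactor and bait need one value per TMT channel")
    if any(b <= 0 for b in bait):
        raise ValueError("bait intensities must be positive")
    return [i / b for i, b in zip(interactor, bait)]


# Toy values (not real data): one interactor across three chase time points.
low_expr_bait = [100.0, 80.0, 60.0]
low_expr_hit = [50.0, 20.0, 6.0]

# Same construct at twice the expression: bait and co-enriched interactor scale together.
high_expr_bait = [2 * b for b in low_expr_bait]
high_expr_hit = [2 * h for h in low_expr_hit]

print(normalize_to_bait(low_expr_hit, low_expr_bait))    # -> [0.5, 0.25, 0.1]
print(normalize_to_bait(high_expr_hit, high_expr_bait))  # -> [0.5, 0.25, 0.1]
```

      Both calls return the same profile, which is the sense in which bait normalization makes proportional expression differences drop out.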

      __Reviewer #3, Comment #6: __Q4. For Figure 3, the hour-long labelling period seems a bit long compared with the 3 hr chase. Perhaps this reviewer missed this, but how long does Tg take to mature, and the mutants to misfold and degrade? Is it possible to shorten this so that the profiles of labelled Tg could be more synchronized? If not, perhaps this could just be discussed.

      Our response: While the 1-hour labeling period may seem long, we had to balance the labeling time to 1) label a large enough population of Tg for it to remain detectable throughout the chase period, and 2) keep the chase period long enough to capture the large majority of Tg processing. In our hands, we found that by 4 hours WT Tg was ~63% secreted, with ~25% retained intracellularly (Fig EV7H). Conversely, we found that C1264R remains very stable over this period, with most protein retained intracellularly and little degradation taking place (Fig EV9A). Hence, we opted for an overall ~4 hr timeframe for sample processing (1 hr pulse labeling + 3 hr chase period for time point collections). The literature suggests that WT Tg takes ~2 hours to be processed within the ER and reach the medial Golgi, as exemplified by the EndoH-resistant population that appears at this ~2 hour time point (Menon et al. JBC. 2007). Please also see our response to Reviewer #1, comment 6. We updated the text as follows:

      "We pulse labeled WT Tg FRT cells with Hpg for 1 hr, followed by a 3 hr chase in regular media, capturing time points at 30-minute intervals for analysis via western blot or TMTpro LC-MS/MS (Fig 2A). Our previous study indicated that ~70% of WT Tg-FT was secreted after 4 hours, while approximately 50% of A2234D and 15% of C1264R was degraded after the same time period (Wright et al, 2021). Therefore, we reasoned that a 3-hr chase period would be enough time to capture the majority of Tg interactions throughout processing, secretion, cellular retention, and degradation, while still yielding an appreciable amount of sample for analysis."

      We anticipate that this labeling period can be shortened in future iterations of this methodology. This will be aided by continued improvements in quantitative proteomics, such as increased instrument sensitivity and sample preparation methods that reduce sample loss.

      We explain the labeling timeline and limitations further in the discussion:

      "To address this, we utilized a labeling time of 1 hr, which allows us to generate a large enough labeled population of Tg-FT for TRIP analysis, but some early interactions are likely missed within the TRIP workflow. In the case of mutant Tg, performing the TRIP analysis for much longer chase periods (6-8 hrs) may provide insight into the iterative binding process of PN components that is thought to facilitate protein retention within the secretory pathway."

      __Reviewer #3, Comment #7: __Q5. It is curious that only ML-240, and not the other well-characterized VCP/p97 inhibitors, has an effect, as both of those are used far more often than ML-240. The authors do not really address this in detail, but does it suggest that the effect of ML-240 on VCP/p97 could involve different pathways, given the nature of this compound? Is this compound acting on Tg-C1264R maturation at the level of translation or post-translationally? If the latter, through what means?

      Our Response: We thank Reviewer #3 for appreciating this surprising finding. We were similarly curious as to how, or why, ML-240 was able to elicit this effect compared to other VCP inhibitors. We elaborated in the manuscript text on these compounds and on how the ERAD and ER-phagy pathways, both of which utilize VCP, may be differentially regulated (see response to __Reviewer #3, Comment #2__). While speculative, we believe that ML-240 acts on C1264R Tg maturation post-translationally, given that ML-240 does not seem to affect the translational velocity of C1264R Tg: Fig EV9A shows similar levels of 35S-labeled C1264R in DMSO- and ML-240-treated cells. It may be that acute treatment with ML-240 alters the folding vs. degradation balance of the ER proteostasis network such that some population of C1264R that is usually degraded can be secreted. Another Tg mutant, G2320R, was shown to be degraded via the proteasome in PCCL3 thyrocytes, as MG-132 treatment slowed mutant Tg degradation (Menon et al. JBC. 2007), although G2320R degradation was not exclusively proteasomal. The L2284P Tg mutant showed similar results to G2320R, with MG-132 slowing its degradation. Furthermore, L2284P Tg was not affected by the autophagic/lysosomal inhibitors chloroquine and E64 (Tokunaga et al. JBC. 2000), suggesting that L2284P is degraded predominantly by ERAD. It is unclear which degradation pathway, ERAD or ER-phagy, is predominant for C1264R Tg degradation, and we do not exclude the possibility that both are at play and affected by ML-240 treatment.

      We utilized our HEK293 Tg-NLuc cells to screen the lysosomal inhibitor bafilomycin and the proteasomal inhibitor bortezomib. Neither compound rescued A2234D or C1264R secretion, highlighting that the effect is specific to ML-240 treatment. These new data are now shown in __Fig EV10A,B__ and described in the text:

      "To understand whether this rescue in secretion was uniquely linked to VCP inhibition or could be more broadly attributed to blocking Tg degradation, we tested the proteasomal inhibitor bortezomib and the lysosomal inhibitor bafilomycin. Bafilomycin increased WT Tg lysate abundance, and bortezomib significantly increased A2234D lysate abundance, consistent with a role for these degradation processes in Tg PQC (Fig EV10A). When monitoring Tg-NLuc media abundance, neither bafilomycin nor bortezomib significantly altered WT, A2234D, or C1264R abundance (Fig EV10B), confirming that general inhibition of proteasomal or lysosomal degradation does not rescue mutant Tg secretion."

      __Reviewer #3, Comment #8: __Q6. Continuing from Q5: at what point and where is VCP/p97 able to affect mutant Tg processing? In line 317, the authors seem to correlate increased VCP association with the mutants with their increased secretion. It is not clear how this would result, as engagement with VCP would occur in a compartment different from the one that supports trafficking and secretion. Could the authors expand on how this might come about? This is also relevant to the ML-240 data in Figure 7. Moreover, VCP is associated with ERAD (as is HERPUD1) rather than ER-phagy, and at least in the siRNA raw data there are also effects from Derlin3 and FAF2 KDs, both ERAD factors. Some clarity here would be appreciated.

      Our Response: This line of discussion in the text was meant to suggest that, since VCP showed a higher enrichment with mutant Tg, particularly C1264R, it would make sense that inhibiting VCP would have a larger effect on mutant Tg processing than on WT Tg. As we saw in the siRNA screening data, suppression of VCP resulted in increased C1264R secretion while not affecting WT Tg processing. This passage was not intended to suggest that the increased VCP association with mutant Tg found within the TRIP dataset was the reason for rescued secretion. These data come from two different sets of experiments performed under different conditions; we simply aimed to bridge the findings from the two in a single discussion point. Of note, we understand that VCP is associated with ERAD and acts to regulate autophagy. Given that core autophagy machinery is relevant for both bulk autophagy and ER-phagy, we did not want to rule out that VCP inhibition via ML-240 could affect autophagic flux in these experiments (Chou et al. Chemmedchem. 2013; Khaminets et al. Nature. 2015; Hill et al. Nat. Chem. Bio. 2021.)

      We appreciate that the reviewer also noted that DERL3 and FAF2 knockdown increased C1264R Tg secretion. Since these ERAD factors did not reach the defined threshold in the screen, we did not discuss them further, but these data remain available in Appendix Fig S3. We have updated the manuscript text to clarify the points we aimed to make. The text now reads as follows:

      "That VCP silencing exclusively affects mutant Tg corroborates our TRIP dataset and suggests a more prominent role for VCP in mutant Tg PQC compared to WT. VCP interactions were sparse for WT Tg, whereas they remained steadier throughout the chase period for the mutants (Fig 3H,K)."

      __Reviewer #3, Comment #9: __Q7. There does not appear to be a direct demonstration of Tg-C1264R turnover by ER-phagy (via TEX264). Given that TEX264 was not detected by TRIP, whereas another receptor, RTN3, was detected but had no impact on Tg-C1264R secretion, including such data would go some way to demonstrating that this mutant is (at least partly) degraded by ER-phagy.

      Our response: We performed follow-up experiments to test interactions between Tg and a wider panel of ER-phagy receptors. We transiently expressed FLAG-tagged CCPG1, RTN3L, and TEX264 in HEK293 cells stably expressing Tg-NLuc and performed FLAG IPs followed by western blot analysis. We found that WT and C1264R Tg were enriched, albeit modestly, in the RTN3L Co-IP compared to control samples expressing GFP. Additionally, we found that WT, A2234D, and C1264R Tg were all enriched with CCPG1 compared to control samples expressing GFP. CCPG1 and RTN3 were both found to be C1264R Tg interactors within our mass spectrometry datasets. We have now integrated these data into the manuscript as Fig EV8 and updated the manuscript text as follows:

      "Additionally, we monitored Tg enrichment with the ER-phagy receptors CCPG1 and RTN3 via western blot, as both were found to be C1264R Tg interactors within our TRIP dataset. RTN3L is the only RTN3 isoform reported to be involved in ER turnover via ER-phagy (Grumati et al, 2017). WT and C1264R Tg-NLuc were modestly enriched with RTN3L compared to control samples expressing GFP. In addition, all Tg variants exhibited modest interactions with CCPG1 compared to control samples expressing GFP, although weaker than with TEX264 (Fig EV8).

      Together, these data suggest that TEX264, CCPG1, and RTN3L engage with Tg during processing, and that CH-associated Tg mutants may be selectively targeted to TEX264. Furthermore, ER-phagy may be considered as a degradative pathway in Tg processing, whereas other studies have mainly focused on Tg degradation through ERAD (Tokunaga et al, 2000; Menon et al, 2007)."

      Whether recruitment of mutant Tg by TEX264 leads to degradation remains to be tested. When we monitored C1264R Tg degradation by pulse-chase assay (Fig. EV9A), only a small fraction (

      __Reviewer #3, Comment #10: __Q9. The authors provide data that the UPR was not induced by ML-240 at 3 hrs (10 µM) (Figure 7, supplemental 1). This is in stark contrast to the results of Chou et al (2013), which the authors reference and which reported that ML-240 induced ATF4 and CHOP by 2 hrs at concentrations lower than used here (albeit in a different cell type). While not exclusively UPR, could the authors address the potential activation of the integrated stress response (eIF2a phosphorylation, ATF4 and CHOP) in the FRT cells due to ML-240 treatment? If present, could this provide an explanation for the increased Tg-C1264R secretion? [Basal PERK/UPR activation with mutants.]

      Our Response: Thank you for bringing up this important point. As the reviewer acknowledges, the difference in UPR activation could stem from the different cell lines. Additionally, we measured activation via qPCR, whereas Chou et al. measured via immunoblot. We would like to point out that while we did not observe the upregulation of HSPA5 or ASNS (markers of ATF6 and PERK/ISR activation, respectively) in the presence of short ML-240 treatment (2-3 hr), we did observe the upregulation of DNAJB9 (a marker of IRE1/XBP1s activation).

      To address Reviewer #3's point, we performed further experiments monitoring the potential activation of the ISR in FRT cells due to ML-240 treatment. We treated C1264R Tg-FT FRT cells with ML-240 (10 μM) for 2 hours and monitored eIF2a phosphorylation via immunoblot. Indeed, we observed that ML-240 induced eIF2a phosphorylation compared to cells treated with DMSO. Tunicamycin (1mg/mL) was used as a positive control and showed similar results to ML-240. We have integrated these results into the manuscript, available in Fig EV10C.

      However, we would like to point out that all of these markers represent signs of early UPR induction. Importantly, our finding that HSPA5 transcript levels are not induced suggests that only very modest upregulation of ER chaperone levels is occurring. Remodeling of the ER proteostasis network typically requires longer than the acute 2-4 hr treatment with ML-240. We have updated the manuscript text as follows:

      "Finally, we monitored activation of the unfolded protein response (UPR) in the presence of ML-240 in FRT cells expressing C1264R Tg-FT. Phosphorylation of eIF2a, an activation marker for the PERK arm of the UPR, was induced within 2 hr of ML-240 treatment (Fig EV10C). We further investigated the induction of UPR targets via qRT-PCR. HSPA5 and ASNS transcripts, markers of ATF6 and PERK UPR activation respectively, remained unchanged or slightly decreased after 3 hr treatment with ML-240 in C1264R Tg cells (Fig EV10D). Only DNAJB9 transcript expression showed a significant increase in both WT Tg and C1264R Tg FRT cells (Fig EV10D). Moreover, ML-240 did not significantly alter cell viability after 3 hr, as measured by propidium iodide staining (Fig EV10E). Overall, these results highlight that the short ML-240 treatment induces early UPR markers, but the selective rescue of C1264R Tg secretion via ML-240 treatment is unlikely to result from global remodeling of the ER PN due to UPR activation."

      __Reviewer #3, Comment #11: __Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Any of the suggested experiments above all use reagents reported in the manuscript and so would presumably incur minimal cost and hopefully time. This reviewer is sympathetic to time and financial constraints and so discussion of the issue could suffice.

      Our response: We have addressed follow-up experiments whenever possible or provided further discussion details where applicable. We are appreciative of Reviewer #3's sympathy for the time and financial constraints that go into this work and addressing manuscript revisions. Unfortunately, the 1st and 2nd authors both left the lab immediately after the reviews were received. Hence, many of the experiments had to be addressed by other lab members joining the project, which took considerably longer than anticipated. We apologize for the long delay with our revisions.

      __Reviewer #3, Comment #12: __Are the data and the methods presented in such a way that they can be reproduced? Yes. The methodology is explained in detail.

      Our Response: Thank you.

      __Reviewer #3, Comment #13: __Are the experiments adequately replicated and statistical analysis adequate? Yes. Relevant information is either in the figure legends or is provided in the source data.

      Our Response: Thank you.

      Minor comments:

      __Reviewer #3, Comment #14: __Are prior studies referenced appropriately? The references are generally appropriate, with a few exceptions where more general references were used.

      Our Response: Thank you.

      __Reviewer #3, Comment #15: __Are the text and figures clear and accurate? The text is clearly written, and the figures are clear.

      Our Response: Thank you.

      __Reviewer #3, Comment #16: __Do you have suggestions that would help the authors improve the presentation of their data and conclusions? A summary figure comparing the changing profiles of WT and C1264R and the factors implicated for them could be helpful.

      Our Response: We opted not to include a summary figure because the paper and figures are already dense in information.

      __Reviewer #3, Comment #17: __Perhaps include common nomenclature for proteins as well (e.g. HSPA5 - BiP, HSP90B1 - Grp94, etc.).

      Our Response: We updated the manuscript throughout to reference common nomenclature or other protein names where applicable at their first mention.

      __Reviewer #3, Comment #18: __Line 317 - our is misspelled

      Our Response: Thank you. We have made this correction.

      __Reviewer #3, Comment #19: __Figure 4 - Supplemental Figure 1 - Legend has text referring to panels J and K, but Figure only goes up to F.

      Our Response: Thank you. This was an error in references to Figure panel lettering and we have since corrected this. Please note that this Figure is now Fig EV6.

      Reviewer #3 (Significance (Required)):

      __Reviewer #3, Comment #20: __

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Protein-protein interactions are often used to illustrate complexes and functionality, but these provide only snapshots, rather than "movies". There are many datasets out there exploring protein-protein interactions, but most, if not all, lack any temporal resolution for the interactions they report. The TRIP method described approaches this from a dynamic perspective, identifying the transient interactions formed by folding nascent chains with proteins that aid in their maturation and trafficking, or degradation. This represents an important technical advance in our ability to dynamically monitor protein interactions. The use of Tg mutants is valuable, and perhaps this will lead to new perspectives on how to rescue these or other pathophysiological mutants with loss-of-function phenotypes.

      • State what audience might be interested in and influenced by the reported findings.

      This work should appeal to a broad audience within cell biology, particularly as the TRIP technique is attempting to address a fundamental question - what interactions form during the biogenesis/lifetime of a protein. Moreover, the effort to try to understand the different interactions formed with pathologically relevant mutant proteins as a strategy to try to rescue functionality, is a valuable exercise of this approach.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      ER quality control

      Our Response: We thank reviewer #3 for this positive endorsement.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary

      In this manuscript, Wright et al. developed an approach (termed TRIP) that allows mapping of temporal changes in the interaction landscape of a newly synthesized protein of interest. Using their TRIP approach, the authors found that the extensive interactions of thyroglobulin (Tg) with the proteostasis network (PN) during its passage through the secretory pathway were profoundly altered by disease-causing mutations (e.g. C1264R). The authors cross-validated their findings with a focused RNAi screen monitoring the cellular and secreted abundance of Tg variants upon depletion of PN components. In subsequent experiments, the authors focused on two hits, VCP and TEX264, for which they confirmed an inhibitory effect on the secretion of Tg C1264R. Importantly, the authors found that TEX264 increasingly interacts with the Tg mutant and that pharmacological inhibition of VCP yielded the same phenotype as depletion of VCP. Overall, Wright and colleagues __established an elegant method to map protein interactions in a time-resolved manner and demonstrated its value by the analysis of disease-related Tg mutants__. Hence, this work has the potential to serve as a rich resource for Tg-related research and as a powerful new tool to examine protein interactions. However, several concerns remain.

      Our response: Thank you to reviewer #4 for their valuable feedback and positive assessment. We addressed all comments in detail below.

      Major points:

      __Reviewer #4, Comment #1: __Overall, the TRIP workflow is quite difficult to understand at a first glance - even for a reader with a background in proteomics, biochemistry and cell biology. The authors may want to improve the description of the TRIP methodology and explain in more detail what the individual components and steps are good for. Along the same line, from the main text and the figure legend it was not clear that Tg was actually Flag-tagged. However, without this information it is difficult to follow the workflow. While Figure 1A is certainly helpful, the bulky graphics are deflecting the reader's attention. A more schematic version might be more informative.

      Our Response: Thank you for this feedback, which was also mirrored by Reviewer #2 (comment 10). We have made significant updates to Fig 1 to provide more detail and eliminate some of the unnecessary bulky graphics. We also expanded the schematic for the TRIP workflow in Fig 2A and aligned all symbols used. Furthermore, we have edited the text to describe the workflow more clearly:

      "To develop the time-resolved interactome profiling method, we envisioned a two-stage enrichment strategy utilizing epitope-tagged immunoprecipitation coupled with pulsed bioorthogonal unnatural amino acid labeling and functionalization (Fig 1A). Cells can be pulse labeled with homopropargylglycine (Hpg) to synchronize newly synthesized populations of protein. After pulsed labeling with Hpg, samples can then be collected across time points throughout a chase period (Fig 1A, Box 1) (Kiick et al, 2001; Beatty et al, 2006). The Hpg alkyne incorporated into the newly synthesized population of protein can be conjugated to biotin using copper-catalyzed alkyne-azide cycloaddition (CuAAC) (Fig 1A, Box 2). Subsequently, the first stage of the enrichment strategy can take place, in which the client protein of interest is globally captured and enriched using epitope-tagged immunoprecipitation, followed by elution (Fig 1A, Box 3). The second enrichment step can then utilize a biotin-streptavidin pulldown to capture the Hpg pulse-labeled, CuAAC-conjugated population, enriching samples into time-resolved fractions (Fig 1A, Box 4) (Li et al, 2020; Thompson et al, 2019)."

      Additionally, we have revised the text to state clearly that, for the TRIP experiments, Tg is FLAG-tagged and that this epitope tag is required for the two-stage enrichment strategy. As one example:

      "Thyroglobulin was chosen as the model secretory client protein. We generated isogenic Fischer rat thyroid (FRT) cells that stably expressed FLAG-tagged Tg (Tg-FT), including WT or mutant variants (A2234D and C1264R) (Fig EV1)"

      "Furthermore, the C-terminal FLAG-tag and Hpg labeling are necessary for this two-stage enrichment strategy, and DSP crosslinking is necessary to capture these interactions after stringent wash steps (Fig 1D, Fig EV2)."

      __Reviewer #4, Comment #2: __To what extent does the difference in protein abundance between Tg WT and Tg C1264R contribute to the increased binding of their interactors (e.g., HSPA5 and PDIA4)? The authors should perform a TRIP-coupled immunoblot analysis where WT and mutant are loaded side-by-side on the SDS-PAGE.

      Our Response: As Reviewer #3 (comment 5) had a similar inquiry, we provide the same response as listed above:

      We do not expect the level of Tg expression to affect the profiles obtained by TRIP. We were able to identify distinct profiles because we processed the data and normalized it to the relative Tg amount. For example, while WT and A2234D Tg are expressed at similar intracellular levels, we were able to identify distinct differences in the interaction profiles of the two constructs. When developing FRT clones, we selected those that expressed Tg at similar levels and, therefore, could not directly test whether different expression levels of the same Tg construct alter the observed profiles. Furthermore, Tg can make up 50% of all protein content within thyroid tissue (Di Jeso & Arvan, 2016). As such, thyroid cells are adept at maintaining the balance of QC machinery needed to process Tg. Therefore, we do not anticipate that the amount of Tg expressed in TRIP experiments would have a significant impact on the profiles that we observed.

      __Reviewer #4, Comment #3: __While the RNAi screen was done with pooled siRNA, it is not clear what was used for the RNAi validation experiments shown in Figure 5. This should be done by individual siRNA and not the same pooled reagents as used for the screen.

      Our Response: Similarly, pooled siRNAs were initially utilized for the data shown in Figure 5. The RNAi screen utilized siRNAs optimized for human cells, whereas those used for Figure 5 were for rat cells. For the revisions, we performed control experiments with individual siRNAs, which are now shown in Fig EV7J,K. While we did not find that any single siRNA recapitulated the full phenotype, we did find that several single siRNAs for VCP and TEX264 at least partially recapitulated the observed phenotype of increased C1264R Tg secretion. This is consistent with our reasoning that the pooled siRNAs likely act additively to produce the observed phenotypes. We provided these single siRNA control experiments in Fig EV7J,K, and updated the manuscript text as follows:

      "Several individual VCP and TEX264 siRNAs were able to partially recapitulate the increased secretion phenotype of C1264R Tg-FT, confirming that the effect is mediated by the respective gene silencing (Fig EV7J,K)."

      __Reviewer #4, Comment #4: __In Figure 5A it is not clear which band was used to quantify the effect of NAPA reduction. Also, this analysis lacks normalization to an unrelated protein or loading control. Moreover, the authors should also examine the effect of the siRNA targets shown in Figure 5C for Tg WT and not only the mutant.

      Our Response: The uppermost band in Fig 5A was used for quantification. We added a red asterisk, similar to that found in Fig 5C, to denote the lower band in the lysate panel(s) as a non-specific background band within the Western blot. These data are the result of immunoprecipitations of both cell lysate and medium content; as such, there is no applicable loading control that can be used within the Western blots. For experiments, cell amounts were normalized by seeding and subsequently culturing the same number of cells, as described in the Materials and Methods - FRT siRNA validation studies section of the manuscript. Furthermore, there are no loading controls that are easily utilized for analyzing cell culture medium. We have further clarified the Fig 5 caption to provide clearer experimental detail:

      "(A and B) Western blot analysis (A) and quantification (B) of WT Tg-FT secretion from FRT cells transfected with select siRNA hits from the initial screening data set. Red asterisk denotes a non-specific background band within the western blot. Cells were transfected with 25nM siRNAs for 36 hrs, media exchanged and conditioned for 4 hrs, Tg-FT was immunoprecipitated from lysate and media samples, and Tg-FT amounts were analyzed via immunoblotting. N = 6.

      (C and D) Western blot analysis (C) and quantification (D) of C1264R Tg-FT secretion from FRT cells transfected with select siRNA hits from the initial screening data set. Red asterisk denotes a non-specific background band within the western blot. Cells were transfected with 25nM siRNAs for 36 hrs, media exchanged and conditioned for 8 hrs, Tg-FT was immunoprecipitated from lysate and media samples, and Tg-FT amounts were analyzed via immunoblotting. All statistical testing performed using an unpaired Student's t-test with Welch's correction. *p"

      Finally, as the siRNA targets shown in Fig 5C were hits exclusively for C1264R Tg-FT, we did not believe it was necessary to follow up on them with WT Tg-FT. Similarly, we did not follow up on hits that were exclusive to WT Tg-FT with C1264R and A2234D Tg-FT.

      __Reviewer #4, Comment #5: __The authors should also test for the binding of RTN3 to Tg WT and mutant - in particular in comparison to TEX264. This would be important in the context that only RTN3 but not TEX264 was detected in the TRIP approach. Do the authors also detect VCP and LC3B in their pulldowns?

      Our response: Please also see Reviewer #3, comment 9, who made a similar point.

      We performed follow-up experiments to test interactions with Tg and the wider panel of ER-phagy receptors. We transiently expressed FLAG-tagged CCPG1, RTN3L, and TEX264 in HEK293 cells stably expressing Tg-NLuc and performed FLAG IPs followed by western blot analysis. We found that WT and C1264R Tg were enriched, albeit modestly, in the RTN3L Co-IP compared to control samples expressing GFP. Additionally, we found that WT, A2234D, and C1264R Tg were all enriched with CCPG1 compared to control samples expressing GFP. CCPG1 was found to be a C1264R Tg interactor within our mass spectrometry datasets, along with RTN3. We have now integrated these data into the manuscript as Fig EV8, and updated the manuscript text as follows:

      "Additionally, we monitored Tg enrichment with ER-phagy receptors CCPG1 and RTN3 via Western blot, as both were found to be C1264R Tg interactors within our TRIP dataset. RTN3L is reported to be the only RTN3 isoform involved in ER turnover via ER-phagy (Grumati et al, 2017). WT and C1264R Tg-NLuc were modestly enriched with RTN3L compared to control samples expressing GFP. In addition, we found that all Tg variants exhibited modest interactions with CCPG1 compared to control samples expressing GFP, although less than with TEX264 (Fig EV8).

      Together, these data suggest that TEX264, CCPG1, or RTN3L engage with Tg during processing, and CH-associated Tg mutants may be selectively targeted to TEX264. Furthermore, ER-phagy may be considered as a degradative pathway in Tg processing, as other studies have mainly focused on Tg degradation through ERAD (Tokunaga et al, 2000; Menon et al, 2007)."

      Regarding VCP, we routinely detect it in our AP-MS experiments, as presented previously (Wright et al. 2021) and here in Fig 3 and Appendix Fig S1. However, we have not been able to detect this interaction via Western blot, which may be attributed to the greater sensitivity of LC-MS. We have not probed for LC3 interactions via Western blot, as we did not detect LC3 by LC-MS either, but we identified several lysosomal and other autophagy-related components previously (Wright et al. 2021) and here, as shown in Appendix Fig S1 and Fig EV5C.

      __Reviewer #4, Comment #6: __The effect of TEX264 depletion on Tg secretion should be confirmed by TEX264 KO experiments. Do the authors observe a similar increase in secreted Tg C1264R in BafA1- or SAR405-treated cells? Moreover, the authors should show that Tg C1264R is actually delivered to lysosomes using biochemical assays such as LysoIP or colocalization experiments.

      Our response: To address this concern, we generated stable TEX264 knockout FRT cell lines by CRISPR and probed several clones for their impact on Tg secretion. We found that TEX264 knockout did not recapitulate the increase in C1264R Tg secretion observed with transient siRNA knockdown. While disappointing, these results are not necessarily surprising, considering that prolonged TEX264 knockout may lead the cell to adopt compensatory mechanisms by modulating other proteostasis factors and/or autophagy machinery.

      We performed experiments utilizing the autophagy inhibitor Bafilomycin A1 and have now included these results in the manuscript in Fig EV10A,B. We found that BafA1 treatment led to the accumulation of WT Tg in the lysate, but not of C1264R Tg. We updated the manuscript text to accompany these data as follows:

      "To understand whether this rescue in secretion was uniquely linked to VCP inhibition or could be more broadly attributed to blocking Tg degradation, we tested the proteasomal inhibitor bortezomib and the lysosomal inhibitor bafilomycin. Bafilomycin increased WT Tg lysate abundance, and bortezomib significantly increased A2234D lysate abundance, consistent with a role of these degradation processes in Tg PQC (Fig EV10A). When monitoring Tg-NLuc media abundance, neither bafilomycin nor bortezomib significantly altered WT, A2234D, or C1264R abundance (Fig EV10B), confirming that general inhibition of proteasomal or lysosomal degradation does not rescue mutant Tg secretion."

      These results raise the possibility that the mutant Tg interaction with TEX264 may not lead to active autophagic degradation of mutant Tg. This is also consistent with the slow degradation of C1264R Tg observed in the pulse-chase experiment in Fig EV9A. Whether TEX264 recruitment of mutant Tg leads to degradation or serves an alternative function, for example intracellular sequestration, remains to be tested. Importantly, we have refrained from claiming in the manuscript that C1264R Tg is delivered to the lysosome; instead, we present data showing that it interacts with ER-phagy-related components and speculate on how autophagy could play a role in Tg processing.

      Thank you for the LysoIP suggestion. Ongoing work in the lab is addressing this question with the experiments suggested by the reviewer, but these are better reserved for a follow-up manuscript.

      __Reviewer #4, Comment #7: __Figure 7A and 7C lack loading controls. The quantification shown in Figure 7B and 7D should be normalized to this control. Since VCP activity is often coupled to the activity of the proteasome, the authors should check whether blocking the proteasome yields a similar effect as ML-240.

      Our Response: Like Fig 5A discussed above (Reviewer #4, comment 4), these data are the result of immunoprecipitations from cell lysate and medium. As a result, there is no applicable loading control that can be used within the Western blots. For experiments, cell amounts were normalized by seeding and subsequently culturing the same number of cells, as described in the Materials and Methods - FRT siRNA validation studies and Materials and Methods - VCP pharmacological inhibition studies sections of the manuscript.

      Regarding the effect of proteasome inhibition, we tested whether bortezomib treatment can increase C1264R Tg secretion. We found that bortezomib led to a small but significant increase in A2234D Tg accumulation in the lysate, but did not increase secretion of WT Tg or any of the mutant variants. These new data are shown in Fig EV10A,B. We updated the text as follows:

      "To understand whether this rescue in secretion was uniquely linked to VCP inhibition or could be more broadly attributed to blocking Tg degradation, we tested the proteasomal inhibitor bortezomib and the lysosomal inhibitor bafilomycin. Bafilomycin increased WT Tg lysate abundance, and bortezomib significantly increased A2234D lysate abundance, consistent with a role of these degradation processes in Tg PQC (Fig EV10A). When monitoring Tg-NLuc media abundance, neither bafilomycin nor bortezomib significantly altered WT, A2234D, or C1264R abundance (Fig EV10B), confirming that general inhibition of proteasomal or lysosomal degradation does not rescue mutant Tg secretion."

      __Reviewer #4, Comment #8: __With regard to Figure 7 - Figure supplement 1: The authors should monitor the effect of ML-240 on Tg secretion such that WT and C1264R mutants are directly compared (side-by-side on the same immunoblot). Otherwise, it is difficult to claim that ML-240 rescues the secretion of the mutant.

      Our response: The reviewer is referring to the 35S pulse-chase experiments now shown in Fig EV9. We would like to clarify that these images are not immunoblots but autoradiographs. Even though the samples for WT and C1264R Tg were loaded onto separate gels, the gels were imaged at the same time and are therefore directly comparable. Regardless, the more meaningful information gleaned from these experiments is the absolute rates of protein secretion and degradation and how they change in response to ML-240 treatment. The scale in the quantifications (0 - 100%) is the same and corresponds to the total amount of WT or C1264R Tg that is labeled with 35S during the 30 min pulse. Importantly, we found that C1264R Tg-FT secretion is significantly increased in the presence of ML-240, changing from

      __Reviewer #4, Comment #9: __How did ML-240 affect the ER-phagy components (in particular RTN3) in the TRIP analysis of Tg C1264R (Figure 7G-L)?

      Our response: This is a great discussion point raised by reviewer #4. We have updated the manuscript text to discuss in more detail changes in interactions with degradation components, especially with proteasomal degradation machinery (Fig 7M,N). The manuscript text now reads as follows:

      "The most striking interaction changes occurred with proteasomal degradation components, which remained steady until 1.5 hr, but then abruptly declined with ML-240 treatment at later time points (Fig 7M,N). This decline tracks with changes to the glycan processing machinery, highlighting that the coordination between N-glycosylation and diverting Tg away from ERAD may be a key to the rescue mechanism."

      Minor points:

      __Reviewer #4, Comment #10: __The candidate labeling in Figure 3 - Figure supplement 2 and 3 is too small and unreadable. The authors should provide these figures at higher resolution or increase the font size.

      Our response: These figures are now in the Appendix, and we have edited them to provide higher resolution.

      Reviewer #4 (Significance (Required)):

      Please see above

    1. Author response:

      We would like to thank the reviewers for their constructive feedback. We have thoroughly considered their concerns and comments and we aim to include some additional results in an updated version of this manuscript. In addition, we would like to address some of the comments, with which we respectfully disagree. Below is our point-by-point reply.

      Reviewer 1:

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors greatly depends on the clone being able to grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones insulated from competition. Other studies show that in mammalian cells, RasV12, scrib clones are capable of outcompeting the surrounding tissue, such as in Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc-overexpressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results.

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hs-FLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus they are not compelling. They state that a similar result is observed for Myc-overexpression clones that lack Fmi, but the image was not compelling, the results are not quantified, and the controls are missing (Myc-overexpressing clones alone and Fmi clones alone).

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We will include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots, and we will comment on this in the revised manuscript.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary in which to consider dying cells and counted the numbers for each condition. We view this as a very soft quantification, but will report it in a way that captures the phenomenon in the revised manuscript.

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N).

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc-overexpressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason the clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to the increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have reproduced these results. It is therefore well established that increased Myc expression transforms clones into supercompetitors and that, in the absence of cell competition, Myc-overexpressing discs instead produce wings larger than usual.

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here.

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division.

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. On a linear scale, a ratio of 1 means A and B are equal, whereas a twofold excess of A gives 2 and a twofold excess of B gives 0.5. The larger the difference between A and B, the starker this asymmetry becomes, making a linear scale misleading to the eye, especially when decreased ratios are shown. On a log(ratio) scale, a value of 0 means A and B are equal, and increased and decreased ratios deviate equally from 0.
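      To illustrate the symmetry argument above, here is a minimal Python sketch. It assumes base-2 logarithms (the response does not state which base was used), and the clone-area values and the helper name log2_ratio are hypothetical, not data or code from the manuscript:

```python
import math


def log2_ratio(a: float, b: float) -> float:
    """Return log2(a / b): 0 when a == b, and symmetric around 0 for reciprocal ratios."""
    return math.log2(a / b)


# Hypothetical GFP/RFP clone areas (arbitrary units), not data from the manuscript.
examples = {
    "equal": (100.0, 100.0),
    "GFP twofold": (200.0, 100.0),
    "RFP twofold": (100.0, 200.0),
}

for label, (gfp, rfp) in examples.items():
    # The linear ratio is asymmetric (2.0 vs 0.5); the log2 ratio is symmetric (+1 vs -1).
    print(f"{label:12s} linear = {gfp / rfp:.2f}   log2 = {log2_ratio(gfp, rfp):+.2f}")
```

      A twofold excess of either population lands the same distance from 0 on the log scale, whereas on the linear scale it lands at 2.0 or 0.5.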

      Statistically, analyzing either a standardized number of discs for all conditions or a variable number not determined beforehand does not by itself invalidate the p-value, as long as the variable n is not manipulated by p-hacking techniques, such as increasing the n of samples until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs due to the inherent though modest variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never "cherry-picking" examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader knows the n values at a glance.

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We will update all panels to replace Cas3 with Dcp-1.

      As described above, cell death induced by Myc overexpression is well established, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in "would-be" winners, and to what extent it reduces "would-be" winners' ability to eliminate competitors, are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite this, they behave as winners and eliminate WT neighboring cells. And as mentioned in our response to comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT "loser" clones adjacent to Myc-overexpressing clones.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development?

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We will include this information in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into the mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the size of the project and the required experiments are outside the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners; loss of Fmi has no effect on its own or in losers when confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners during competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryo stage, one cannot create whole disc tumors mutant for fmi. As an approximation to this condition, we have introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, there remains essentially a whole disc harboring fminull tumor. Indeed, this shows that whole fminull tumors overgrow similar to control tumors, confirming that the lack of Fmi only affects clonal tumors. We will provide those results in the updated manuscript.

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we would like to validate. We are currently performing immunostaining for Fmi in clones to confirm whether its levels change during competition. We will provide these results in the updated manuscript.

      Reviewer 3:

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tones down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones, or the early clonal depletion of Scribble in the eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness to the point of forcing their full elimination (so that they now become "losers"). This function in cell competition is specific to Flamingo, as it cannot be recapitulated with other components of the PCP pathway, and it does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo, which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors demonstrate that ASGR1 is degraded in response to RSPO2RA-antibody treatment through both the proteasomal and the lysosomal pathway, suggesting that this is due to the RSPO2RA-mediated recruitment of ZNRF3/RNF43, which have E3 ubiquitin ligase activity. The paper doesn't show, however, if ASGR1 is indeed ubiquitinated.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrated ubiquitination of ASGR1 with SWEETS treatment (new Fig. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      The authors conclude that the RSPO2RA-Ab fusions can act as a targeted protein degradation platform, because they can degrade ASGR. While I agree with this statement, I would argue that the goal of these Abs would not be to degrade ASGR per se. The argumentation is a bit confusing here. This holds for both the Results and the Discussion sections: the authors focus on the dual role of their agents, i.e. on promoting both WNT signaling AND degrading ASGR1. They might want to reconsider how they present their data (e.g. it may be interesting to target ASGR1, but one would presumably then like to do this without also increasing WNT responsiveness?).

      We thank the reviewer for this comment. As the reviewer states, the initial goal of the RSPO2RA-ab fusions was to generate tissue-specific RSPO mimetics that focus on the elimination of the E3 ligases. As an unintended consequence, we also observed enhanced elimination of ASGR. While this was unintended, the results did provide proof of concept (POC) that when an E3 ligase is brought into proximity of another protein, ubiquitination and degradation of this protein may occur. Additionally, our results highlight the need to carefully assess the impact of bispecific molecules on both the intended target and unintended targets in order to understand their potential side effects. We have revised the manuscript to make this clearer in both the Results and Discussion sections.

      Lines 326-331: The authors use a lot of abbreviations for all of the different protein targeting technologies, but since they are hinting at specific mechanisms, it would be better to actually describe the biological activity of LYTAC versus AbTAC/PROTAB/REULR so non-experts can follow.

      We thank the reviewer for this suggestion. We have added more details in the Discussion to highlight the different mechanisms of the various systems described.

      Can the authors comment on how 8M24 and 8G8 compare to 4F3? The latter seems a bit more specific (ie. lower background activity in the absence of ASGR1 in 5C)? Are there any differences/advances between 8M24 and 8G8 over 4F3? This remains unclear.

      These three antibodies bind different regions/epitopes on ASGR. 8M24 and 8G8 bind non-overlapping epitopes on the carbohydrate recognition domain (CRD), while 4F3 binds the stalk region outside of the CRD. This information is in the Results section of the manuscript. We do not believe that the difference in the ASGR binding epitopes contributes to the slight differences in the background activity. The slight differences may be due to differences in the conformation of the antibodies resulting from the differences in their primary sequences, and these differences may not be significant. We have now repeated the experiments in Fig. 5C and 5D to address the reviewer’s next comment on the axes. These new data (new Fig. 5C and 5D) show smaller background differences between the molecules.

      Can the authors ensure that the axes are labelled/numbered similarly for Fig 5B-D? This will make it easier to compare 5C and 5D.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B–D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Reviewer #2 (Public Review):

      Weaknesses:

      The authors show crystal structures for binding of these antibodies to ASGR1/2, and hypothesize about why specificity is mediated through specific residues. They do not test these hypotheses.

      We thank the reviewer for this comment. We did not further test the residue contributions to binding and specificity as this is not the main focus of the current manuscript. We have revised the section and toned down the claims for specificity.

      The authors demonstrate in hepatocyte cell lines that these function as mimetics, and that they do not function in HEK cells, which do not express ASGR1. They do not perform an exhaustive screen of all non-hepatocyte cells, nor do they test these molecules in vivo.

      We agree with the reviewer. For the 4F3-based SWEETS molecule, additional in vitro and in vivo specificity characterizations were performed and described in Zhang et al., Sci Rep, 2020. Since 8M24 is human-specific and 8G8 only weakly interacts with mouse receptors, in vivo experiments in mice were not performed. While we did not extensively test the 8M24- and 8G8-based SWEETS on additional cell lines or in vivo, we believe the data presented strongly support the hepatocyte-specific effects of these molecules.

      Surprisingly, these molecules also induced loss of ASGR1, which the authors hypothesize is due to ubiquitination and degradation, initiated by the E3 ligases recruited to ASGR1. They demonstrate that inhibition of either the proteasome or lysosome abrogates this effect and that it is dependent on E1 ubiquitin-activating enzymes. They do not demonstrate direct ubiquitination of ASGR1 by ZNRF3/RNF43.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrate ubiquitination of ASGR1 with SWEETS treatment (new Figs. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are multiple instances where articles (i.e. the use of "the") are missing.

      We thank the reviewer for this comment. Following the suggestion, the manuscript has gone through a detailed review by an editorial service, and these and other grammatical errors have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The best I can think of is to inject these into Wnt reporter mice (or maybe humanized mice) and see if the liver lights up while other tissues do not.

      We thank the reviewer for this suggestion. The liver specificity was demonstrated in vivo in our earlier publication (SciRep, 10:13951, 2020) with the 4F3-RSPO2RA molecule. Unfortunately, as the results in this manuscript show, the new ASGR binders 8M24 and 8G8 either do not bind or only weakly interact with mouse receptors. Therefore, the in vivo experiments were not performed here.

      You could also consider addressing some of the statements in the manuscript that are currently hypothetical experimentally.

      We thank the reviewer for this comment. We did not further test the residues’ contribution to binding and specificity as this is not the main focus of the current manuscript. We have revised the section and toned down the claims for specificity.

      It would be easier to compare the graphs in 5B-D if all Y-axes were the same scale, with the same scientific notation.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B-D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Some of the western blots in Figure 6 do not have antibody/target labels, making them harder to interpret.

      All the Western blot antibody/target labels are on the right side of the blots in each panel; we have now made the text bold so that it is easier to identify.

      Figure 6 and Supplementary Figure 2 are the same I think.

      Figure 6 and Supplementary Figure 2 show the same experimental set-up performed on two different cell lines: Fig. 6 uses Huh7 cells and Supplementary Fig. 2 uses HepG2 cells. The results from these two cell lines are quite consistent, which makes the figures look very similar.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This is a valuable study that develops a new model of the way muscle responds to perturbations, synthesizing models of how it responds to small and large perturbations, both of which are used to predict how muscles function for stability but also how they can be injured, and which tend to be predicted poorly by classic Hill-type models. The evidence presented to support the model is solid, since it outperforms Hill-type models in a variety of conditions. Although the combination of phenomenological and mechanistic aspects of the model may sometimes make it challenging to interpret the output, the work will be of interest to those developing realistic models of the stability and control of movement in humans or other animals.

      Reviewer #1 (Public Review):

      Muscle models are important tools in the fields of biomechanics and physiology. Muscle models serve a wide variety of functions, including validating existing theories, testing new hypotheses, and predicting forces produced by humans and animals in health and disease. This paper attempts to provide an alternative to Hill-type muscle models that includes contributions of titin to force enhancement over multiple time scales. Due to the significant limitations of Hill-type models, alternative models are needed and therefore the work is important and timely.

      The effort to include a role for titin in muscle models is a major strength of the methods and results. The results clearly demonstrate the weaknesses of Hill models and the advantages of incorporating titin into theoretical treatments of muscle mechanics. Another strength is to address muscle mechanics over a large range of time scales.

      The authors succeed in demonstrating the need to incorporate titin in muscle models, and further show that the model accurately predicts in situ force of cat soleus (Kirsch et al. 1994; Herzog & Leonard, 2002) and rabbit psoas myofibrils (Leonard et al. 2010). However, it remains unclear whether the model will be practical for use with data from different muscles or preparations. Several ad hoc modifications were described in the paper, and the degree to which the model requires parameter optimization for different muscles, preparations and experiment types remains unclear.

      I think the authors should state how many parameters require fitting to the data vs the total number of model parameters. It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      (1) I think the authors should state how many parameters require fitting to the data vs the total number of model parameters.

      The model parameters are listed in Table 1. For each parameter, Table 1 also lists the source of the data (if one exists) and how the data were used (’C’ calculated, ’F’ fit, ’E’ estimated, or ’S’ scaled) for the specific simulations that appear in this paper. While this is a daunting number of parameters, only a few of them must be updated when modeling a new musculotendon.

      Similar to a Hill-type muscle model, at least 5 parameters are needed to fit the VEXAT model to a specific musculotendon: maximum isometric force (fiso), optimal contractile element (CE) length, pennation angle, maximum shortening velocity, and tendon slack length. However, similar to a Hill model, it is only possible to use this minimal set of parameters by making use of default values for the remaining set of parameters. The defaults we have used have been extracted from mammalian muscle (see Table 1) and may not be appropriate for modeling muscle tissue that differs widely in terms of the ratio of fast/slow twitch fibers, titin isoform, temperature, and scale.

      Even when these defaults are appropriate, variation is the rule for biological data rather than the exception. It will always be the case that the best fit can only be obtained by fitting more of the model’s parameters to additional data. Standard measurements of the active force-length relation, passive force-length relation, and force-velocity relation are quite helpful to improve the accuracy of the model for a specific muscle. It is challenging to improve the fit of the model’s cross-bridge (XE) and titin models because the data required are so rare. The experiments of Kirsch et al., Prado et al., and Trombitás et al. are unique to our knowledge. However, if more data become available, it is relatively straightforward to update the model’s parameters using the methods described in Appendix B or the code that appears online (https://github.com/mjhmilla/Millard2023VexatMuscle).

      We have modified the manuscript to make it clear that, in some circumstances, the burden of parameter identification for the VEXAT model can be as low as a Hill model:

      - Section 3: last two sentences of the 2nd paragraph, found at: Page 10, column 2, lines 1-12 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      - Table 1: last two sentences of the caption, found at: Page 11 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      (2) It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      All of the experiments simulated in this work are in-situ or ex-vivo. So far the main challenges of simulating any experiment have been quite consistent across both in-situ and ex-vivo datasets: there are insufficient data to fit most model parameters to a specific specimen and, instead, defaults from the literature must be used. In an ideal case, roughly ten extra trials would be collected from a specimen so that the maximum isometric force, optimal fiber length, active force-length relation, passive force-length relation (up to ≈ 0.6 f_o^M), and the force-velocity relation could be identified from measurements rather than relying on literature values. Since most lab specimens remain viable for only a small number of trials (with the exception of cat soleus), we don’t expect this situation to change in the future.

      However, if data are available, the fitting process is fairly straightforward for either in-situ or ex-vivo data: use a standard numerical method (for example, non-linear least squares or the bisection method) to adjust the model parameters so as to reduce the errors between simulation and experiment. The main difficulty, as described in the previous paragraph, is the availability of data to fit as many parameters as possible for a specific specimen. As such, the fitting process varies from experiment to experiment and depends mainly on the richness of the measurements taken from a specific specimen, and from the literature in general.
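      As a minimal sketch of the bisection option for a single parameter, the code below fits the shape factor of a hypothetical exponential passive force-length curve to one synthetic measurement. The curve, the names `passive_force` and `bisect_parameter`, and all numbers are illustrative assumptions, not the VEXAT model or the code in the linked repository. Fitting several parameters at once would instead call for a multivariate method such as non-linear least squares.

```python
import math


def bisect_parameter(residual, lo, hi, tol=1e-9, max_iter=200):
    """Return p in [lo, hi] where residual(p) crosses zero.

    Assumes residual is continuous and changes sign on [lo, hi].
    """
    f_lo = residual(lo)
    if f_lo * residual(hi) > 0:
        raise ValueError("residual must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # root lies in the upper half
    return 0.5 * (lo + hi)


def passive_force(norm_length, k, c=0.05):
    """Hypothetical exponential passive force-length curve (normalized force)."""
    return c * (math.exp(k * (norm_length - 1.0)) - 1.0)


# Synthetic "measurement" generated from a known shape factor, so the fit can be checked.
k_true = 5.0
stretch = 1.4  # normalized fiber length (hypothetical)
f_measured = passive_force(stretch, k_true)

# Simulation-minus-experiment error as a function of the unknown parameter.
k_fit = bisect_parameter(lambda k: passive_force(stretch, k) - f_measured, 0.0, 20.0)
print(k_fit)  # approximately 5.0
```

      Bisection is robust here only because the residual increases monotonically with `k` for stretches beyond optimal length; with real data the bracket must be chosen so that the error changes sign.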

      Working from in-vivo data presents an entirely different set of challenges. When working with human data, for example, it is just not possible to directly measure muscle force with tendon buckles, and so it is never completely clear how force is distributed across the many muscles that typically actuate a joint. Further, there is also uncertainty in the boundary conditions of the muscle because optical motion capture markers move with respect to the skeleton. Video fluoroscopy offers a way to improve the accuracy of measured boundary conditions, though it is available to only a few labs due to its great expense. A final boundary condition remains impossible to measure in any case: the geometry and forces that act at the boundaries as muscle wraps over other muscles and bones. Fitting to in-vivo data is very difficult.

      While this is an interesting topic, it is tangent to our already lengthy manuscript. Since these reviews are public, we’ll leave it to the motivated reader to find this text here.

      Reviewer #2 (Public Review):

      This model of skeletal muscle includes springs and dampers which aim to capture the effect of crossbridge and titin stiffness during the stretch of active muscle. While both crossbridge and titin stiffness have previously been incorporated, in some form, into models, this model is the first to simultaneously include both. The authors suggest that this will allow for the prediction of muscle force in response to short-, mid- and long-range stretches. All these types of stretch are likely to be experienced by muscle during in vivo perturbations, and are known to elicit different muscle responses. Hence, it is valuable to have a single model which can predict muscle force under all these physiologically relevant conditions. In addition, this model dramatically simplifies sarcomere structure to enable this muscle model to be used in multi-muscle simulations of whole-body movement.

      In order to test this model, its force predictions are compared to 3 sets of experimental data which focus on short-, mid- and long-range perturbations, and to the predictions of a Hill-type muscle model. The choice of data sets is excellent and provide a robust test of the model’s ability to predict forces over a range of length perturbations. However, I find the comparison to a Hill-type muscle model to be somewhat limiting. It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch. Hence, that the model proposed here represents an improvement over such a model is not a surprise. Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      The paper begins by outlining the phenomenological vs mechanistic approaches taken to muscle modelling, historically. It appears, although is not directly specified, that this model combines these approaches. A somewhat mechanistic model of the response of the crossbridges and titin to active stretch is combined with a phenomenological implementation of force-length and force-velocity relationships. This combination of approaches may be useful improving the accuracy of predictions of muscle models and whole-body simulations, which is certainly a worthy goal. However, it also may limit the insight that can be gained. For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening. In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      (1) It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch.

      While many muscle physiologists are aware of the limitations of the Hill model, these limitations are not so well known among computational biomechanists. There are at least two reasons for this gap: there are few comprehensive evaluations of Hill models against several experiments, and some of the differences are quite nuanced. For example, active lengthening experiments can be replicated reasonably well using a Hill model if the lengthening is done on the ascending limb of the force-length curve. Clearly the story is quite different on the descending limb, as shown in Figure 9. Similarly, as Figure 8 shows, by choosing the right combination of tendon model and perturbation bandwidth it is possible to get reasonably accurate responses from the Hill model to stochastic length changes. Yet when a wide variety of perturbation bandwidths, magnitudes, and tendon models are tested, it is clear that the Hill model cannot, in general, replicate the response of muscle to stochastic perturbations. For these reasons we think many of the Hill model’s drawbacks have not been clearly understood by computational biomechanists for many years now.

      (2) Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      We agree that it will be valuable to benchmark other models in the literature using the same set of experiments. Hopefully we, or perhaps others, will have the good fortune to secure research funding to continue this benchmarking work. This will, however, be quite challenging: few muscle models are accompanied by a professional-quality open-source implementation. Without such an implementation it is often impossible to reproduce published results let alone provide a fair and objective evaluation of a model.

      (3) For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening.

      The titin model described in the paper will provide an enhancement of force during a stretch-shortening cycle. This certainly would be an interesting next experiment to simulate in a future paper.

      (4) In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      We can only respond to what drives the frequency-dependent stiffness in the model, though we are quite interested in what happens physiologically. We hope that new experiments will examine this phenomenon in the future. In the case of the model, the reason is straightforward: the formulation of Eqn. 16 is responsible for this shift.

      Equation 16 has been formulated so that the acceleration of the attachment point of the XE is driven by the force difference between the XE and a reference Hill model (numerator of the first term in Eqn. 16) which is then low pass filtered (denominator of the first term in Eqn. 16). Due to this formulation the attachment point moves less when the numerator is small, or when the differences in the numerator change rapidly and effectively become filtered out. When the attachment point moves less, more of the CE’s force output is determined by variations in the length of the XE and its stiffness.

      On the other hand, the attachment point will move when the numerator of the first term in Eqn. 16 is large, or when those differences are not short lived. When the attachment point moves to reduce the strain in the XE, the force produced by the XE’s spring-damper is reduced. As a result, the CE’s force output is less influenced by variations of the length of the XE and its stiffness.
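      To illustrate only the filtering argument, the sketch below applies a generic first-order low-pass filter to two hypothetical force-difference signals. It is not Eqn. 16 (which is not reproduced here), and the time step, time constant, and signals are assumptions chosen for illustration. A sustained difference passes through almost unchanged, whereas a rapidly alternating one is strongly attenuated; in the model's terms, the attachment point would follow the former and largely ignore the latter, leaving the XE's stiffness to dominate at high frequencies.

```python
def low_pass(signal, dt, tau):
    """Forward-Euler discretization of a first-order low-pass filter: tau * dy/dt = u - y."""
    y = 0.0
    out = []
    for u in signal:
        y += (dt / tau) * (u - y)
        out.append(y)
    return out


dt = 0.001   # s, hypothetical time step
tau = 0.05   # s, hypothetical filter time constant (cutoff of roughly 3 Hz)
n = 2000     # 2 s of simulated time

# Sustained force difference vs. a square wave switching sign every 5 steps (100 Hz).
sustained = [1.0] * n
alternating = [1.0 if (i // 5) % 2 == 0 else -1.0 for i in range(n)]

y_sustained = low_pass(sustained, dt, tau)
y_alternating = low_pass(alternating, dt, tau)

print(y_sustained[-1])                               # approaches 1.0
print(max(abs(v) for v in y_alternating[-500:]))     # about 0.05
```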

      Reviewer #2 (Recommendations for the Authors):

      I find the clarity of the manuscript to be much improved following revision. While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction, the revised description of small length changes makes the interpretation much less confusing.

      Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established. Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      (1) While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction ...

      We have had to abstract some of the details of reality to have a model that can be used to simulate hundreds of muscles. In contrast, FiberSim produced by Kenneth Campbell’s group uses much less abstraction and might be of greater interest to you. FiberSim’s models include individual cross-bridges, titin molecules, and an explicit representation of the spatial geometry of a sarcomere. While this model is a great tool for testing muscle physiology questions through simulation, it is computationally expensive to use this model to simulate hundreds of muscles simultaneously.

      Kosta S, Colli D, Ye Q, Campbell KS. FiberSim: A flexible open-source model of myofilament-level contraction. Biophysical journal. 2022 Jan 18;121(2):175-82. https://campbell-muscle-lab.github.io/FiberSim/

      (2) Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established.

      Please see our response (1) to Reviewer #1.

      (3) Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      Please see our response (2) to Reviewer #1.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      In the paper by Choi et al., the authors aimed to develop base editing strategies to convert CAG repeats to CAA repeats in the huntingtin gene (HTT), which causes Huntington's disease (HD). They hypothesized that this conversion would delay disease onset by shortening the uninterrupted CAG repeat. Using HEK-293T cells as a model, the researchers employed cytosine base editors and guide RNAs (gRNAs) to efficiently convert CAG to CAA at various sites within the CAG repeat. No significant indels, off-target edits, transcriptome alterations, or changes in HTT protein levels were detected. Interestingly, somatic CAG repeat expansion was completely abolished in HD knock-in mice carrying CAA-interrupted repeats. 

      Correction of factual errors

      We analyzed HEK293 cells, not "HEK-293T".

      Strengths: 

      This study represents the first proof-of-concept exploration of the cytosine base editing technique as a potential treatment for HD and other repeat expansion disorders with similar mechanisms. 

      Weaknesses: 

      Given that HD is a neurodegenerative disorder, it is crucial to determine the efficiency of the base editing strategies tested in this manuscript and their feasibility in HD-affected cell types and in the brain, which this manuscript does not sufficiently address. 

      We appreciate the reviewer's constructive recommendations. Our genetic investigation focused on understanding observations in HD patients to develop genetic-based treatment strategies and test their feasibility. We agree with the reviewer regarding the importance of data from relevant cell types. Unfortunately, the levels of CAG-to-CAA conversion in the patient-derived neurons were modest, as described in our manuscript (approximately 2%). In addition, AAV did not produce detectable conversions in the brain of HD knock-in mice (data not shown), which was somewhat expected from the literature (PMID: 31937940). We believe some of these technical hurdles can be overcome by developing efficient delivery methods. Nonetheless, an important follow-up will be to perform preclinical studies employing optimized base editing strategies and efficient brain delivery methods to fully demonstrate the therapeutic potential of BE strategies. 

      Reviewer #2 (Public Review):

      Summary: 

      In a proof-of-concept study with the aspiration of developing a treatment to delay HD onset, Choi et al. design and test an A>G DNA base editing strategy to exploit the recently established inverse relationship between the number of uninterrupted CAG repeats in polyglutamine repeat expansions and the age-of-onset of Huntington's Disease (HD). Most of the study is devoted to optimizing a base editing strategy typified by BE4max and gRNA2. The base editing is performed in human HEK293 cells engineered with a 51 CAG canonical repeat and in HD knock-in mice harboring 105+ CAG repeats. 

      Correction of factual errors

      We tested base editing strategies aimed at C > T conversion, not A > G DNA base editing. In addition to HEK293 and knock-in mice, we tested base editing strategies in patient-derived iPSC and neurons.

      Weaknesses: 

      Genotypic data on DNA editing are not portrayed in a clear manner consistent with the study's goal, namely reducing the number of uninterrupted CAG repeats by a clinically relevant amount according to the authors' least square approximated mean age-at-onset. No phenotypic data are presented to show that editing performed in either model would lead to reduced hallmarks of HD onset. 

      More evidence is needed to support the central claims, and the evidence for therapeutic potential is inadequate. 

      Our strategies for converting CAG to CAA in model systems resulted in quantitative DNA modification in a population of cells. Consequently, individual cells may carry different genotypes, some harboring CAA and others CAG at the same genomic location. Therefore, using a standard genotype format for DNA to present base editing outcomes may not be ideal. Instead, we presented the resulting genotype data in a quantitative fashion to provide the percentage of conversion at each site. This approach allows for an intuitive interpretation of both the extent of repeat length reduction and the proportion of such modifications.

      Currently, genetically precise HD mouse models with robust motor and behavioral phenotypes are unavailable. While some HD mouse models, such as the BAC and YAC models, feature pronounced behavioral phenotypes, they carry interrupted CAG repeat sequences, making them unsuitable for base conversion studies because their uninterrupted repeats are inherently short. Although genetically precise HD knock-in mouse models exist, they do not manifest motor symptom-like phenotypes. Given that CAG repeat expansion is the primary driver of the disease and that knock-in mice recapitulate this phenomenon, our genetic investigation focused on assessing the effects of base conversion on CAG repeat instability in knock-in mice. However, as emphasized by the reviewer, subsequent preclinical studies evaluating the therapeutic efficacy of CAG-to-CAA conversion strategies in mouse models harboring uninterrupted adult-onset CAG repeats and robust HD-like phenotypes remain crucial.

      Reviewer #3 (Public Review):

      Summary: 

      In human patients with Huntington's disease (HD), caused by a CAG repeat expansion mutation, the number of uninterrupted CAG repeats at the genomic level influences age-at-onset of clinical signs independent of the number of polyglutamine repeats at the protein level. In most patients, the CAG repeat terminates with a CAACAG doublet. However, CAG repeat variants exist that either lack that doublet or have two doublets. These variants consequently differ in their number of uninterrupted CAG repeats, while the number of glutamine repeats is the same, as both CAA and CAG code for glutamine. The authors first confirm that a shorter uninterrupted CAG repeat in human HD patients is associated with later onset of the first clinical signs of HD. They predict that introducing a further CAA-CAG doublet will delay clinical onset by years. Based on this observation, the authors tested the hypothesis that converting CAG to CAA within a CAG repeat sequence using base editing techniques will benefit HD biology. They show that, indeed, in HD cell models (HEK293 cells expressing 16/17 CAG repeats; a single human stem cell line carrying a CAG repeat expansion in the fully penetrant range with 42 CAG repeats), their base editing strategies induce the desired CAG-CAA conversion. The efficiency of conversion differed depending on the strategy used. In stem cells, delivery posed a problem, so to test allele specificity, the authors then used a HEK293 cell line with 51 CAG repeats on the expanded allele. Conversion occurred on both alleles, while huntingtin protein and mRNA levels and transcriptomic data were unchanged. In knock-in mice carrying 110 CAG repeats, however, base editing did not work as well, for different, mainly technical, reasons. 

      Correction of factual errors

      "HD cell models (HEK293 cells expressing 16/17 CAG repeats" is an incorrect description. It should be "HD cell models (HEK293 cells expressing 51/17 CAG repeats".

      Strengths: 

      The authors use state-of-the-art methods and carefully and thoroughly designed experiments. The data support the conclusions drawn. This work is a very valuable translation from the insight gained from large GWAS studies into HD pathogenesis. It rightly emphasises the potential this has as a causal treatment in HD, while the authors also acknowledge important limitations. 

      Weaknesses: 

      They could dedicate a little more space to discussing several of the challenges mentioned, so that the reader better understands where base editing for HD currently stands and what needs to be done before it can be considered a treatment option. For instance, 

      - It is important to clarify what can be gained by examining again the relationship between uninterrupted CAG repeat length and age-at-onset. Could the authors clarify why they do this and what it adds to their already published GWAS findings? What is the n of datasets? 

      The published HD GWAS (PMID: 31398342) compared the onset age of duplicated interruption and loss of interruption to that of canonical repeats to determine whether uninterrupted CAG repeat length or polyglutamine length determines age at onset. However, the GWAS findings did not quantify the magnitude of the remaining unexplained variance in age at onset for duplicated interruption and loss of interruption. Our study extended this analysis to quantify the additional impact of duplicated interruption, in order to estimate the maximum clinical benefit of base editing strategies for CAG-to-CAA conversion. Since the purpose of this genetic analysis is already described in the Results section, we added the following sentence to the Introduction to highlight what remains unknown. 

      "Still, age at onset of loss of interruption and duplicated interruption was not fully accounted for by uninterrupted CAG repeat, suggesting additional effects of non-canonical repeats."

      We added sample sizes for the least square approximation analysis in the text and the corresponding figure legend. Sample sizes for molecular and animal experiments can be found in the corresponding figure legends.

      - What do they think an ideal conversion rate would be, and how that could be achieved? 

      This is a very important question. However, speculating on the ideal conversion levels is beyond the scope of this genetic investigation. A series of preclinical studies using relevant models may generate data that shed light on the conversion rates required to produce meaningful clinical benefits. In the Discussion section, we added the following sentences. 

      "Currently, the ideal levels of CAG-to-CAA conversion that produce significant clinical benefits are unknown. A series of preclinical studies using relevant model systems may generate data that may shed light on the optimal conversion rate levels that are required to produce significant clinical benefits."

      - Is there a dose-effect relationship for base editing, and would it be realistic to achieve the ideal conversion rate in target cells, given the difficulties described by the authors in differentiated neurons from stem cells? 

      We observed a clear dose-response relationship between the amount of BE reagents and the levels of conversion in non-neuronal cells. Unfortunately, the conversion rate was low in neuronal cells, potentially due to limited delivery, as speculated in the Results section. As described in the Discussion section, we predict that efficient delivery methods will be crucial to achieve CAG-to-CAA conversion at levels that produce therapeutic benefits.

      - The liver is a good tool for in-vivo experiments examining repeat instability in mouse models. However, the authors could comment on why they did not examine the brain.

      We focused on liver instability because of 1) the expectation that delivery/targeting efficiency is significantly lower in the brain (PMID: 31937940) and 2) shared underlying mechanisms between the brain and liver (described in the result section). The following sentence was added in the method section to provide a rationale for liver analysis. 

      "Since significantly lower delivery/targeting efficiency was expected in the brain 34, we focused on analyzing liver instability."

      - Is there a limit to judging the effects of base editing on somatic instability with longer repeats, given the difficulties in measuring long CAG repeat expansions? 

      Determining the levels of base conversion using sequencing technologies gets harder as repeats become longer. Fragment analysis can overcome such technical difficulty if conversion efficiency is high. As pointed out, the repeat expansion measure is also challenging because amplification is biased toward shorter alleles. However, if repeat sizes are relatively similar, the levels of repeat expansion as a function of base conversion can be determined relatively precisely without a significant bias by a standard fragment analysis approach. 

      - Given the methodological challenges for assessing HTT fragments, are there other ways to measure the downstream effects of base editing rather than extrapolate what it will likely be?

      Our CAG-to-CAA conversion strategies are not expected to directly generate fragments of huntingtin DNA, RNA, or protein. In contrast, immediate downstream effects of CAG-to-CAA conversion include sequence changes (DNA and RNA) and alteration of repeat instability, which are presented in the manuscript. If repeat instability is associated with HTT exon 1A fragment, base conversion strategies may indirectly alter the levels of such putative toxic species, which remains to be determined.  

      - Sequencing errors could mask low-level, but still biologically relevant, off-target effects (such as gRNA-dependent and gRNA-independent DNA off-targets, RNA off-targets, and bystander editing). How likely is that? 

      We agree with the reviewer that increased editing efficiency is expected to increase the levels of off-target editing. However, the field is actively developing base editors with minimal off-target effects (PMID: 35941130), which will improve the safety of this technology for clinical use. We added the following sentence.  "In addition, developing base editors with high-level on-target gene specificity and minimal off-target effects is a critical aspect to address 100."

      - How worried are the authors about immune responses following base editing? How could this be assessed? 

      We added the following sentence in the discussion section as the reviewer raised an important safety issue.  

      "Thorough assessments of immune responses against base editing strategies (e.g., development of antibody, B cell, and T cell-specific immune responses) and subsequent modification (e.g., immunosilencing) 101 will be critical to address immune response-associated safety issues of BE strategies."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following points could be considered to improve the overall quality of the manuscript: 

      (1) The authors mentioned that the reason for checking repeat instability in non-neuronal cells was the availability of specific types of AAV; however, other AAV serotypes are available to infect neurons and iPSCs. 

      Our pilot experiments testing several AAV serotypes in patient-derived iPSC and HD knock-in mice showed that only AAV9 converted CAG to CAA at detectable levels in the liver, not in the brain or neurons. We also speculate that difficulties in targeting the CAG repeat region due to GC-rich sequence contributed to low conversion efficiency. Therefore, subsequent optimization of base editor and delivery may improve BE strategies for HD, permitting robust conversion at the challenging locus. 

      (2) Despite its bold nature, minimal data in the manuscript demonstrate that this gene editing strategy is disease-modifying.

      The resources required to demonstrate the therapeutic benefits of CAG-to-CAA conversion strategies are not fully available. In particular, relevant HD mouse models that carry uninterrupted adult-onset CAG repeats and permit measurement of disease-modifying effects are lacking, as described in our response to the second reviewer. Given that CAG repeat expansion is the primary driver of the disease, this genetic investigation focused on determining the impacts of base editing strategies on CAG repeat expansion. Still, as indicated by the reviewer, follow-up preclinical studies evaluating the disease-modifying effects of CAG-to-CAA conversion strategies in relevant mouse models represent important next steps.

      (3) Off-target analysis at the DNA level was limited to "predicted" off-target sites. What about possible translocations that can result from co-nicking on different chromosomes, as a large number of potential targets exist? 

      Among the gRNAs we tested, we focused on gRNAs 1 and 2, which had small numbers of predicted off-target sites. Therefore, our off-target analysis at the DNA level focused on validating those predicted off-targets. As pointed out, thoroughly evaluating off-target effects will be necessary when candidate BE strategies take the next steps toward therapeutic development.

      Genomic translocations caused by double-strand breaks can produce negative consequences, such as cancer. Importantly, although paired nicks efficiently induced translocations, translocations were not detected when a single nick was introduced on each chromosome (PMID: 25201414). Therefore, BE strategies using a nickase are predicted to confer little risk of translocation.

      (4) For in vivo work, somatic repeat expansion was analyzed only in peripheral tissue samples. Since the main affected cellular population in HD is the brain, the outcome of this treatment on a disease-relevant organ still needs to be determined. 

      Challenges in delivery to the brain made us determine instability in the liver since many mechanistic components of somatic CAG repeat instability are shared between the liver and striatum, as rationalized in the manuscript. However, we agree with the reviewer regarding the importance of determining the effects of base conversion on brain instability. We added the following sentence in the method section to provide a rationale. "Since significantly lower delivery/targeting efficiency was expected in brain 34, we focused on analyzing liver instability."

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, the authors apologize for techniques that do not work when workarounds seem readily apparent to an expert in the field. In its current form, the manuscript reads verbose, speculative, apologetic, and preliminary. 

      Drug development programs that are supported by human genetics data show increased success rates in clinical trials (PMID: 26121088, 31827124, 31830040). This is why this genetic study focused on 1) investigating observations in HD subjects and 2) subsequently developing treatment strategies that are supported by patient genetics. As the first illustration of base editing in HD, the main scope of our manuscript is to justify the genetic rationale of CAG-to-CAA conversion and demonstrate the feasibility of therapeutic strategies rooted in patient genetics. As our study was not aimed at entirely demonstrating the clinical benefits of base editing strategies in HD, some of our data were based on tools and approaches that were not fully optimal. We agree with the reviewer that it will be an important next step to employ optimized approaches to evaluate the efficacy of base editing strategies in model systems. Nevertheless, our novel base conversion strategies derived from HD patient genetics represent a significant advancement as they may contribute to developing effective treatments for this devastating disorder. 

      Reviewer#3 (Recommendations For The Authors):

      It would make for an easier read if abbreviations were kept to a minimum. 

      As recommended, we decreased the use of abbreviations. The following has been spelled out throughout the manuscript: CR (canonical repeat), LI (loss of interruption), DI (duplicated interruption), and CBE (cytosine base editor). Other abbreviations with infrequent usage (e.g., ABE, SS, QC) were also spelled out in the text.

    1. Reviewer #2 (Public Review):

      Summary:

      The authors had two aims: First, to decompose the attentional blink (AB) deficit into the two components of signal detection theory; sensitivity and bias. Second, the authors aimed to assess the two subcomponents of sensitivity; detection and discrimination. They observed that the AB is only expressed in sensitivity. Furthermore, detection and discrimination were doubly dissociated. Detection modulated N2p and P3 ERP amplitude, but not frontoparietal beta-band coherence, whereas this pattern was reversed for discrimination.

      Strengths:

      The experiment is elegantly designed, and the data - both behavioral and electrophysiological - are aptly analyzed. The outcomes, in particular the dissociation between detection and discrimination blinks, are consistently and clearly supported by the results. The discussion of the results is also appropriately balanced.

      Weaknesses:

      The lack of an effect of stimulus contrast does not seem very surprising from what we know of the nature of AB already. Low-level perceptual factors are not thought to cause AB. This is fine, as there are also other, novel findings reported, but perhaps the authors could bolster the importance of these (null) findings by referring to AB-specific papers, if there are indeed any, that would have predicted different outcomes in this regard.

      On an analytical note, the ERP analysis could be fine-tuned a little more. The task design does not allow measurement of the N2pc or N400 components, which are also relevant to the AB, but the N1 component could additionally be analyzed. In doing so, I would furthermore recommend selecting more lateral electrode sites for both the N1 and the P1. Both P1 and N1 are likely not maximal near the midline, where the authors currently focused their P1 analysis.

      Impact & Context:

      The results of this study will likely influence how we think about selective attention in the context of the AB phenomenon. However, I think its impact could be further improved by extending its theoretical framing. In particular, there has been some recent work on the nature of the AB deficit, showing that it can be discrete (all-or-none) and gradual (Sy et al., 2021; Karabay et al., 2022, both in JEP: General). These different faces of target awareness in the AB may be linked directly to the detection and discrimination subcomponents that are analyzed in the present paper. I would encourage the authors to discuss this potential link and comment on the bearing of the present work on these previous behavioral findings.

    2. Author response:

      Reviewer #1: 

      Summary:

      In this study, the authors used a multi-alternative decision task and a multidimensional signal-detection model to gain further insight into the cause of perceptual impairments during the attentional blink. The model-based analyses of behavioural and EEG data show that such perceptual failures can be unpacked into distinct deficits in visual detection and discrimination, with visual detection being linked to the amplitude of late ERP components (N2p and P3) and discrimination being linked to the coherence of fronto-parietal brain activity.

      Strengths:

      The main strength of this paper lies in the fact that it presents a novel perspective on the cause of perceptual failures during the attentional blink. The multidimensional signal-detection modelling approach is explained clearly, and the results of the study show that this approach offers a powerful method to unpack behavioural and EEG data into distinct processes of detection and discrimination.

      Weaknesses:

      (1.1) While the model-based analyses are compelling, the paper also features some analyses that seem misguided, or, at least, insufficiently motivated and explained. Specifically, in the introduction, the authors raise the suggestion that the attentional blink could be due to a reduction in sensitivity or a response bias. The suggestion that a response bias could play a role seems misguided, as any response bias would be expected to be constant across lags, while the attentional blink effect is only observed at short lags. Thus, it is difficult to understand why the authors would think that a response bias could explain the attentional blink.

      A deficit in T2 identification accuracy could arise from either sensitivity or criterion effects; the criterion effect may manifest as a choice bias. For example, in short T1-T2 lag trials, when T2 closely follows T1, participants may adopt a more conservative choice criterion for reporting the presence of T2. Moreover, criterion effects need not be uniform across lags: A participant could infer the T1-T2 lag interval based on various factors, including trial length, thereby permitting them to adjust their choice criterion variably across different lags. We will provide a more detailed illustration of this claim in the revision.

      (1.2) A second point of concern regards the way in which the measures for detection and discrimination accuracy were computed. If I understand the paper correctly, a correct detection was defined as either correctly identifying T2 (i.e., reporting CW or CCW if T2 was CW or CCW, respectively, see Figure 2B), or correctly reporting T2's absence (a correct rejection). Here, it seems that one should also count a misidentification (i.e., incorrect choice of CW or CCW when T2 was present) as a correct detection, because participants apparently did detect T2, but failed to judge/remember its orientation properly in case of a misidentification. Conversely, the manner in which discrimination performance is computed also raises questions. Here, the authors appear to compute accuracy as the average proportion of T2-present trials on which participants selected the correct response option for T2, thus including trials in which participants missed T2 entirely. Thus, a failure to detect T2 is now counted as a failure to discriminate T2. Wouldn't a more proper measure of discrimination accuracy be to compute the proportion of correct discriminations for trials in which participants detected T2?

      Detection and discrimination accuracies were computed with precisely the same procedure, and under the same conditions, as described by the Reviewer (underlined text, above). We regret our poor description; we will improve upon it in the revised manuscript.

      (1.3) My last point of critique is that the paper offers little if any guidance on how the inferred distinction between detection and discrimination can be linked to existing theories of the attentional blink. The discussion mostly focuses on comparisons to previous EEG studies, but it would be interesting to know how the authors connect their findings to extant, mechanistic accounts of the attentional blink. A key question here is whether the finding of dissociable processes of detection and discrimination would also hold with more meaningful stimuli in an identification task (e.g., the canonical AB task of identifying two letters shown amongst digits). There is evidence to suggest that meaningful stimuli are categorized just as quickly as they are detected (Grill-Spector K, Kanwisher N. Visual recognition: as soon as you know it is there, you know what it is. Psychol Sci. 2005 Feb;16(2):152-60. doi: 10.1111/j.0956-7976.2005.00796.x. PMID: 15686582). Does that mean that the observed distinction between detection and discrimination would only apply to tasks in which the targets consist of otherwise meaningless visual elements, such as lines of different orientations?

      Our results are consistent with previous literature suggested by the Reviewer. Specifically, we do not claim that detection and discrimination are sequential processes; in fact, we modeled them as concurrent computations (Figs. 3A-B). Yet, our results suggest that these processes possess distinct neural bases. We have discussed this idea briefly in the Discussion section (e.g., “Yet, we found no evidence for these two computations being sequential…”). We will discuss this further in the revised manuscript in the context of previous literature.

      Reviewer #2:

      Summary:

      The authors had two aims: First, to decompose the attentional blink (AB) deficit into the two components of signal detection theory; sensitivity and bias. Second, the authors aimed to assess the two subcomponents of sensitivity; detection and discrimination. They observed that the AB is only expressed in sensitivity. Furthermore, detection and discrimination were doubly dissociated. Detection modulated N2p and P3 ERP amplitude, but not frontoparietal beta-band coherence, whereas this pattern was reversed for discrimination.

      Strengths:

      The experiment is elegantly designed, and the data - both behavioral and electrophysiological - are aptly analyzed. The outcomes, in particular the dissociation between detection and discrimination blinks, are consistently and clearly supported by the results. The discussion of the results is also appropriately balanced.

      Weaknesses:

      (2.1) The lack of an effect of stimulus contrast does not seem very surprising from what we know of the nature of AB already. Low-level perceptual factors are not thought to cause AB. This is fine, as there are also other, novel findings reported, but perhaps the authors could bolster the importance of these (null) findings by referring to AB-specific papers, if there are indeed any, that would have predicted different outcomes in this regard.

      While there is consensus that low-level perceptual factors are not affected by the attentional blink, other studies may suggest evidence to the contrary (e.g., Chua et al., Percept. Psychophys., 2005). We will highlight the significance of our findings in the context of such conflicting evidence in the literature in the revised manuscript.

      (2.2) On an analytical note, the ERP analysis could be fine-tuned a little more. The task design does not allow measurement of the N2pc or N400 components, which are also relevant to the AB, but the N1 component could additionally be analyzed. In doing so, I would furthermore recommend selecting more lateral electrode sites for both the N1 and the P1. Both P1 and N1 are likely not maximal near the midline, where the authors currently focused their P1 analysis.

      We will incorporate these additional analyses in the revised manuscript.

      (2.3) Impact & Context:

      The results of this study will likely influence how we think about selective attention in the context of the AB phenomenon. However, I think its impact could be further improved by extending its theoretical framing. In particular, there has been some recent work on the nature of the AB deficit, showing that it can be discrete (all-or-none) and gradual (Sy et al., 2021; Karabay et al., 2022, both in JEP: General). These different faces of target awareness in the AB may be linked directly to the detection and discrimination subcomponents that are analyzed in the present paper. I would encourage the authors to discuss this potential link and comment on the bearing of the present work on these behavioural findings.

      Thank you. We will discuss our findings in the context of these recent studies.

      Reviewer #3:

      Summary:

      In the present study, the authors aimed to achieve a better understanding of the mechanisms underlying the attentional blink, that is, a deficit in processing the second of two target stimuli when they appear in rapid succession. Specifically, they used a concurrent detection and identification task in- and outside of the attentional blink and decoupled effects of perceptual sensitivity and response bias using a novel signal detection model. They conclude that the attentional blink selectively impairs perceptual sensitivity but not response bias, and link established EEG markers of the attentional blink to deficits in stimulus detection (N2p, P3) and discrimination (fronto-parietal high-beta coherence), respectively. Taken together, their study suggests distinct mechanisms mediating detection and discrimination deficits in the attentional blink.

      Strengths:

      Major strengths of the present study include its innovative approach to investigating the mechanisms underlying the attentional blink, an elegant, carefully calibrated experimental paradigm, a novel signal detection model, and multifaceted data analyses using state-of-the-art model comparisons and robust statistical tests. The study appears to have been carefully conducted and the overall conclusions seem warranted given the results. In my opinion, the manuscript is a valuable contribution to the current literature on the attentional blink. Moreover, the novel paradigm and signal detection model are likely to stimulate future research.

      Weaknesses:

      Weaknesses of the present manuscript mainly concern the neglect of some relevant literature, unclear hypotheses, potentially data-driven analyses, relatively low statistical power, potential flaws in the EEG methods, and the absence of a discussion of limitations. In the following, I will list some major and minor concerns in detail.

      Major points

      (3.1) Hypotheses:

      I appreciate the multifaceted, in-depth analysis of the given dataset, including its large number of different statistical tests. However, neither the Introduction nor the Methods contain specific statistical hypotheses. Moreover, many of the tests (e.g., correlations) rely on selected results of previous tests. It is unclear how many of the tests were planned a priori, how many more were performed, and how exactly corrections for multiple tests were implemented. Thus, I find it difficult to assess the robustness of the results.

      As outlined in the Introduction, we hypothesized that neural computations associated with target detection would be characterized by regional neuronal markers (e.g., parietal or occipital ERPs), whereas computations linked to feature discrimination may involve neural coordination across multiple brain regions (e.g. fronto-parietal coherence). We planned and conducted our statistical tests based on this hypothesis. All multiple comparison corrections (e.g., Bonferroni-Holm correction, see Methods) were performed separately for each class of analyses. We will clarify these hypotheses and provide further details in the revised manuscript.
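      For readers unfamiliar with the Bonferroni-Holm procedure mentioned above, a minimal Python sketch of the step-down correction follows. The p-values are hypothetical illustrations, not results from this study, and the function would be applied once per class (family) of analyses.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected
    under the Holm-Bonferroni step-down procedure at family-wise level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value (k = rank + 1) with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Stop at the first non-significant test; all larger p-values are retained
            break
    return reject


# Hypothetical p-values for one class of analyses (illustration only)
p = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p))  # [True, False, False, True]
```

      With these hypothetical values, the procedure rejects the two smallest p-values (0.005 and 0.01) and stops at 0.03, which exceeds 0.05/2.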

      (3.2) Power:

      Some important null findings may result from the rather small sample sizes of N = 24 for behavioral and N = 18 for ERP analyses. For example, the correlation between detection and discrimination d' deficits across participants (r=0.39, p=0.059) (p. 12, l. 263) and the attentional blink effect on the P1 component (p=0.050, no test statistic) (p. 14, l. 301) could each have been significant with one more participant. In my opinion, such results should not be interpreted as evidence for the absence of effects.

      We agree and will revise the manuscript accordingly. We will also report Bayes factor (BF) values, where relevant, to further evaluate these claims.

      (3.3) Neural basis of the attentional blink:

      The introduction (e.g., p. 4, l. 56-76) and discussion (e.g., p. 19, l. 427-447) do not incorporate the insights from the highly relevant recent review by Zivony & Lamy (2022), which is only cited once (p. 19, l. 428). Moreover, the sections do not mention some relevant ERP studies of the attentional blink (e.g., Batterink et al., 2012; Craston et al., 2009; Dell'Acqua et al., 2015; Dellert et al., 2022; Eiserbeck et al., 2022; Meijs et al., 2018).

      We will motivate and discuss our study in the context of these previous studies. 

      (3.4) Detection versus discrimination:

      Concerning the neural basis of detection versus discrimination (e.g., p. 6, l. 98-110; p. 18, l. 399-412), relevant existing literature (e.g., Broadbent & Broadbent, 1987; Hillis & Brainard, 2007; Koivisto et al., 2017; Straube & Fahle, 2011; Wiens et al., 2023) is not included.

      Thank you for these suggestions. We will include these important studies in our discussion.

      (3.5) Pooling of lags and lag-1 sparing:

      I wonder why the authors chose to include 5 different lags when they later pooled early (100, 300 ms) and late (700, 900 ms) lags, and whether this pooling is justified. This is important because T2 at lag 1 (100 ms) is typically "spared" (high accuracy) while T2 at lag 3 (300 ms) shows the maximum AB (for reviews, see, e.g., Dux & Marois, 2009; Martens & Wyble, 2010). Interestingly, this sparing was not observed here (p. 43, Figure 2). Nevertheless, considering the literature and the research questions at hand, it is questionable whether lag 1 and 3 should be pooled.

      Lag-1 sparing is not always observed in attentional blink studies; there are notable exceptions that do not report such sparing (Hommel et al., Q. J. Exp. Psychol., 2005; Livesay et al., Attention, Percept. Psychophys., 2011). Our statistical tests revealed no significant difference in accuracies between short lag (100 and 300 ms) trials or between long lag (700 and 900 ms) trials but did reveal significant differences between the short and long lag trials (ANOVA, followed by post-hoc tests). To simplify the presentation of the findings, we pooled together the short lag (100 and 300 ms) and, separately, the long lag (700 and 900 ms) trials. We will present these analyses, and clarify the motivation for pooling in the revised manuscript. 

      (3.6) Discrimination in the attentional blink

      Concerning the claims that previous attentional blink studies conflated detection and discrimination (p. 6, l. 111-114; p. 18, l. 416), there is a recent ERP study (Dellert et al., 2022) in which participants did not perform a discrimination task for the T2 stimuli. Moreover, since the relevance of all stimuli except T1 was uncertain in this study, irrelevant distractors could not be filtered out (cf. p. 19, l. 437). Under these conditions, the attentional blink was still associated with reduced negativities in the N2 range (cf. p. 19, l. 427-437) but not with a reduced P3 (cf. p. 19, l. 439-447).

      We will address the difference between our findings and those of Dellert et al (2022) in the revised manuscript.

      (3.7) General EEG methods:

      While most of the description of the EEG preprocessing and analysis (p. 31/32) is appropriate, it also lacks some important information (see, e.g., Keil et al., 2014). For example, it does not include the length of the segments, the type and proportion of artifacts rejected, the number of trials used for averaging in each condition, specific hypotheses, and the test statistics (in addition to p-values).

      We regret the oversight. We will include these details in the revised Methods.

      (3.8) EEG filters:

      P. 31, l. 728: "The data were (...) bandpass filtered between 0.5 to 18 Hz (...). Next, a bandstop filter from 9-11 Hz was applied to remove the 10 Hz oscillations evoked by the RSVP presentation." These filter settings do not follow common recommendations and could potentially induce filter distortions (e.g., Luck, 2014; Zhang et al., 2024). For example, the 0.5 Hz high-pass filter could distort the slow P3 wave. Mostly, I am concerned about the bandstop filter. Since the authors commendably corrected for RSVP-evoked responses by subtracting T2-absent from T2-present ERPs (p. 31, l. 746), I wonder why the additional filter was necessary, and whether it might have removed relevant peaks in the ERPs of interest.

      Thank you for this suggestion. We will repeat this analysis without these additional filters.

      (3.9) Coherence analysis:

      P. 33, l. 786: "For subsequent, partial correlation analyses of coherence with behavioral metrics and neural distances (...), we focused on a 300 ms time period (0-300 ms following T2 onset) and high-beta frequency band (20-30 Hz) identified by the cluster-based permutation test (Fig. 5A-C)." I wonder whether there were any a priori criteria for the definition and selection of such successive analyses. Given the many factors (frequency bands, hemispheres) in the analyses and the particular shape of the cluster (p. 49, Fig 5C), this focus seems largely data-driven. It remains unclear how many such tests were performed and whether the results (e.g., the resulting weak correlation of r = 0.22 in one frequency band and one hemisphere in one part of a complexly shaped cluster; p. 15, l. 327) can be considered robust.

      Please see responses to comments #3.1 and #3.2 (above). In addition to reporting further details regarding statistical tests and multiple comparisons corrections, we will compute and report Bayes factors to quantify the strength of the evidence for correlations, as appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The current manuscript provides an extensive in vivo analysis of two guidance pathways identifying multiple mechanisms that shape the bifurcation of DRG axons when forming the dorsal funiculus in the DREZ. 

      Strengths: 

      Multiple mouse mutant lines were used, together with complementary techniques; the results are very clear and compelling. 

      The findings are very significant and clearly move forward our understanding of the regulation of axonal development at the DREZ. 

      Weaknesses: 

      No major weaknesses were found. As it stands, I have no recommendations that would increase the clarity or quality of the manuscript. 

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors conduct a detailed analysis of the molecular cues that control the guidance of bifurcated dorsal root ganglion axons in a key region of the spinal cord called the dorsal funiculus. This is a specific case of axon guidance that occurs in a precise way. Slit was known to be important, but many axons still target correctly in Slit knockouts, suggesting a role for other guidance factors. Netrin1 is also expressed in this region, so the authors looked at netrin mutants. They found axons outside the DREZ in the Ntn1 mutants, and they show by single-neuron genetic labeling that many of these come from DRG neurons. Quantified axonal tracing studies in Slit1/2, Ntn1, or triple mutant embryos support the idea that Slit and Ntn1 have distinct functions in guidance and that the effects of their loss are additive. Interestingly, none of these knockouts affect bifurcation itself but rather the guidance of one or both of the bifurcated axon terminals. Knockout of the Slit receptors (Robo1/2) or the Netrin 1 receptor (DCC) in embryos causes guidance defects similar to those caused by loss of the ligands, providing additional confirmation of the requirement for both guidance pathways. 

      Strengths: 

      This study expands understanding of the role of the axon guidance factors Ntn1/DCC and Slit/Robo in a specific axon guidance decision. The strength of the study is the careful axonal labeling and quantification, which allows the authors to establish precise consequences of the loss of each guidance factor or receptor. 

      Weaknesses: 

      There are some places in the Discussion where these data are compared with other studies and models, but additional details would help clarify the arguments. 

      The details were added to the first section of Discussion in the revision to address this weakness.  Also see the response to the recommendations below.

      Reviewer #3 (Public Review):

      Summary: 

      In this paper, Curran et al investigate the role of Ntn, Slit1, and Slit2 in the axon patterning of DRG neurons. The paper uses mouse genetics to perturb each guidance molecule and its corresponding receptor. Cre-based approaches and immunostaining of DRG neurons are used to assess the phenotypes. Overall, the study uses the strength of mouse genetics and imaging to reveal new genetic modifiers of DRG axons. The conclusions of the experiments match the presented results. The paper is an important contribution to the field, as evidence that dorsal funiculus formation is impacted by Ntn and Slit signaling. However, there are some potential areas of the manuscript that should be edited to better match the results with the conclusions of the work. 

      Strengths: 

      The manuscript takes advantage of mouse genetics to investigate the axon patterning of DRG neurons. The work does a great job of assessing individual phenotypes in single and double mutants. This reveals an intriguing cooperative and independent function of Ntn, Slit1, and Slit2 in DRG axon patterning. The sophisticated triple mutant analysis is commendable and provides important insight. 

      Weaknesses: 

      Overall, the manuscript is sound in technique and analysis. However, the majority of the manuscript is about the dorsal funiculus and not the bifurcation of the axons, as the title would make a reader believe. Further, the manuscript would benefit from a more scholarly discussion of the current knowledge of DRG axon patterning and how this work fits into that knowledge. 

      We revised the title as suggested.  Additional discussion of DRG axon growth at the DREZ is added to the last section of the Discussion in the revision.  Also see the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Given the reasons stated above, I have no specific recommendations for the authors. 

      There is a typo in the Abstract (... mice with triple deletion of Ntn1, Slit2, and Slit2....). 

      Corrected in the revision.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors twice repeated that their data on DRG guidance defects in the Ntn1 mutants differ from studies previously published in references 19 and 26. However it is unclear to me, without having read those other studies, what is actually different between this study and those, and why there would be differences between the results from two groups. If the authors think this is an important point to make they need to more clearly say what the other group saw and offer an explanation of why the data may be different. 

      We added detailed comparison of the defects from different studies to the first section of the Discussion and suggested multiple roles of Ntn1 in controlling sensory axon growth at the DREZ in the revision.

      (2) In the final section of the discussion it says, "The guidance regulation of DRG axon bifurcation by Slit and Ntn1 may be similar to but overshadowed by their function in midline guidance [43]." The meaning of this sentence was unclear to me. I had been thinking that since there are total knockout embryos (not conditional) there could be patterning effects that happen before the DRG branching that influence the formation of the DREZ. Is this what the authors mean to say here? How can the authors show that the guidance factors they have knocked out are actually functioning in the DRG neurons? 

      We agree with the reviewer that the first sentence is vague, so we edited the paragraph and included the discussion of the regulation of DRG axons at the DREZ, which was the main theme of this last section.  In addition, we agree with the reviewer’s suggestion of the possible indirect role of Ntn1 on DRG axons via the control of interneuron migration.  This possibility was included in the last paragraph of the Discussion.

      (3) In several of the figures (3T, 5I, 5J) there are distance measurements that are presumably averages of multiple axons in 3 or 4 embryos because 3-4 points are shown per graph. However, the figure and methods do not say how many axons were measured per embryo and I could not find if it says these numbers are averages. Clarifying the details of these panels would be useful. 

      The n is the number of animals analyzed and is now added to the figure legends.  From each animal, multiple sections (2-4) were analyzed for various parameters in Fig. 3 and 5.  This information was added to the Methods section of the revision.

      Reviewer #3 (Recommendations For The Authors):

      Overall the data matches the conclusions in the paper. However, to this reviewer, the title suggests that Ntn and Slit will have defects in bifurcation. This is not the presented phenotype. I recommend the authors change the title to better reflect the findings of the work. 

      We edited the title of the revised manuscript to reflect the control of growth direction in the context of bifurcation.  

      The introduction of the work clearly outlines what is known about DREZ formation in mice but could extend its discussion to other systems like chick and zebrafish (Jaeda Coutinho-Budd et al. 2008, Wang and Scott 2000, Golding et al 1997, Nichols and Smith 2019, Kikel-Coury et al 2021). These studies are particularly important given that pioneer events, including bifurcation, can be visualized. Acknowledging the contribution of other model systems to the understanding of DRG axon patterning is important to improve the scholarly discussion of the paper. 

      We added more detailed discussion of the current knowledge of DRG axon growth at the DREZ from several relevant studies of the rodent and zebrafish models in the last section of Discussion.

      In the data presented, the authors see defects in the axon patterning of DRG neurons and conclude it is a defect in the dorsal funiculus formation. Another interpretation is that a subset of axons cannot invade the spinal cord boundary properly. This phenotype was observed in zebrafish with timelapse imaging (Kikel-Coury et al 2021). It may not be necessary to specifically test the axons' ability to enter the spinal cord in this paper, but the possibility that this could drive the presented phenotypes should be more clearly stated in the results. Entry is not thoroughly addressed in this paper and would need to be confirmed by labeling the edge of the spinal cord with a second reporter. No entry would obviously impact axon targeting. However, delayed entry could place the axon in a navigation environment that is atypical, causing it to navigate aberrantly and present as a funiculus phenotype. 

      We thank the reviewer for raising this very interesting point.  In our present view, dorsal funiculus formation is related to DRG axon patterning, which involves growth, guidance, and bifurcation of the incoming afferents at the dorsal spinal cord.  We believe that these events are highly coordinated by various environmental cues to generate the DREZ and the dorsal funiculus.  The defects we observed could result from the disruption of such coordination that leads to misregulation of DRG axon entry at the dorsal spinal cord, as suggested by the reviewer.  We propose that further analysis by time-lapse imaging as done in zebrafish would provide better understanding of such coordination.  This discussion was included in the last section of Discussion. 

      The authors should clarify that their approach does not knock out molecules in a cell-specific way. This would specifically impact the interpretation of the Dcc phenotypes. It is possible that UNC-40/DCC is guiding cells that are not labeled. The non-autonomous role of UNC-40/DCC should be clearly stated as a possibility. 

      This discussion was added to the last paragraph of the Discussion section.

    1. Reviewer #2 (Public Review):

      Summary:

      Zylberberg and colleagues show that food choice outcomes and BOLD signal in the vmPFC are better explained by algorithms that update subjective values during the sequence of choices compared to algorithms based on static values acquired before the decision phase. This study presents a valuable means of reducing the apparent stochasticity of choices in common laboratory experiment designs. The evidence supporting the claims of the authors is solid, although currently limited to choices between food items because no other goods were examined. The work will be of interest to researchers examining decision-making across various social and biological sciences.

      Strengths:

      The paper analyses multiple food choice datasets to check the robustness of its findings in that domain.

      The paper presents simulations and robustness checks to back up its core claims.

      Weaknesses:

      To avoid potential misunderstandings of their work, I think it would be useful for the authors to clarify their statements and implications regarding the utility of item ratings/bids (e-values) in explaining choice behavior. Currently, the paper emphasizes that e-values have limited power to predict choices without explicitly stating the likely reason for this limitation given its own results or pointing out that this limitation is not unique to e-values and would apply to choice outcomes or any other preference elicitation measure too. The core of the paper rests on the argument that the subjective values of the food items are not stored as a relatively constant value, but instead are constructed at the time of choice based on the individual's current state. That is, a food's subjective value is a dynamic creation, and any measure of subjective value will become less accurate with time or new inputs (see Figure 3 regarding choice outcomes, for example). The e-values will change with time, choice deliberation, or other experiences to reflect the change in subjective value. Indeed, most previous studies of choice-induced preference change, including those cited in this manuscript, use multiple elicitations of e-values to detect these changes. It is important to clearly state that this paper provides no data on whether e-values are more or less limited than any other measure of eliciting subjective value. Rather, the paper shows that a static estimate of a food's subjective value at a single point in time has limited power to predict future choices. Thus, a more accurate label for the e-values would be static values because stationarity is the key assumption rather than the means by which the values are elicited or inferred.

      There is a puzzling discrepancy between the fits of a DDM using e-values in Figure 1 versus Figure 5. In Figure 1, the DDM using e-values provides a rather good fit to the empirical data, while in Figure 5 its match to the same empirical data appears to be substantially worse. I suspect that this is because the value difference on the x-axis in Figure 1 is based on the e-values, while in Figure 5 it is based on the r-values from the Reval algorithm. However, the computation of the value difference measure on the two x-axes is not explicitly described in the figures or methods section and these details should be added to the manuscript. If my guess is correct, then I think it is misleading to plot the DDM fit to e-values against choice and RT curves derived from r-values. Comparing Figures 1 and 5, it seems that changing the axes creates an artificial impression that the DDM using e-values is much worse than the one fit using r-values.

      Relatedly, do model comparison metrics favor a DDM using r-values over one using e-values in any of the datasets tested? Such tests, which use the full distribution of response times without dividing the continuum of decision difficulty into arbitrary hard and easy bins, would be more convincing than the tests of RT differences between the categorical divisions of hard versus easy.

      Revaluation and reduction in the imprecision of subjective value representations during (or after) a choice are not mutually exclusive. The fact that applying Reval in the forward trial order leads to lower deviance than applying it in the backwards order (Figure 7) suggests that revaluation does occur. It doesn't tell us if there is also a reduction in imprecision. A comparison of backwards Reval versus no Reval would indicate whether there is a reduction in imprecision in addition to revaluation. Model comparison metrics and plots of the deviance from the logistic regression fit using e-values against backward and forward Reval models would be useful to show the relative improvement for both forms of Reval.

      Did the analyses of BOLD activity shown in Figure 9 orthogonalize between the various e-value- and r-value-based regressors? I assume they were not because the idea was to let the two types of regressors compete for variance, but orthogonalization is common in fMRI analyses so it would be good to clarify that this was not used in this case. Assuming no orthogonalization, the unique variance for the r-value of the chosen option in a model that also includes the e-value of the chosen option is the delta term that distinguishes the r and e-values. The delta term is a scaled count of how often the food item was chosen and rejected in previous trials. It would be useful to know if the vmPFC BOLD activity correlates directly with this count or the entire r-value (e-value + delta). That is easily tested using two additional models that include only the r-value or only the delta term for each trial.

      Please confirm that the correlation coefficients shown in Figure 11 B are autocorrelations in the MCMC chains at various lags. If this interpretation is incorrect, please give more detail on how these coefficients were computed and what they represent.
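      As a point of reference for this question, the lag-k autocorrelation of a single MCMC chain is commonly estimated as the lagged covariance of the samples divided by their variance. The sketch below is a generic Python illustration under that assumption; it is not taken from the manuscript, and the toy chain is hypothetical.

```python
def autocorrelation(chain, lag):
    """Estimate the lag-`lag` autocorrelation of a 1-D sequence of samples."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(chain) - 1")
    mean = sum(chain) / n
    # Sum of squared deviations (shared denominator for every lag)
    var = sum((x - mean) ** 2 for x in chain)
    # Sum of lagged cross-products of deviations
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Hypothetical toy chain; a trending chain shows positive autocorrelation at short lags
chain = [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(autocorrelation(chain, k), 2) for k in range(3)])  # [1.0, 0.4, -0.1]
```

      Well-mixed chains show autocorrelations that drop quickly toward zero as the lag increases; a constant chain has zero variance and would make this estimator undefined.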

      The paper presents the ceDDM as a proof-of-principle type model that can reproduce certain features of the empirical data. There are other plausible modifications to bounded evidence accumulation (BEA) models that may also reproduce these features as well or better than the ceDDM. For example, a DDM in which the starting point bias is a function of how often the two items were chosen or rejected in previous trials. My point is not that I think other BEA models would be better than the ceDDM, but rather that we don't know because the tests have not been run. Naturally, no paper can test all potential models and I am not suggesting that this paper should compare the ceDDM to other BEA processes. However, it should clearly state what we can and cannot conclude from the results it presents.

      This work has important practical implications for many studies in the decision sciences that seek to understand how various factors influence choice outcomes. By better accounting for the context-specific nature of value construction, studies can gain more precise estimates of the effects of treatments of interest on decision processes. That said, there are limitations to the generalizability of these findings that should be noted.

      These limitations stem from the fact that the paper only analyzes choices between food items and the outcomes of the choices are not realized until the end of the study (i.e., participants do not eat the chosen item before making the next choice). This creates at least two important limitations. First, preferences over food items may be particularly sensitive to mindsets/bodily states. We don't yet know how large the choice deltas may be for other types of goods whose value is less sensitive to satiety and other dynamic bodily states. Second, the somewhat artificial situation of making numerous choices between different pairs of items without receiving or consuming anything may eliminate potential decreases in the preference for the chosen item that would occur in the wild outside the lab setting. It seems quite probable that in many real-world decisions, the value of a chosen good is reduced in future choices because the individual does not need or want multiples of that item. Naturally, this depends on the durability of the good and the time between choices. A decrease in the value of chosen goods is still an example of dynamic value construction, but I don't see how such a decrease could be produced by the ceDDM.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript investigates the dynamics of GC-content patterns at the 5' end of protein-coding genes (pc-genes), around their transcription start sites (TSS). The manuscript introduces a quite careful and comprehensive analysis of GC content in pc-genes in humans and other vertebrates, especially around the TSS. The result of this investigation states that "GC-content surrounding the TSS is largely influenced by patterns of recombination." (from end of Introduction)

      My main concern with this manuscript is one of causal reasoning, whether intended or not. I hope the authors can follow my reasoning below on how the logic sometimes seems to fail, and that they will introduce changes to clarify their suggested mechanisms of action.

      The sentence quoted above from the end of the Intro conflicts with this other sentence that appears at the end of the Abstract: "the dynamics of GC-content in mammals are largely shaped by patterns of recombination". The sentence in the Intro seems to indicate that the effect is specific to TSSs, but the one in the Abstract seems to indicate the opposite, that is, that the effect is ubiquitous.

      We are sorry about the lack of clarity. We have now rewritten the abstract and intro to emphasize that our results are restricted to the 5' end of genes, and that by "patterns of recombination" we mean "historic patterns of recombination".

      The observations as stated in the abstract are: "We observe that in primates and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at protein-coding gene TSSs is currently undergoing mutational decay."

      If I understand the measurements described in the manuscript correctly, and the arguments around them, you seem to show that the mutational decay of GC-content in humans is independent of location (TSSs or not), as noted here (also from the abstract): "These patterns extend into the open reading frame affecting protein-coding regions, and we show that changes in GC-content due to recombination affect synonymous codon position choices at the start of the open reading frame."

      Again, we have rewritten this section to clarify these points.

      There is one more result described in the manuscript that, in my view, is very important but is not given the prominence it deserves. It is presented in Figure S3G: "we concluded that GC-content at the TSS of protein-coding genes is not at equilibrium, but in decay in primates and rodents. This decay rate is similar to the decay seen in intergenic regions that have the same GC-content (Figure S3G)"

      Thus, if the decaying effect happens everywhere, how can it be related to "recombination being directed away from TSSs by PRDM9" as it is stated in the abstract and in the model described in Figure 7?

      We make the argument that the GC-peak was likely caused by past recombination events. This is based on:

      1) The change in GC-content at the TSS in dogs and foxes, coupled with the fact that they perform recombination at the TSS

      2) That the TSS can act as a default recombination site in mice when PRDM9 is knocked out

      3) That some forms of PRDM9 allow for recombination at TSS (see Schield et al., 2020, Hoge et al. 2023, and Joseph et al., 2023) and that this is expected to cause an increase in GC-content

      We thus speculate that the GC-peak in humans and rodents was caused by past recombination at TSSs that were permitted by ancient variants of PRDM9. We further point out that PRDM9 is undergoing rapid evolution, and some of the past versions of the protein may have had this property.

      We have tried to clarify these points in the latest version of the text.

      The fact that the decay rate is similar to any other region with similar GC-content should be an indication that the effect is not related to anything having to do with TSS or recombination being directed away from TSSs by PRDM9.

      We are sorry about the lack of clarity. TSSs in humans, chimpanzees, mice and rats are experiencing GC-decay at the same rate as non-functional DNA regions with high GC-content. Thus the GC-peak is not being maintained by selection. This is surprising, given the role that GC-content plays in gene expression. This is a critical point, and we added it to the "conclusion" section of the abstract.

      I hope these paragraphs show my confusion about the relationship between the results presented which I think are very comprehensive and their interpretation and suggested model for GC-content dynamics around TSSs in human.

      On another note, can you provide a bit more background on recombination and its mechanisms?

      We have done our best to clarify these issues.

      You seem to have confident sets of genes under high/low/medium recombination. How were those determined?

      We used the per-gene recombination rates provided in Pouyet et al. 2017 to identify the sets of genes under low/medium/high recombination. Those rates were estimated from the HapMap genetic map (Frazer et al., 2007). This is now fully specified in the Methods section.

      You also seem to concentrate the cause of recombination on PRDM9, please explain. Is PRDM9 the unique indicator of recombination?

      PRDM9 has been shown to be the primary determinant of where recombination occurs in the genome (Grey et al., 2011, Brick et al., 2012). This is very well established. We have reworded part of the introduction to make this clear.

      Specific comments

      Figure 1, it is very hard to understand the differences between the three rows. Please explain more clearly in the legend, and add more information to the figure itself.

      We altered the axis titles to make this clearer. We also label "Upstream", "Exon 1" and "Part of Intron 1" in Figure 1C, F and I, and in Figure 2C. We now spell this out in the figure legend.

      Figure 7, express somewhere in the figure that the y axis measures GC content.

      We now added "GC Content" to the left of the first "graph" in Figure 7.

      Figure 7 seems to introduce a 'causal' model of GC-content dismissing (diminishing?) based on recombination being directed away from TSSs. What about the diminishing of GC-content in other genomic regions, as you have shown in Figure S3G?

      Our focus in this model, and in the manuscript, is on TSSs. We think that adding the dynamics of other GC-rich regions would be distracting. We do not know what caused these intergenic genomic regions to be high in GC-content prior to decay. After excluding known recombination sites and TSSs, these regions are very rare in the human genome. They may be ancient recombination sites that are decaying in GC-content. However, unlike TSSs, which have some connection to recombination (i.e. data from PRDM9 knockout mice and from dogs and foxes), we do not have any direct or indirect evidence that these other sites were used for recombination in the past. Alternatively, there could have been some other pressure on these sites in the past, which we are not aware of, that increased their GC-content.

      The title is too selective with respect to the results, and it implies that the decay is exclusive to the regions surrounding TSSs.

      Decay of GC-content towards equilibrium is the default state for non-functional DNA. That it is occurring at the TSS is surprising, as it indicates that the GC-peak is not maintained by selection. We now state this in the paper and include this in the "conclusion" portion of the abstract.

      Reviewer #1 (Significance (Required)):

      The statistical analysis is comprehensive and robust.

      We thank the reviewer for this.

      Their model interpretation, as described, induces confusion and needs to be clarified.

      We are sorry about this. Hopefully our revised text will clear up the confusion.

      I am an expert computational biologist, I do not have a deep knowledge of sequence implications of recombination, and it would be good if the manuscript could add some more background on that.

      We thank the reviewer for their perspective, and we hope that our text changes better explain to the non-expert why our findings are so surprising. We further clarify how recombination affects DNA sequence through gBGC; some of these changes are detailed in our responses to the other reviewers.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this work, the authors present various analyses suggesting that GC-content at the TSSs of coding genes is affected by recombination. The findings are interesting and novel, and they are important to our understanding of how various non-adaptive evolutionary forces shape vertebrate genome evolutionary history.

      We thank the reviewer for these kind words.

      The Methods section includes most needed details (see comments below for missing information), and the scripts and data provided online help in transparency and usability of these analyses.

      I have several comments, mostly regarding clarifications in the text and several suggestions:

      1. In the Introduction: "CpG islands, have been shown to activate transcription (Fenouil et al., 2012)". What is known about CpG islands is described somewhat inaccurately here. It should be rephrased more accurately, e.g., "CpG islands found near TSSs are associated with robust and high expression levels of genes, including genes expressed in many tissues, such as housekeeping genes."

      We thank the reviewer for that. We have rewritten this part of the introduction.

      2. The following claim in the Introduction, regarding retrogenes and their GC-content, is not in agreement with recent analyses: "Indeed, it has been observed that these genes have elevated GC-content at their 5' ends in comparison to their intron-containing counterparts, suggesting that elevation of GC-content can be driven by positive selection to drive their efficient export (Mordstein et al., 2020). Moreover, retrogenes tend to arise from parental genes that have high GC-content at their 5'ends (Kaessmann et al., 2009)." Recent work showed that retrogenes in mouse and human are significantly depleted of CpG islands in their promoters (PMID: 37055747). This fits the notion that young genes, such as these retrogenes, have simple promoters (PMID: 30395322) with few TF binding sites and without CpGs. Both reported trends should be mentioned, with some suggestions as to why they seem to contradict each other and how they can be reconciled.

      We thank the reviewer for this information. The previous report (Mordstein et al., 2020) indicated that the increase in GC-content occurs downstream of the TSS in retrogenes. Since sequences upstream of the TSS are not part of the retro-insertion, it is not surprising that GC-content may differ between the retrogene and the parental gene. That retrogenes have fewer CpGs upstream of the TSS bolsters the idea that GC-content is not required for transcription and that the GC-peak is not being maintained in most genes by purifying selection.

      3. In "Thus GC-content is expected, and is indeed observed to be higher near recombination hotspots due to gBGC (REF)." I think you forgot the reference...

      We thank the reviewer for catching this.

      4. In Results, regarding average GC-content (Fig 2X): "Interestingly, this pattern is different in the nonamniotes examined, including anole lizard, coelacanth, shark and lamprey." In lizard, it seems that the genomic average is lower (and lizards are amniotes).

      You are absolutely right. We have now fixed this.

      5. In the Discussion, the statement "This model is supported by findings in a recent preprint, which documents the equilibrium state of GC-content in TSS regions from numerous organisms" seems to contradict the findings of the mentioned preprint. If "most mammals have a high GC-content equilibrium state" but still have a functional PRDM9, then, in the absence of evidence for functional differences between orthologous PRDM9 proteins (such as signatures of positive selection or functional assays), the authors' findings regarding the relationship between the lack of PRDM9 in canids and the trends observed at their TSSs are weakened.

      We are sorry about the confusion. We were not sure which of two points was being raised: 1) whether GC-content is at equilibrium in most mammals, or 2) that the equilibrium state is high in most mammals despite the presence of PRDM9. We rewrote this sentence to clarify both issues, especially since these concepts may not be clear to non-experts such as the first reviewer. To answer the first potential concern, the paper in question (Joseph et al., 2023) does not show that GC-content at the TSS in mammals is at equilibrium; rather, it calculates what the equilibrium state would be given the nucleotide substitution rates. In most organisms, the TSS is not at equilibrium. To answer both 1 and 2, Joseph et al. show that the equilibrium GC-content at the TSS is much higher for canids than for other mammals. They and others infer that the diversity among other mammals (where the equilibrium state is higher than in humans and rodents but lower than in canids) reflects variation between PRDM9 orthologues; however, this has yet to be tested. Although the action of PRDM9 has not been evaluated in most mammals, we do point out that in snakes PRDM9 allows for some recombination at the TSS.
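      For readers less familiar with this distinction, it can be stated using a textbook two-state substitution model. This is an expository assumption, not necessarily the exact model used by Joseph et al., 2023. Let u be the per-site AT→GC substitution rate and v the GC→AT rate:

$$
\frac{d\,\mathrm{GC}}{dt} = u\,(1-\mathrm{GC}) - v\,\mathrm{GC}, \qquad \mathrm{GC}^{*} = \frac{u}{u+v}
$$

      The equilibrium GC* depends only on the substitution rates, so it can be estimated even when the observed GC-content has not reached it. A region is at equilibrium only when its current GC-content equals GC*. A TSS whose current GC-content exceeds GC* has a negative dGC/dt, which means it is losing GC-content.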

      6. In Methods, the ENSEMBL version (in addition to the per-species genome version) should be mentioned.

      This has been fixed.

      7. In Fig 1, it is worth clarifying in the legend that the difference between the first and second rows of panels is the length of the plotted region.

      We have now indicated this in the figure legend.

      Reviewer #2 (Significance (Required)):

      The manuscript provides a rigorous analysis of the possible processes that have impacted the TSS GC-content during evolution. It should be of interest to a diverse set of investigators in the genomics community, since it touches on different topics including genome evolution, transcription and gene structures.

      Thank you.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study analyzes the distribution of GC-content along genes in humans and vertebrates, and particularly the higher GC-content in the 5'-end than in the 3'-end of genes. The results suggest that this pattern is ancient in vertebrates, currently decaying in mouse and humans, and probably driven by recombination and GC-biased gene conversion. It is proposed that the 5'-3' gradient was generated during evolution when PRDM9 was less active (in which case recombination occurs mostly near transcription start sites), and decays when PRDM9 is very active, as it is currently in humans and mouse. This is a very interesting hypothesis, also corroborated by a recent, similar analysis in mammals (Joseph et al. 2023). These two preprints, which appeared around the same time, are, I think, quite novel and important. The analyses performed here are thorough and convincing. Source code and raw data sets are openly distributed. I only have a couple of minor comments and suggestions, which I hope might help improve the manuscript.

      Thank you very much for the kind words.

      A1. There has been quite some work on the 5'-3' GC-content gradient in plants (e.g. Clément et al. 2014 GBE, Ressayre et al. 2015 GBE, Brazier & Glemin 2023 biorxiv), which you might like to cite.

      Thank you for pointing out these very interesting papers, we have incorporated them into the latest version.

      A2. CpG-content and GC-content are related in various ways (e.g. see Galtier & Duret 2000 MBE, Fryxell & Moon 2005 MBE) that you might like to discuss; currently the manuscript discusses the CpG hypermutation rate as a driver of GC-content but the picture might be a bit more complex.

      Thank you for this, we have incorporated these citations.

      A3. The model introduced by this manuscript (figure 7) is dependent on the evolution of recombination determination in vertebrates and the role of PRDM9. A recent preprint by Raynaud et al (biorxiv) seems relevant to this issue.

      Thank you for pointing out this pre-print. We have added a paragraph to the discussion that mentions this work. This also initiated a conversation with the authors, and we include some "personal communications" that illuminate what is going on in teleost fish.

      Line-by-line comments

      B1. "First, highly spliced mRNAs tend to have high GC-content at their 5' ends despite the fact that it is not required for export and does not affect expression levels (Mordstein et al., 2020)" -> I do not totally understand this sentence, which seems to imply some link between splicing and export/expression, could you please clarify?

      We rewrote that sentence to make it clearer.

      B2. "mismatches will form in the heteroduplex which are typically corrected in favor of Gs and Cs over As and Ts by about 70%" -> This 70% figure is human-specific, and varies a lot among species; I know in this introduction you're mainly reviewing the human literature but since this part of the text introduces gBGC as a process maybe clarify by adding "in humans" or refrain from giving this figure?

      Thank you. This is a good point. We fixed this.

      B3. "Thus GC-content is expected, and is indeed observed to be higher near recombination hotspots due to gBGC (REF)." -> The reference is missing here. Actually, I'm not sure you will find a good reference for this, because PRDM9-dependent hotspots are so short-lived that GC-content would respond only weakly. Maybe refer instead to the equilibrium GC-content (and cite, for instance, Pratto et al 2014 Science), or to high-recombining regions instead of hotspots (for which you have plenty of papers to cite)?

      Thanks for this.

      B4. Paragraph starting: "PRDM9 and recombination hotspots also experience accelerated rates of evolution..." -> I would suggest removing the word "also" and moving this paragraph up, just before the sentence I comment on above (the one starting "Thus GC-content..."). This would justify my suggestion in comment B3 of mentioning high-recombining regions instead of hotspots, while also avoiding having the important paragraph on recombination at TSSs (the one starting "There are interesting connections...") sandwiched between two sections on PRDM9.

      We did not move this paragraph, although we did adjust the wording slightly.

      B5. The paragraph starting "There are interesting connections..." is crucial to your discussion and, in my opinion, might be emphasized a bit more in the introduction. For instance, what about adding a sentence like "Although not directly relevant to humans, these observations suggest that gBGC might have played a role in shaping the observed 5'-3' GC-content gradient."

      We did not alter the structure of this paragraph but we did reword sections of it.

      B6. "Interestingly, this pattern is different in the non-amniotes examined, including anole lizard, coelacanth, shark and lamprey. These organisms had clear differences in GC-content between their first exon and surrounding sequences (upstream and intronic sequences), which came close to the overall genomic GC-content." -> I'm not sure I got the point the authors are intending to make here. Also, please note that lizards are amniotes.

      We thank the reviewer for catching this error, we have fixed this.

      Reviewer #3 (Significance (Required)):

      This is one of two preprints having appeared ~at the same time (the other one being the cited Joseph et al 2023), which I think are quite important and convincing regarding the role of PRDM9-dependent and PRDM9-independent recombination on GC-content evolution in vertebrates. I support publication of this preprint in a molecular evolutionary journal.

      We thank the reviewer for their kind assessment!

    1. Reviewer #1 (Public Review):

      Rubin et al. study chondrocyte columns in the prenatal and postnatal growth plate in 3D for the first time, using a novel analysis pipeline in which Confetti clones in the murine growth plate are analysed morphometrically. Prenatal chondrocytes were found not to be organised in columns parallel to the main orientation of the long bone; rather, prenatal chondrocytes were commonly organised perpendicular to the main direction of growth. In the postnatal (P40) growth plate there was a diverse arrangement of columns, but more of the columns were vertically aligned.

      I enjoyed reading the work, and the analysis is rigorous. However, I think that it is not valid to state that columns do not form in the embryo. The data only support the finding that strictly vertical columns do not form in the embryo, as the cells are still organised into columns, albeit with a range of orientations. I do not like the term "typically" aligned: how can we know what is "typical" when orientation has never before been assessed in 3D? And the authors' data demonstrate that it is certainly not "typical" for chondrocytes to organise into vertical columns prenatally.

      It would be very interesting to delve deeper into the reason for the change in orientation of columns between pre- and post-natal. For example, does more circumferential growth happen prenatally as compared to postnatally? Is the rate of circumferential vs longitudinal growth different between prenatal and postnatal, and could the change in column orientation be responsible for a (possible) shift in the balance between longitudinal vs circumferential growth before vs after birth? The first sentence of the Discussion refers to the role of chondrocyte columns in driving bone elongation, but aren't they also involved in driving bone morphology?

      I feel that describing the activity of the cells as "mis-rotations" is inappropriate, as it implies that the orientations are not intentional. It is likely not accidental or mistaken that the chondrocytes align in the ways they do: the diaphysis is largely for longitudinal growth, while at the epiphyses lateral expansion of the joint is also important. I find the data in Figure 4 fascinating, especially the variation in orientations between the regions of the growth plate (from proximal to distal), with the most lateral orientation at the most proximal and distal ends. It would be nice to see more discussion of these variations and what they may be contributing to.

      The abstract focuses solely on the analysis of columns prenatally and would benefit from the inclusion of the data from the postnatal growth plate and from the chondrocyte rotations.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Compared to our initial submission to Review Commons, we have addressed all the reviewers' comments. We have extensively re-written the manuscript to make it clearer to a larger audience. In particular, we have transferred Figure EV1 to Figure 1 with more complete panels and included a scheme (Figure EV3) on the steps of D2R internalization which we measure with live cell imaging. We have added a new paragraph to the start of the Discussion to summarize our main conclusions and reordered the discussion on the possible mechanisms of membrane PUFA enrichment on D2R endocytosis. All the changes in the text are in red for easier comparison with the previous version.

      As suggested by reviewer 1, we have performed additional experiments to test the specificity of the effects of PUFA treatments on D2R endocytosis, reinforcing the results shown in Figure 4 using feeding assays. We show with live cell TIRF imaging and the ppH assay that TfR-SEP endocytosis is not affected (Figure EV5) and that SEP-β2AR endocytosis and βarr2-mCherry recruitment to the plasma membrane are not affected (Figure EV6).

      Reviewer #1

      Evidence, reproducibility and clarity

      The manuscript, using different live and fixed cell trafficking assays, demonstrates that incorporation of poly-unsaturated, but not saturated, free fatty acids into the membrane phospholipids reduces agonist-induced internalization of the D2 dopamine receptor but not of the adrenergic beta2 receptor or the transferrin receptor. Pulsed pH (ppH) live microscopy further demonstrated that the reduced internalization upon incorporation of free fatty acids was accompanied by a blunted recruitment of Beta-arrestin to the D2R.

      I believe the claims put forward in the manuscript are overall well supported by the data, and as such I do not believe that further experiments are necessarily needed to uphold these key claims. Also, the methodology is satisfactorily reported, and the statistics are robust, although a two-way ANOVA, as used in Fig 1, seems appropriate for Fig 2 and 3.

      We thank the reviewer for his/her positive assessment of our work. We have checked the statistical tests used for all our measures. For Figure 2 and 3 (now 3 and 4) we test for only one factor (PUFA treatment or not) so we ran ordinary one-way ANOVA using Graphpad Prism.

      That said, I suggest that the fixed cell internalization experiments (Fig 2 and 3), which relate the effect on the D2R to the B2AR and transferrin, be revised. This is important for judging whether the effect reflects a general or a selective molecular mechanism, since this is the one of the three assays on which this comparison relies. Alternatively, I suggest omitting these data and including the B2AR in the live DERET assay and both the B2AR and TfR in the ppH assay. Specifically, my concerns with the fixed cell internalization are:

      • The analysis is based on counting the number of endosomes, which is not necessarily equivalent to the number of receptors internalized.

      The number of puncta, as well as their fluorescence, is reported by the analysis program (written in Matlab2021 and available upon request). We chose to show the number of puncta (in Figures 3 and 4) because it reflects more directly the number of labelled endosomes. As shown in the figure below, we found slight but significant differences between groups for FLAG-D2R (88.6 % and 87.6 % of the average fluorescence of control cells in DHA- and DPA-treated cells, respectively; panel A) and no differences for FLAG-β2AR (panel B). We found a significant decrease in puncta fluorescence for transferrin uptake in cells incubated with DHA (but not DPA) relative to control cells (panel C). However, because we did not detect differences in the number of puncta or in the frequency and amplitude of endocytic vesicle creation events (see below), we still conclude that enrichment with exogenous PUFAs does not affect clathrin-mediated endocytosis.

      In conclusion, the most robust measure of endocytosis for this assay is the number of detected puncta per cell rather than their fluorescence.

      • The analysis relies on fully effective stripping of the surface pool of receptors, i.e. clustered surface receptors not stripped by the protocol will be assessed as internalized. It is often very difficult to obtain full efficiency of the Flag-tag stripping, and this is somewhat expression dependent.

      • The protocols for the constitutive and agonist-induced internalization are different, and yet both are shown on the same absolute graph. Although I take it that the microscope gain settings are unaltered between the constitutive and agonist-induced internalization, I don't believe the quantifications can be directly related. This is confusing at the very least. More critically, however, the membrane signal from the non-stripped condition of constitutive internalization will likely fully shield internalized receptors in the Rab4 membrane-proximal recycling pathway, leading to under-estimation of the constitutive endocytosis. I believe this methodological limitation underlies the massive relative difference in the constitutive endocytosis between panels 2A,B and 2C,D. For comparison, using a quantitative dual-color FACS endocytosis assay, we have previously demonstrated that ligand-induced endocytosis is increased ~4-fold over constitutive endocytosis (in concert with Fig 2A,B here) (Schmidt et al 20XX). Importantly, high relative variability with this methodology could well shield an actual effect of the incorporation of FFAs on the constitutive endocytosis.

      We thank the reviewer for pointing out this difference in the protocol. As a matter of fact, we did not use acid stripping in any of the conditions used for the uptake assays (Figures 3 and 4). We apologize for the confusion, and we have clarified this point in the Methods section. In early experiments we compared conditions with or without stripping, but we concluded from these experiments that the stripping was indeed not complete. Moreover, we noticed early on that many cells treated with DHA or DPA did not have any detectable cluster (13 of 58 quantified cells treated with DHA after addition of QPL, 12/56 cells treated with DPA, 0/68 cells treated with vehicle). Stripping the antibody would have made these cells undetectable, biasing the analysis. Therefore, to make our results more consistent, we decided to use non-stripping conditions. To detect endosomes specifically, we used a segmentation tool developed earlier (see Rosendale et al., 2019). This tool is based on wavelet transforms, which recognize dot-like structures. In addition, we excluded the labelled plasma membrane from the cell mask by mask erosion.
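      The puncta-counting idea described above (wavelet-based detection of dot-like structures inside an eroded cell mask, then counting puncta per cell) can be sketched as follows. This is a hypothetical, generic illustration. It is not the authors' Matlab tool and not the exact Rosendale et al. (2019) algorithm. It assumes a B3-spline "à trous" wavelet transform, a product of the positive parts of the first detail planes as the spot score, a 4-connected erosion of the cell mask to drop the labelled plasma membrane, and a threshold relative to the brightest score in the cell. The function names and the parameters `erosion_px`, `levels` and `rel_threshold` are illustrative assumptions.

```python
import numpy as np

# B3-spline taps commonly used for the "a trous" (undecimated) wavelet transform.
B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def atrous_smooth(img, level):
    """Separable B3-spline smoothing with 2**level - 1 zeros between taps."""
    step = 2 ** level
    kernel = np.zeros(4 * step + 1)
    kernel[::step] = B3
    pad = 2 * step
    out = img.astype(float)
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def multiscale_product(img, levels=2):
    """Product of the positive parts of the first `levels` wavelet detail planes."""
    previous = img.astype(float)
    product = np.ones_like(previous)
    for level in range(levels):
        smoothed = atrous_smooth(previous, level)
        detail = previous - smoothed           # wavelet plane at this scale
        product *= np.clip(detail, 0.0, None)  # keep bright, spot-like detail only
        previous = smoothed
    return product


def erode(mask, iterations):
    """Binary erosion with a 4-connected cross, repeated `iterations` times."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        m = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
             & p[1:-1, :-2] & p[1:-1, 2:])
    return m


def count_components(binary):
    """Count 8-connected foreground regions with an explicit flood fill."""
    seen = np.zeros(binary.shape, dtype=bool)
    rows, cols = binary.shape
    count = 0
    for r, c in zip(*np.nonzero(binary)):
        if seen[r, c]:
            continue
        count += 1
        seen[r, c] = True
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return count


def count_puncta(img, cell_mask, erosion_px=5, levels=2, rel_threshold=0.05):
    """Hypothetical per-cell puncta count: eroded mask, wavelet score, threshold."""
    inner = erode(cell_mask, erosion_px)  # exclude the labelled plasma-membrane rim
    product = multiscale_product(img, levels) * inner
    if product.max() <= 0:
        return 0  # a cell with no detectable cluster still yields a count of zero
    return count_components(product > rel_threshold * product.max())
```

      Two design choices in this sketch matter. First, the erosion depth has to exceed the width of the labelled membrane rim. Second, cells without clusters return a count of zero rather than dropping out of the analysis, which mirrors the reason given above for avoiding antibody stripping. A relative threshold is only one option; an absolute or noise-based threshold would be needed when puncta brightness varies widely within a cell.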

      We agree that the design of the experiments was not aimed at comparing the effect of PUFA treatment on low levels of constitutive D2R endocytosis. This would require more sensitive assays and will be addressed in subsequent studies.

      'Optional' Also, it would be informative to see the ppH Beta-arrestin experiments with the B2AR, to assess whether the putative discrepancy between the D2R and B2AR is upstream or downstream of the blunted Beta-arrestin recruitment. To the same point, it would be very informative to assess how the incorporation of the free fatty acids affects receptor signalling, which would also help relate the effect of incorporating the FFAs into the phospholipids to previous experiments using short-term incubation with FFAs.

      We have now performed live imaging experiments in HEK293 cells expressing SEP-β2AR, GRK2 and βarr2-mCherry and stimulated with isoproterenol (Figure EV6). We show that the clustering of SEP-β2AR and of βarr2-mCherry, as well as endocytosis, are not affected by treatment with DHA or DPA. In this study, we focused on the early trafficking steps of D2R internalization. It will be interesting in a future study to address its consequences for G protein-dependent and -independent signaling. Moreover, and for good measure, we performed experiments to assess TfR-SEP endocytosis with the ppH assay. Again, we found no difference between cells treated or not with PUFAs (Figure EV5).

      References overall seem appropriate, although Schmidt et al would be a relevant reference for the constitutive vs agonist-induced endocytosis of D2R and B2AR.

      We have now cited Schmidt et al. 2020 doi 10.1111/bcpt.13274 in the discussion with the following sentences: "D2R also shows constitutive endocytosis (Schmidt et al, 2020) which may be modulated by PUFAs although we did not detect any significant difference in our measures (see Figure 3) which were aimed at detecting high levels of internalization induced by agonists. Further work will be required to specifically examine the effect of PUFAs on constitutive GPCR internalization."

      Overall, the figures are well composed and convey the messages fairly well. Specific points that would strengthen the rigor include: • Choosing actually representative pictures of the quantitative data in Fig 2 and 3 (e.g. it is hard to see 25 endocytic events in Fig 2A constitutive endo, EtOH)

      We apologize for the confusion. We employ a normalization procedure to account for cell size. In addition, all numbers have been normalized to the condition stimulated with agonist without PUFA treatment. In fact, we detect very few puncta in unstimulated cells (on average 0.6, range 0-5) compared to 27.3 clusters (range 2-87) in cells stimulated with QPL.

      • Showing actual p values for the statistical comparisons

      For easier reading, we have kept the star convention in the figures but added two tables with all statistical tests and p values for both the main figures and the EV figures.

      Moreover, for ease of reading the figures (without consulting the legend repeatedly), it would be very helpful to headline each panel with what the experiment assesses. Figures 1a and 1b, for example, can't be distinguished at all before reading the figure legend. Also, the y-axes could be more informative about what is measured rather than just giving the unit.

      We have added titles to panels (in particular for Figure 2A,B which correspond to former Figure 1A,B) and we have given new titles to Y axes to make them clearer. We hope that the reading of our figures will now be easier.

      Finally, the figure presentation and description of S1 is very hard to follow. I cannot really make out what is assessed in the different panels.

      We have substantially changed Figure EV1 (now Figure 1) with a new presentation of the data: all 4 conditions (control, treated with DHA, DPA or BA) are systematically presented in the same graph, with clearer titles for the parameter displayed on the Y axes. We hope that this figure is now easier to follow.

      Significance

      The strength of the manuscript is the use and validation of incorporation of FFAs into the plasma membrane, which more closely mimics the physiological situation than the brief application of FFAs often used. In addition, the blunted recruitment of beta-arrestin as assessed by the ppH protocol is quite intriguing mechanistically. The limitations are the relatively narrow focus on the D2 receptor (and not multiple GPCRs), which does not really address or assess the physiological, pathophysiological or therapeutic role of the observations (except for referring to the relation between FFAs and disease). Also, despite the putative role of Beta-arrestin recruitment in the process, the actual causation is not clear. This shortcoming is underscored by the putative effect on the constitutive internalization described above.

      My specific expertise for assessing the paper is in general trafficking processes (including the trafficking methodology applied), trafficking of GPCRs, and function of the dopamine system, including the role of D2 receptors.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The only conclusion that I was able to understand from the study was that enrichment of cell membranes with polyunsaturated fatty acids specifically inhibited agonist-induced internalization of D2 receptors. However, I think that the experiments used to conclude that PUFAs do not alter D2R clustering but reduce the recruitment of β-arrestin2 and D2R endocytosis need some clarification (i.e. data depicted in Fig. 2-5). This lack of clarity might be due to my not being familiar enough with the employed technologies, or to the unclear writing style of the paper. There was an overuse of acronyms, initialisms and abbreviations, which are difficult to understand for researchers outside the specific lipid field. I think that the manuscript should be written so that it is also legible to researchers not working in the immediate field.

      The paper was not written in a manner that a general audience of cell biologists or those interested in GPCR biology could understand and judge. It is indeed interesting that polyunsaturated fatty acids specifically inhibit D2R internalization in HEK293 cells, and it could be significant. But, it is difficult to judge the significance of the observation without more in vivo data.

      I would suggest the following. Remove all acronyms and abbreviations. Significantly, expand the Materials and Methods section, either in the manuscript or in the Supplemental section. I suggest clearly explaining each construct used, and the function of each module in the construct, with diagrams. In addition, provide a comprehensive step by step description of each experimental protocol, providing the reader with the rationale for each step in the protocol with explanatory diagrams. The authors should also more clearly explain the rationale and logic that was utilized to make the conclusions that they did from the depicted observations. Only then can a broader audience determine if the authors' conclusions are justified.

      We thank the reviewer for his/her comments. Indeed, our main message was that two types of PUFAs (DHA and DPA) specifically alter D2R endocytosis by reducing the recruitment of β-arrestin2 without changing D2R clustering at the plasma membrane. We are sorry that our writing was not clear enough. We also discovered that, in the last steps of the submission to Review Commons, the first paragraph of the Discussion was inadvertently erased. This made our main conclusions, summarized in this first paragraph, less clear. We have now restored this important paragraph. Moreover, we have extensively rewritten the manuscript, striving to make it as clear as possible to a broad audience. We have reduced the use of acronyms, keeping only the most frequently used ones [e.g. PUFA (used 99 times), DHA (37 times), GPCR (34 times), D2R (126 times), GRK (17 times)], and made them consistent throughout the manuscript. Following the reviewer's suggestion, we have also added a scheme of the steps that follow D2R activation by agonist and lead to its internalization (Figure EV3).

      We understand that by "in vivo data" the reviewer means results obtained in the brain of animals. As written in the Introduction and in the Discussion, the current work follows up on two recently published manuscripts by a subset of the authors, namely (i) Ducrocq et al. 2020 (doi: 10.1016/j.cmet.2020.02.012), in which we show that deficits in motivation in animals deprived of ω3-PUFAs can be restored specifically by conditional expression of a fatty acid desaturase from C. elegans (FAT1) that restores PUFA levels specifically in D2R-expressing striatal projection neurons (which mediate the so-called indirect pathway), and (ii) Jobin et al. 2023 (doi: 10.1038/s41380-022-01928-6), which combines in cellulo (HEK 293 cells) and in vivo data to show that PUFAs affect the ligand binding of the dopamine D2 receptor and its signaling in a lipid context that reflects patient lipid profiles regarding poly-unsaturation levels.

      Reviewer #2 (Significance (Required)):

      In summary, I will reiterate that the reported experiments need to be much better explained to make the study understandable to a broader audience and for that audience to determine whether the conclusions are justified.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The authors investigate the role of lipid polyunsaturation in endocytic uptake of the dopamine D2 receptor (D2R). To modulate the degree of unsaturation in live cell plasma membranes, the authors incubate cell lines with pure fatty acid that is metabolized and incorporated into the cellular membranes. To quantify the internalization of D2R in these live cells, the authors utilized quantitative fluorescence assays such as DERET and endosome analysis to determine the degree and rate of D2R internalization in the presence of two model agonists - dopamine and quinpirole. The authors conclude that when the PUFA content of the plasma membrane is increased (i.e., via ω3 or ω6 fatty acids), both the quantity and rate of D2R internalization decrease substantially. The authors confirmed that these phenomena are specific to D2R as caveolar endocytosis and clathrin-mediated endocytosis were unaffected when these same experimental techniques were utilized for β2 adrenergic receptor and transferrin. Additionally, the authors conclude that the clustering ability of D2R is unaffected by lipid unsaturation but that the ability of D2R clusters to interact with β-arrestin2 is inhibited in the presence of excess PUFA. Based on these findings, the authors propose several hypothetical mechanisms for lipid-D2R interactions on the plasma membrane, which will likely be the scope of future work.

      Overall, this is a highly thorough and rigorous body of work that convincingly illustrates the connection between PUFA levels and D2R activity. However, I do not agree with the authors' conclusions pertaining to how their results should be interpreted in the context of fatty acid-related disorders. Additionally, this manuscript could benefit from some reorganization which would present the work more clearly. Please see the comments below.

      We thank the reviewer for the positive appreciation of our work, qualified as a "thorough and rigorous body of work that convincingly illustrates the connection between PUFA levels and D2R activity". We will address the specific points raised by the reviewer with our answers below.

      Comments:

        • A recurring motivation for this study that is brought up by the authors is that dietary deficiency of ω3 fatty acids is tied to D2R dysfunction. This would indicate that PUFA reduction in the plasma membrane results in D2R dysfunction. However, the experiments emphasized in this manuscript investigate the condition where PUFA content is INCREASED in the plasma membrane and D2R function is compromised. It seems inappropriate for the authors to cite dietary deficiency of ω3 as a motivation when they experimentally test a condition that is tied to ω3 surplus.

      Regarding the general comment of the reviewer, we agree that direct conclusions on the etiology of psychiatric disorders cannot be drawn from the effect of membrane fatty acid levels on D2R in HEK 293 cells. Nevertheless, we mention in the Introduction the intriguing occurrence of low PUFA levels in psychiatric disorders as a starting point for looking at D2R, an important target of the psychoactive drugs prescribed for these disorders. In the Discussion, we propose that manipulating fatty acid levels might potentiate the efficacy of D2R ligands used as treatments. We felt that raising these aspects did not put too much emphasis on psychiatric disorders. However, in accordance with the reviewer's comment, we have toned down these descriptions in the revised manuscript.

      The goal of increasing the levels of fatty acids at the membrane in HEK 293 cells, the most widely used cellular system to study GPCR trafficking, was to emulate the levels of lipids in brain cells. Indeed, the levels of PUFAs in our culture conditions are much lower (~8 %, Figure 1B) than in brain extracts (~30 %). Therefore, the "control" condition in HEK 293 cells would correspond to PUFA deficiency, while after our enrichment protocol these levels are closer to those found in brain cells. Our results could therefore be interpreted as D2R endocytosis being augmented when membrane PUFA levels decrease. Importantly, increased receptor internalization often correlates with decreased signaling. Therefore, membrane PUFA enrichment in our conditions would rather be expected to potentiate D2R signaling.

        • Following up on the first comment, the authors' results seem to indicate that excess ω3's are detrimental to D2R function. This result would be at odds with the conventional view that ω3's are essential and that excessive ω3 may not be harmful. The authors should rationalize their findings in the context of what is known about excess dietary ω3.

      The Reviewer is right that the conventional view is that excessive ω3 PUFA may not be harmful. However, this rather applies to dietary consumption, which might have a limited effect on brain fatty acid content, since its accretion is highly regulated. Moreover, most studies of ω3 supplementation have been performed in young adults, and the effects on the developing brain, as might occur in pathological conditions in which D2R is involved, remain poorly understood. Furthermore, as mentioned above, blunted internalization of D2R under membrane PUFA enrichment does not indicate that PUFAs are detrimental to D2R function. Nor do we argue that membrane enrichment corresponds to excess PUFAs.

        • I would argue that the control experiments with saturated fatty acids (i.e., Behenic Acid in figure 1), represent a scenario mimicking ω3 deficiency as the enrichment of Behenic Acid causes an overall reduction in PUFAs (Figure EV1C - an increase in SFA must correspond to a decrease in PUFA). These Behenic acid results are the only experiments presented by the authors that mimic a scenario resembling ω3 deficiency and the results show that the D2R internalization is unaffected (Figure 1G-H). Therefore, I would further argue that if anything, the authors results suggest that ω3 deficiency is NOT correlated to D2R internalization. Again, the authors must rationalize these findings in the context of what is known about dietary intake of ω3's.

      The Reviewer must be referring to the fact that nutrients rich in SFAs are usually poor in PUFAs and vice versa. Based on our lipidomic analysis, we now present the effect of the treatments (DHA, DPA, BA) on the levels of PUFAs (Figure 1B) and saturated fatty acids (Figure 1C). In cells treated with behenic acid (BA), PUFA levels are not significantly changed relative to control, untreated cells, while saturated fatty acid levels are increased. BA was used here to determine whether the effects observed with PUFAs were related to the enrichment in unsaturations or due to carbon chain length (C22). Chain length alone does not explain them, because BA treatment, unlike DHA or DPA treatment, does not affect D2R endocytosis (Figure 2G,H).

        • It's not clear why the authors decided to include an ω6 fatty acid in this study. The authors built up a detailed rationale for investigating ω3's as they are dietarily essential and tied to disease when deficient. To my knowledge, ω6's are considered much less beneficial than ω3's in a dietary sense. The inclusion of an ω6 almost seems coerced as the ω6-related results don't provide any interesting additional insights. It would benefit the manuscript if the authors provided some additional discussion explaining why ω6's are being investigated in addition to ω3's.

      We agree that we could have made the rationale clearer. The goal in comparing ω3-DHA and ω6-DPA was to assess whether the position of the first unsaturation (n-3 vs n-6), with the same carbon chain length (C22) might differentially impact D2R endocytosis.

        • In Figure EV1D, the DHA and DPA percentages each increase by ~6%. The corresponding Figure EV1B indicates that the overall PUFA% in the plasma membrane also increases by 6%. This makes sense as the total change in PUFA content is consistent with the amount of DHA or DPA being incorporated into cells. However, this consistency was not observed with BA and SFAs. In Figure EV1E, the BA percentage increases only ~1% while the total SFA percentage in Figure EV1C increases by ~6%. How can something undergoing a 1% change (relative to total lipid content) result in a 6% overall change in SFA content?

      The reviewer is correct: the level of SFAs increases by 5.2 percentage points (from 34.5 % of total FAs in control cells to 39.7 % in BA-treated cells), more than the increase in BA alone (1.18 percentage points, from 0.35 % to 1.53 %). A close look at our lipidomics data showed that many of the 10 saturated fatty acids quantified are increased. In particular, the two most abundant ones, palmitic acid (16:0) and stearic acid (18:0), increase from 21.37 % to 22.28 % and from 8.47 % to 11.17 %, respectively. These apparent discrepancies may stem from lipid metabolic pathways that convert the rare, long-chain BA into more common, shorter SFAs to preserve lipid content and thus membrane properties.
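      Summing the changes quoted above, in percentage points of total fatty acids, shows how much of the SFA increase the three named species account for:

```latex
\begin{aligned}
\Delta_{\mathrm{SFA}} &= 39.7 - 34.5 = 5.2\\
\Delta_{\mathrm{BA}} + \Delta_{16:0} + \Delta_{18:0} &= (1.53 - 0.35) + (22.28 - 21.37) + (11.17 - 8.47)\\
&= 1.18 + 0.91 + 2.70 = 4.79
\end{aligned}
```

      The remaining 5.2 - 4.79 = 0.41 percentage points are spread over the other seven saturated fatty acids quantified.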

        • In Figure 4, the discussion of kinetics does not make sense. How exactly are kinetics being monitored in this figure? (Recruitment kinetics are discussed in panels D and G)

      We wanted to convey the impression that the time to reach the peak βarr2-mCherry recruitment was shorter in PUFA-treated cells than in control cells. However, after analyzing the kinetics in individual cells, we did not find a statistically significant difference in the time to maximum fluorescence. Therefore, we removed this reference to the kinetics of recruitment.

      We now write: "However, treatment with DHA or DPA significantly decreased peak βarr2-mCherry fluorescence (Figure 5F-G)."
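      A per-cell analysis of the kind described above can be sketched as follows: extract the peak fluorescence and the time from agonist addition to that peak for each cell, then compare conditions with a two-sided permutation test on the difference in means. The function names and the choice of test are illustrative assumptions; the response does not state which statistical test was used.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def peak_and_time_to_peak(times: Sequence[float], trace: Sequence[float],
                          t_agonist: float) -> Tuple[float, float]:
    """Return (peak fluorescence, time from agonist addition to the peak) for one cell."""
    if len(times) != len(trace) or not trace:
        raise ValueError("times and trace must be non-empty and of equal length")
    i_max = max(range(len(trace)), key=trace.__getitem__)  # first index of the maximum
    return trace[i_max], times[i_max] - t_agonist


def permutation_pvalue(a: Sequence[float], b: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in means between two groups of cells."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cells between conditions
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

      Time-to-peak values from control and PUFA-treated cells would then be compared pairwise, e.g. permutation_pvalue(ttp_control, ttp_dha), where both lists are hypothetical per-cell results.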

        • In Figure 5, what is the purpose of panel D? Would it be more helpful to include additional, overlaid "cumulative N" plots for scenarios in which PUFAs were enriched? This would work well in conjunction with panel F.

      The purpose of this panel is to show the kinetics of the increase in the frequency of endocytic vesicle formation upon agonist addition, and the decrease in frequency when the agonist is removed. We have now added examples of cells treated with DHA and DPA with a similar surface area for direct comparison with control (EtOH) cells.
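      The frequency of endocytic vesicle formation in such a panel is the slope of the cumulative count of detected events. A minimal sketch, assuming event times in seconds and a measured cell surface area in µm² (both hypothetical inputs), computes the cumulative staircase and the event rate per unit area within a time window, for example before agonist, during agonist, and after washout.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def cumulative_counts(event_times: Sequence[float]) -> List[Tuple[float, int]]:
    """(time, cumulative number of events) pairs: one step per detected vesicle."""
    return [(t, i + 1) for i, t in enumerate(sorted(event_times))]


def event_rate(event_times: Sequence[float], t_start: float, t_end: float,
               area_um2: float) -> float:
    """Events per µm² per minute in the window [t_start, t_end), times in seconds."""
    if t_end <= t_start or area_um2 <= 0:
        raise ValueError("need t_end > t_start and a positive area")
    ts = sorted(event_times)
    n = bisect_left(ts, t_end) - bisect_left(ts, t_start)
    return n / area_um2 / ((t_end - t_start) / 60.0)
```

      Normalizing by area is one way to compare cells of different size; choosing cells of similar surface area, as done in the revised panel, serves the same purpose.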

        • For the readers who are new to this area or unfamiliar with the assays used, Figure 1 is not intuitive and initially difficult to interpret. It would greatly benefit the flow of the manuscript if Figures EV1A-C and EV2A were included in the main text and "Normalized R" was clearly defined in the main text, prior to discussion of Figure 1.

      We have now moved Figure EV1 to the main text as Figure 1. We have adapted the scheme of the DERET assay and its legend (now in Figure EV1A) to make it clearer. We did not include it in Figure 2 because this figure is already very large. We have changed "Normalized R" to "Ratio 620/520 (% max)" to be clearer and more consistent with the scheme.

      Reviewer #3 (Significance (Required)):

      General assessment: The work, for the most part, is rigorous and scientifically sound. The authors utilize impressive, quantitative assays to expand our understanding of protein-lipid interactions. However, the authors need to improve their discussion of the actual physiological conditions that correspond to their experimental results.

      Advance: This work may fill a gap in our understanding of disorders related to the dopamine D2 receptor. However, some of the results may be at odds with what is currently known/understood about dietary ω3 fatty acids.

      Audience: This work will be of broad interest to researchers in the biophysics field, with particular emphasis on researchers who study protein and membrane biophysics. This work will also be of interest to researchers who study membrane molecular biology.

      Reviewer Expertise: quantitative fluorescence spectroscopy and microscopy; membrane biophysics; protein-lipid interactions

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Self-inhibiting percolation and viral spreading in epithelial tissue" describes a model, based on a five-state cellular automaton, of the development of an infection. The model is motivated and qualitatively justified by time-resolved measurements of expression levels of viral, interferon-producing, and antiviral genes. The model is set up in such a way that the crucial difference in outcomes (infection spreading vs. confinement) depends on the initial fraction of special virus-sensing cells. Those cells (denoted as 'type a') cannot be infected and do not support the propagation of infection, but rather inhibit it in a somewhat autocatalytic way. Presumably, such feedback makes the transition between the two outcomes very sharp: a minor variation in the concentration of 'a' cells results in a qualitative change from one outcome to the other. As in any percolation-like system, the transition between propagation and inhibition of infection goes through a critical state with all its attributes, including a power-law distribution of cluster sizes (corresponding to the fraction of infected cells) with a fairly universal exponent and a cutoff at the upper end of this distribution.
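      For readers less familiar with how such an exponent is estimated, a common approach is maximum likelihood on the tail of the size distribution. The sketch below uses the continuous approximation tau = 1 + n / sum(ln(s_i / s_min)) for P(s) ∝ s^(-tau), s ≥ s_min. It is not necessarily the method used in the manuscript, and it ignores the upper cutoff, so sizes near the cutoff should be excluded in practice. The exponent 2.05 in the check is an arbitrary synthetic value.

```python
import math
import random
from typing import List, Sequence


def powerlaw_mle_exponent(sizes: Sequence[float], s_min: float) -> float:
    """Continuous maximum-likelihood estimate of tau for P(s) ~ s^(-tau), s >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no samples at or above s_min")
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)


def sample_powerlaw(tau: float, s_min: float, n: int, seed: int = 0) -> List[float]:
    """Synthetic power-law samples by inverse-transform sampling (for testing only)."""
    rng = random.Random(seed)
    # 1 - U lies in (0, 1], so every sample is >= s_min
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(n)]
```

      Real cluster sizes are integers, so for small s_min a discrete estimator is more accurate; the continuous form is adequate when s_min is well above 1.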

      Strengths:

      The proposed model suggests an explanation for the apparent diversity of outcomes of viral infections such as COVID.

      Author response: We thank the referee for the concise and accurate summary of our work.

      Weaknesses:

      Those are not real points of weakness, though I think addressing them would substantially improve the manuscript.

      Author response: Below we will address these point by point.

      The key point in the manuscript is the reduction of actual biochemical processes to the NOVAa rules. I think more could be said about it, be it referring to a set of well-known connections between expression states of cells and their reaction to infection or justifying it as an educated guess.

      Author response: We have now improved this part in the model section. We have added a few sentences explaining how the cell state transitions are motivated by the UMAP results:

      “The cell state transitions triggered by IFN signaling or viral replication are known in viral infection, but how exactly the transitions are orchestrated for specific infections is poorly understood. The UMAP cell state distribution hints at possible preferred transitions between states. The closer two cell states are on the UMAP, the more likely transitions between them are, all else being equal. For instance, the antiviral state (𝐴) is easily established from a susceptible cell (𝑂), but not from the fully virus-hijacked cell (𝑉). The IFN-secreting cell state (𝑁) requires the co-presence of the viral and antiviral genes and thus the cell cluster is located between the antiviral state (𝐴) and virus-infected state (𝑉) but distant from the susceptible cells (𝑂).

      Inspired by the UMAP data visualization (Fig. 1a), we propose the following transitions between five main discrete cell states”

      Another aspect where the manuscript could be improved would be to look a little beyond the strange and 'not-so-relevant for a biomedical audience' focus on the percolation critical state. While the presented calculation of the precise percolation threshold and the critical exponent confirm the numerical skills of the authors, the probability that an actual infected tissue is right at the threshold is negligible. So in addition to the critical properties, it would be interesting to learn about the system away from the threshold: for example, how the speed of propagation of infection depends on subcritical p_a, and what the cluster size distribution is for supercritical p_a.

      Author response: We agree that further exploring the model away from the critical threshold is worthwhile. While our main focus has been on explaining the large degree of heterogeneity in outcomes – readily explained as a consequence of the sharp threshold-like behavior – we now include plots of the time-evolution of the infection (as well as the remaining states) over time for subcritical values of p_a. The plots can be found in Figure S4 of the supplement.

      Reviewer #2 (Public Review):

      Xu et al. introduce a cellular automaton model to investigate the spatiotemporal spreading of viral infection. In this study, the authors first analyze the single-cell RNA sequencing data from experiments and identify four clusters of cells at 48 hours post-viral infection, including susceptible cells (O), infected cells (V), IFN-secreting cells (N), and antiviral cells (A). Next, a cellular automaton model (NOVAa model) is introduced by assuming the existence of a transient pre-antiviral state (a). The model consists of an LxL lattice; each site represents one cell. The cells change their state following rules that depend on the interaction of neighboring cells. The model introduces a key parameter, p_a, representing the fraction of pre-antiviral state cells. Cell apoptosis is omitted in the model. Model simulations show a threshold-like behavior of the final attack rate of the virus when p_a changes continuously. There is a critical value p_c, so that when p_a < p_c, infections typically spread to the entire system, while at a higher p_a > p_c, the propagation of the infected state is inhibited. Moreover, the radius R that quantifies the diffusion range of N cells may affect the critical value p_c; a larger R yields a smaller critical value p_c. The structure of clusters is different for different values of R; greater R leads to a different microscopic structure with fewer A and N cells in the final state. Compared with the single-cell RNA seq data, which implies a low fraction of IFN-positive cells (around 1.7%), the model simulation suggests R=5. The authors also explored a simplified version of the model, the OVA model, with only three states. The OVA model also has an outbreak size. The OVA model shows dynamics similar to the NOVAa model. However, the change in microstructure as a function of the IFN range R observed in the NOVAa model is not observed in the OVA model.

      Author response: We thank the referee for the comprehensive summary of our work.

      Data and model simulation mainly support the conclusions of this paper, but some weaknesses should be considered or clarified.

      Author response: Thank you - we will address these point by point below.

      (1) In the automaton model, the authors introduce a parameter p_a, representing the fraction of pre-antiviral state cells. The authors wrote: "The parameter p_a can also be understood as the probability that an O cell will switch to the N or A state when exposed to the virus of IFNs, respectively." Nevertheless, biologically, the fraction of pre-antiviral state cells is not the same quantity as the probability that an O cell switches to the N or A state. Moreover, in the numerical scheme, the cell state changes according to the deterministic rules N(O)=a and N(a)=A. Hence, the probability p_a does not enter the model simulation. The exact meaning of the parameter p_a may need to be clarified.

      Author response: We acknowledge that this was an imprecise formulation, and have now changed it.

      What we tried to convey with that comment was that, as an alternative to having a certain fraction of cells in the a state initially, one could instead have devised a model in which each O-state cell simply had a probability to act as an a-state cell upon exposure to the virus or to interferons, i.e. to switch to an N state (if exposed to virus) or to the A state (if exposed to interferons). In this simplified model, there would be no functional difference, since it would simply amount to whether each cell had a probability to be designated an a-cell initially (as in our model), or upon exposure. So our remark mainly served to explain that the role of the p_a parameter is simply to encode that a certain fraction of virus-naive cells behave this way (whether predetermined or not).

      (2) The current model is deterministic. However, biologically, considering the probabilistic model may be more realistic. Are the results valid when the probability update strategy is considered? By the probability model, the cells change their state randomly to the state of the neighbor cells. The probability of cell state changes may be relevant for the threshold of p_a. It is interesting to know how the random response of cells may affect the main results and the critical value of p_a.

      Author response: This is a good point - we are firm believers in the importance of stochasticity. We should note that even the current model has a level of stochasticity, since we choose the cells to be updated with a constant probability rate - we choose N cells to update in each timestep, with replacement.

      However, based on your suggestion, we simulated a version of the dynamics which included stochastic conversion, i.e. each action of a cell on a nearby cell happens only with a probability p_conv (and the original model is recovered as the p_conv=1 scenario). Of course, this slows down the dynamics (or effectively rescales time by a factor p_conv), but crucially we find that it does not appreciably affect the location of the threshold p_c. Below we include a parameter scan across p_a values for R=1 and p_conv=0.5, which shows that the threshold continues to appear at around p_a=27%.

      We now discuss these findings in the supplement and include the figure below as Fig. S5.

      Author response image 1.
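      To make the update scheme discussed in points (1) and (2) concrete, here is a minimal Python sketch of a NOVAa-style lattice with stochastic conversion. It is not the authors' code. The transition table is assumed from the rules quoted in this exchange (V infects O and turns a into N; N(O)=a and N(a)=A), and several details are illustrative assumptions: the virus acts on the four nearest neighbors, IFNs act on all cells within Chebyshev distance R, a randomly picked cell acts on its neighborhood, L×L cells are picked per time step with replacement, and the returned "attack rate" counts V and N cells. Setting p_conv=1 recovers deterministic conversion.

```python
import random
from typing import List, Tuple

# Cell states (labels follow the manuscript; "a" is the pre-antiviral state).
O, V, N, A, PRE_A = "O", "V", "N", "A", "a"

# Assumed transition table: (state of acting cell, state of target) -> new target state.
RULES = {(V, O): V, (V, PRE_A): N, (N, O): PRE_A, (N, PRE_A): A}

Grid = List[List[str]]


def init_grid(L: int, p_a: float, rng: random.Random) -> Grid:
    """A fraction p_a of cells start in state a, the rest in O; one infected seed in the middle."""
    grid = [[PRE_A if rng.random() < p_a else O for _ in range(L)] for _ in range(L)]
    grid[L // 2][L // 2] = V
    return grid


def targets(grid: Grid, i: int, j: int, R: int) -> List[Tuple[int, int]]:
    """Cells acted on by (i, j), with periodic boundary conditions."""
    L = len(grid)
    if grid[i][j] == V:    # assumption: virus reaches the 4 nearest neighbors
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif grid[i][j] == N:  # assumption: IFNs reach all cells within Chebyshev distance R
        offsets = [(di, dj) for di in range(-R, R + 1) for dj in range(-R, R + 1)
                   if (di, dj) != (0, 0)]
    else:
        return []
    return [((i + di) % L, (j + dj) % L) for di, dj in offsets]


def has_frontier(grid: Grid, R: int) -> bool:
    """True while at least one V or N cell can still convert a neighbor."""
    L = len(grid)
    return any((grid[i][j], grid[x][y]) in RULES
               for i in range(L) for j in range(L)
               for x, y in targets(grid, i, j, R))


def simulate(L: int = 40, p_a: float = 0.1, R: int = 1, p_conv: float = 1.0,
             seed: int = 0, max_steps: int = 10_000) -> float:
    """Run until no conversion is possible; return the infected (V or N) fraction."""
    assert L > 2 * R, "lattice must be wider than the IFN neighborhood"
    rng = random.Random(seed)
    grid = init_grid(L, p_a, rng)
    for _ in range(max_steps):
        if not has_frontier(grid, R):
            break
        for _ in range(L * L):  # L*L random picks per time step, with replacement
            i, j = rng.randrange(L), rng.randrange(L)
            actor = grid[i][j]
            for x, y in targets(grid, i, j, R):
                new_state = RULES.get((actor, grid[x][y]))
                # each conversion succeeds only with probability p_conv
                if new_state is not None and rng.random() < p_conv:
                    grid[x][y] = new_state
    return sum(row.count(V) + row.count(N) for row in grid) / (L * L)
```

      For example, simulate(L=100, p_a=0.25, R=1, p_conv=0.5) returns the final infected fraction for one realization; averaging over seeds and scanning p_a gives curves of the kind described above. The pure-Python loops are only meant for small lattices; the L=1000 to 3000 lattices discussed in point (3) would need a vectorized or compiled implementation.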

      (3) Figure 2 shows a critical value p_c = 27.8% following a simulation on a lattice with dimension L = 1000. However, it is unclear if dimension changes may affect the critical value.

      Author response: Re-running the simulations on a lattice 4x as large (i.e. L=2000) yields a similar critical value of 27-28% for R=1, so we are confident that finite size effects do not play a major role at L=1000 and beyond. For R=5, however, we find that a minimum lattice size greater than L=1000 is necessary to determine the critical threshold. Concretely, we find that the threshold value p_c for R=5 changes somewhat when the lattice size is increased from 1000 to 2000, but is invariant under a change from 2000 to 3000, so we conclude that L=2000 is sufficient for R=5. The p_c value for R=5 cited in the manuscript (~0.4%) was determined from simulations at L=2000.
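      One simple way to compare the threshold across lattice sizes is to average the final attack rate over many runs at each p_a and interpolate where it crosses a chosen level. The sketch below assumes a crossing level of 0.5 and linear interpolation; neither is stated as the authors' procedure, and the numbers in the check are synthetic, not results from the manuscript.

```python
from typing import Sequence


def estimate_threshold(p_values: Sequence[float], attack_rates: Sequence[float],
                       level: float = 0.5) -> float:
    """Linearly interpolate the p_a at which the mean attack rate first drops below `level`."""
    pairs = sorted(zip(p_values, attack_rates))
    for (p0, r0), (p1, r1) in zip(pairs, pairs[1:]):
        if r0 >= level > r1:  # bracketing interval, so r0 - r1 > 0
            return p0 + (r0 - level) * (p1 - p0) / (r0 - r1)
    raise ValueError("attack rate never crosses the requested level")
```

      Running the same scan at L=1000, 2000 and 3000 and checking that the estimates agree within the p_a grid spacing is one way to test for finite-size effects. Because the transition is sharp, refining the p_a grid near the crossing matters more than the interpolation scheme.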

      Reviewer #3 (Public Review):

      Summary:

      This study considers how to model distinct host cell states that correspond to different stages of a viral infection: from naïve and susceptible cells to infected cells and a minority of important interferon-secreting cells that are the first line of defense against viral spread. The study first considers the distinct host cell states by analyzing previously published single-cell RNAseq data. Then an agent-based model on a square lattice is used to probe the dependence of the system on various parameters. Finally, a simplified version of the model is explored, and shown to have some similarity with the more complex model, yet lacks the dependence on the interferon range. By exploring these models one gains an intuitive understanding of the system, and the model may be used to generate hypotheses that could be tested experimentally, telling us "when to be surprised" if the biological system deviates from the model predictions.

      Author response: Thank you for the summary! We agree with the role that you describe for a model such as this one.

      Strengths:

      -  Clear presentation of the experimental findings and a clear logical progression from these experimental findings to the modeling.

      -  The modeling results are easy to understand, revealing interesting behavior and percolation-like features.

      -  The scaling results presented span several decades and are therefore compelling.

      -  The results presented suggest several interesting directions for theoretical follow-up work, as well as possible experiments to probe the system (e.g. by stimulating or blocking IFN secretion).

      Weaknesses:

      -  Since the "range" of IFN is an important parameter, it makes sense to consider lattice geometries other than the square lattice, which is somewhat pathological. Perhaps a hexagonal lattice would generalize better.

      -  Tissues are typically three-dimensional, not two-dimensional. (Epithelium is an exception). It would be interesting to see how the modeling translates to the three-dimensional case. Percolation transitions are known to be very sensitive to the dimensionality of the system.

      Author response: We agree that probing different lattice geometries (2- and 3-dimensional alike) would be interesting and worthwhile. However, for this manuscript, we prefer to confine the analysis to the current, simple case. We do agree, however, that an extensive exploration of the role of geometry is an interesting future possibility.
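      The reviewer's remark about lattice geometry can be made quantitative by counting how many cells lie within a given range R. The sketch below assumes that R is measured with the Chebyshev metric on the square lattice (the convention used by the authors is not stated in this exchange) and compares it with the Manhattan metric and with hexagonal distance in axial coordinates. The counts are (2R+1)^2 - 1, 2R(R+1) and 3R(R+1), respectively, so the same nominal R covers quite different numbers of cells.

```python
from typing import List, Tuple

Offset = Tuple[int, int]


def square_chebyshev(R: int) -> List[Offset]:
    """Square lattice, all cells with max(|dx|, |dy|) <= R (Moore-type neighborhood)."""
    return [(dx, dy) for dx in range(-R, R + 1) for dy in range(-R, R + 1)
            if (dx, dy) != (0, 0)]


def square_manhattan(R: int) -> List[Offset]:
    """Square lattice, all cells with |dx| + |dy| <= R (von Neumann-type neighborhood)."""
    return [(dx, dy) for dx in range(-R, R + 1) for dy in range(-R, R + 1)
            if 0 < abs(dx) + abs(dy) <= R]


def hex_axial(R: int) -> List[Offset]:
    """Hexagonal lattice in axial coordinates (q, r); hex distance = max(|q|, |r|, |q + r|)."""
    return [(q, r) for q in range(-R, R + 1) for r in range(-R, R + 1)
            if 0 < max(abs(q), abs(r), abs(q + r)) <= R]


for R in (1, 5):
    print(R, len(square_chebyshev(R)), len(square_manhattan(R)), len(hex_axial(R)))
# 1 8 4 6
# 5 120 60 90
```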

      -  The fixed time-step of the agent-based modeling may introduce biases. I would consider simulating the system with Gillespie dynamics where the reaction rates depend on the ambient system parameters.

      -  Single-cell RNAseq data typically involves data imputation due to the high sparsity of the measured gene expression. More information could be provided on this crucial data processing step since it may significantly alter the experimental findings.
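      Gillespie's direct method, suggested above as an alternative to the fixed time step, replaces the update sweep with exponentially distributed waiting times. A minimal sketch of one step follows; in a lattice model each possible conversion (an acting cell and a target) would be one reaction channel, and the per-channel rates are modeling assumptions rather than quantities given in the manuscript.

```python
import math
import random
from typing import Sequence, Tuple


def gillespie_step(rates: Sequence[float], rng: random.Random) -> Tuple[float, int]:
    """One step of Gillespie's direct method: (waiting time, index of the channel that fires)."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("no reaction channel has a positive rate")
    # waiting time is exponential with rate equal to the total propensity
    tau = -math.log(1.0 - rng.random()) / total
    # choose channel k with probability rates[k] / total
    threshold = rng.random() * total
    acc = 0.0
    for k, r in enumerate(rates):
        acc += r
        if threshold < acc:
            return tau, k
    # floating-point round-off guard: fall back to the last channel with a positive rate
    return tau, max(k for k, r in enumerate(rates) if r > 0)
```

      After each step, time advances by tau and the rates of channels involving the converted cell are recomputed; there is no fixed time step to bias the dynamics.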

      Justification of claims and conclusions:

      The claims and conclusions are well justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is necessary to explain what UMAP does. Is clustering done in the space of the twenty-something original dimensions or in 2D? How are UMAP1 and UMAP2 selected, and are they the same in all plots?

      Author response: We have now added a few sentences to clarify the point raised above - the second snippet explains how clustering is performed:

      “As a dimension reduction algorithm, UMAP is a manifold learning technique that favors the preservation of local distances over global distances (McInnes et al., 2018; Becht et al., 2019). It constructs a weighted graph from the data points and optimizes the graph layout in the low-dimensional space.”

      “We cluster the cells with the principal components analysis (PCA) results from their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells then optimize the modularity function to determine clusters. We present the cluster information on the UMAP plane and use the same UMAP coordinates for all the plots in this paper hereafter.”
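      The clustering steps quoted above (first 16 principal components, k-nearest neighbors, a shared nearest neighbor graph, and modularity) can be sketched with NumPy as follows. This mirrors the common Seurat-style workflow, in which shared-neighbor edges are weighted by the Jaccard index of neighbor sets, but the value of k, the Jaccard weighting, and Euclidean distance are assumptions here. Only the modularity objective is computed; the optimizer (e.g. Louvain or Leiden) that searches for the partition maximizing it is omitted.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_pcs: int = 16) -> np.ndarray:
    """Project cells (rows) onto the first n_pcs principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


def knn_indices(Z: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest cells (Euclidean), each cell counting as its own neighbor."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def snn_graph(nbrs: np.ndarray) -> np.ndarray:
    """Shared-nearest-neighbor weights: Jaccard index of the two cells' neighbor sets."""
    n = nbrs.shape[0]
    sets = [set(row.tolist()) for row in nbrs]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # no self-loops
            shared = len(sets[i] & sets[j])
            if shared:
                W[i, j] = W[j, i] = shared / len(sets[i] | sets[j])
    return W


def modularity(W: np.ndarray, labels: np.ndarray) -> float:
    """Newman modularity of a partition of a weighted, undirected graph."""
    k = W.sum(axis=1)
    two_m = k.sum()
    same = labels[:, None] == labels[None, :]
    return float(((W - np.outer(k, k) / two_m) * same).sum() / two_m)
```

      The cluster labels found this way would then be annotated as O, V, A or N from the abundance of viral genes, antiviral genes and IFNs, as described in the response to Reviewer #2 below.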

      Figure 1, what do bars in the upper right corners of panels d, e, f, and g indicate? Does "Averaged" refer to a time average? Something is missing in "Cell proportions are labeled with corresponding colors in a)".

      Author response: Thank you - we have now modified the figure caption. The bars in the upper right corners of panels d, e, f are color keys for gene expression, the brighter the color is, the higher the gene expression is.

      “Averaged” gene expression refers to the mean expression of that particular gene across the cells within each indicated cluster.

      The lines in c) correspond to cell proportions in different states at different time points. The same state in a) and c) is shown in the same color.

      Line 46, "However" does not sound right in this context. Would "Also" be better?

      Author response: We agree and have corrected it in the revised manuscript.

      Line 96: "The viral genes are also partially expressed in these cells, but different from the 𝑁 cluster, the antiviral genes are fully expressed (Fig. S1 and S2)." The sentence needs to be rephrased.

      Author response: We have rephrased the sentence: “As in the N cluster, the viral gene E is barely detected in these cells, indicating incomplete viral replication. However, in contrast to the N cluster, the antiviral genes are expressed to their full extent (Fig. S1 and S2).”

      Line 126, missing "be", "large" -> "larger".

      Author response: Thank you, we have now corrected these typos.

      Line 139-140 The logical link between ignoring apoptosis and the diffusion of IFN is unclear.

      Author response: We modified the sentence as “Here, we assume that the secretion of IFNs by the 𝑁 cells is a faster process than possible apoptosis (Wen et al., 1997; Tesfaigzi, 2006) of these cells and that the diffusion of IFNs to the neighborhood is not significantly affected by apoptosis.”

      Fig. 2a Do the yellow arrows show the effect of IFN and the purple arrows the propagation of viral infection?

      Author response: That is correct. We have added this information to the figure caption: “The straight black arrows indicate transitions between cell states. The curved yellow arrows indicate the effects of IFNs on activating antiviral states. The curved purple arrows indicate viral spread to cells with 𝑂 and 𝑎 states.”

      Fig. 3, n(s) as the axis label vs P(s) in the text? How do the curves in panel a) look when the p_a is well above or below p_c?

      Author response: Thank you. We have edited the labels in the figure to reflect the symbols used in the text.

      Boundary conditions? From Fig. 4, apparently periodic?

      Author response: Yes, we use periodic boundary conditions in the model. We now state this in the last sentence of the model section.

      It will be good to see a plot with time dependences of all cell types for a couple of values of p_a, illustrating propagation and cessation of the infection.

      Author response: We agree, and have added a Figure S4 in the supplement which explores exactly that. Thank you for the suggestion.

      A verbal qualitative description of why p_a has such importance and how the infection is terminated for large p_a would help.

      Reviewer #2 (Recommendations For The Authors):

      Below are two minor comments:

      (1) In the single-cell RNA sequencing data analysis, the authors describe the cell clusters O, V, A, and N. However, showing how the clusters are identified from the data might be more straightforward.

      Author response: Technically, we cluster the cells using principal components analysis (PCA) results of their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells and then optimize the modularity function to determine clusters. We manually annotate the clusters with O, V, A, and N based on the detected abundance of viral genes, antiviral genes, and IFNs.
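      The neighbor-graph steps described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' pipeline: it assumes the PCA coordinates are already computed, builds each cell's k-nearest-neighbor set by brute force, weights shared-nearest-neighbor edges by the Jaccard overlap of the two neighborhoods (one common weighting, assumed here), and scores a given partition with Newman's modularity. It does not perform the modularity optimization itself (e.g. Louvain or Leiden), and all function names are hypothetical.

```python
import math
from itertools import combinations


def knn_sets(points, k):
    """Each cell's neighborhood: itself plus its k nearest cells (Euclidean, brute force)."""
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({i} | {j for _, j in ranked[:k]})
    return sets


def snn_weights(sets):
    """Shared-nearest-neighbor graph; edge weight = Jaccard overlap (assumed weighting)."""
    weights = {}
    for i, j in combinations(range(len(sets)), 2):
        shared = len(sets[i] & sets[j])
        if shared:
            weights[(i, j)] = shared / len(sets[i] | sets[j])
    return weights


def modularity(weights, labels):
    """Newman modularity Q of a partition of the weighted, undirected graph."""
    m = sum(weights.values())
    degree_by_cluster = {}
    inside = 0.0
    for (i, j), w in weights.items():
        # Each edge adds w to the summed degree of both endpoints' clusters
        degree_by_cluster[labels[i]] = degree_by_cluster.get(labels[i], 0.0) + w
        degree_by_cluster[labels[j]] = degree_by_cluster.get(labels[j], 0.0) + w
        if labels[i] == labels[j]:
            inside += w
    # Q = sum over clusters of (within-cluster weight / m) - (cluster degree / 2m)^2
    return inside / m - sum((d / (2 * m)) ** 2 for d in degree_by_cluster.values())
```

      With two well-separated groups of three cells and k = 2, the partition that follows the groups scores Q = 0.5, whereas a partition that mixes them scores lower; a full pipeline would search over partitions to maximize Q.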

      (2) In Figure 3, what does n(s) mean in Figure 3a? And what is the meaning of the distribution P(s) of infection clusters? It may be stated clearly.

      Author response: The use of n(s) was inconsistent, and we have now edited the figure to instead say P(s), to harmonize it with the text. P(s) is the distribution of cluster sizes, s, expressed as a fraction of the whole system. In other words, once a cluster has reached its final size, we record s=(N+V)/L^2 where N and V are the number of N and V state cells in the cluster (note that, by design, each simulation leads to a single cluster, since we seed the infection in one lattice point). We now indicate more clearly in the caption and the main text what exactly P(s) and s refer to.
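      To make the definition of s concrete, the sketch below estimates P(s) for a deliberately simplified stand-in for the model: plain site percolation on an L × L lattice with periodic boundary conditions, where each cell independently resists infection with probability p_a and infection spreads to nearest neighbors from a single seed. This is an assumption for illustration only; it omits the O, V, A, and N states and the IFN signalling of the actual model, and the function names are hypothetical. Each run records the final infected fraction s, and the runs are binned into a histogram of P(s).

```python
import random
from collections import Counter, deque


def infected_fraction(L, p_a, rng):
    """Final infected fraction s from one seeded run (hypothetical simplified model)."""
    # Assumption: each cell independently resists infection with probability p_a
    resistant = [[rng.random() < p_a for _ in range(L)] for _ in range(L)]
    seed = (L // 2, L // 2)
    resistant[seed[0]][seed[1]] = False  # the seeded cell is always infected
    infected = {seed}
    frontier = deque([seed])
    while frontier:
        i, j = frontier.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cell = ((i + di) % L, (j + dj) % L)  # periodic boundary conditions
            if cell not in infected and not resistant[cell[0]][cell[1]]:
                infected.add(cell)
                frontier.append(cell)
    return len(infected) / L**2


def cluster_size_distribution(L, p_a, runs, bins=10, seed=0):
    """Histogram estimate of P(s): fraction of runs whose s falls in each of `bins` bins."""
    rng = random.Random(seed)
    sizes = [infected_fraction(L, p_a, rng) for _ in range(runs)]
    counts = Counter(min(int(s * bins), bins - 1) for s in sizes)
    return [counts[b] / runs for b in range(bins)]
```

      At the extremes, p_a = 0 gives s = 1 in every run, and p_a = 1 confines the infection to the seed (s = 1/L^2); intermediate values of p_a spread the runs across the bins.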

      Reviewer #3 (Recommendations For The Authors):

      - Would the authors kindly share the simulation code with the community? Also, the data analysis code should be shared to follow current best practices. This needs to be standard practice in all publications. I would go as far as to say that in 2024 publishing a data analysis / simulation study without sharing the relevant code should be ostracized by the community.

      Author response: We absolutely agree and have created a GitHub repository in which we share the C++ source code for the simulations and a Python notebook for plotting. The public repository can be found at https://github.com/BjarkeFN/ViralPercolation. We have added this information to the Supplement under the section “Code availability”.

      - I would avoid the use of the wording "critical" threshold since this is almost guaranteed to infuriate a certain type of reader.

      - Line 265 has a curious use of " ... " which should be replaced with something more appropriate.

      Author response: Thank you for pointing this out! We have corrected it.

    1. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection)

      I think the nuance in meaning between guilt and shame reflects the belief we hold as a society that people aren’t inherently bad or good. Moreover, the behaviors and actions you take don’t define your moral compass, nor do they say anything directly about you as an individual. In my own experience, I find it very hard to separate myself from the criticism that my actions receive. For instance, when I get peer reviews, I know that objectively the comments are directed toward my writing, but it’s difficult not to also put myself under that lens of criticism when I am the one who produced the work.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General Statements

      We thank all three reviewers for their time and care in reviewing our manuscript, in particular Reviewer 3 for providing a detailed critique that was very useful for planning revisions. We are grateful that all three reviewers indicate that the new genome resources presented in this work are of high-quality and address an existing knowledge gap. We are also grateful for general assessments that the manuscript is 'well-written', and the analyses 'well performed' and 'thorough'.

      We acknowledge Reviewer 3’s legitimate criticism that the assembly and annotation data are not yet publicly available and would like to assure the reviewing team that we have been pressing NCBI to progress the submission since before the preprint was submitted. We regret the delay but hope that we can resolve this issue promptly. Furthermore, as some additional fields in the REAT genome annotation are lost during the NCBI submission process, we will ensure that comprehensive annotation files are also added to Zenodo.

      Reviewer 3 also made the general comment that 'the manuscript could greatly benefit from merging the result and discussion sections' and we would naturally be happy to make this adjustment if the journal in question uses that format.

      Description of the planned revisions

      • We will follow suggestions by Reviewer 3 to improve clarity of two figures:

      Figure S9: Please use a more appropriate colour palette. It is difficult to know the copy number based on the colour gradient.

      Figure 5: Consider changing panel B for a similar version of Fig S12. I think it gives a cleaner and more general perspective of the presence of starship elements.

      • We will address the choice of LOESS versus linear regression for investigating the relationship between candidate secreted effector protein (CSEP) density and transposable element (TE) density, as queried by Reviewer 3:

      Lines 140-144: LOESS smoothing functions are based on local regressions and usually find correlations when there are very weak associations. The authors have to justify the use of this model versus a simpler and more straightforward linear regression. My suspicion is that the latter would fail to find an association. Also, there is no significance of Kendall's Tau estimate (p-value).

      We agree with the reviewer: as we did not find an association with the more sensitive LOESS approach, we expect that linear regression would also fail to find one, which supports our current conclusions. We will add this negative result to the text.
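      Because the reviewer also noted that no p-value was reported for Kendall's tau, the sketch below shows one way to obtain one: Kendall's tau-a with a two-sided normal-approximation p-value, assuming no tied values. It is illustrative only and is not the authors' analysis; in practice `scipy.stats.kendalltau`, which computes tau-b (correcting for ties) together with a p-value, would be the usual choice.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a and a two-sided normal-approximation p-value (assumes no ties)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 for a concordant pair, -1 for a discordant pair
    tau = 2 * s / (n * (n - 1))
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S under independence, no ties
    z = s / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of a standard normal
    return tau, p
```

      For ten perfectly concordant pairs this gives tau = 1 with p well below 0.001; for weak associations such as those discussed above, tau would sit near zero with a large p-value.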

      • We will check for other features associated with the distribution of CSEPs, as queried by Reviewer 3:

      Lines 157-163: Was there any other feature associated with the CSEP enrichment? GC content? Repetitive content? Centromere likely localisation?

      • We will integrate TE variation into the PERMANOVA lifestyle testing, as suggested by Reviewer 3:

      Line 186: Why not to test the variation content of TEs as a factor for the PERMANOVA?

      In reviewing this suggestion, we also spotted an error in our data plotting code, and the PERMANOVA lifestyle result for all genes will be corrected from 17% to 15% in Fig. 4a. Correcting this error does not impact our ultimate results or interpretation.

      • To complement the current graphical-based assessment of approximate data normality, we will include additional tests (Shapiro-Wilk for sample sizes

      Line 743: Q-Q plots are not a formal statistical test for normality.

      • One of the main critiques from Reviewer 3 was that, although we already acknowledged the low sample sizes as a limitation of this work, the manuscript could benefit from reframing with greater consideration of this factor. They also highlighted a few specific places in the text that could be rephrased accordingly:

      Line 267: "Multiple strains" can be misleading about the magnitude.

      Lines 305-307: The fact that there is significant copy number variation between the two GtA strains suggests that the variation in the GtA lineage has not been fully captured and that there may be an unsampled substructure. Although the authors acknowledge the need for pangenomic references, they should recognize this limitation in the sample size of their own study, especially when expressing its size as "multiple strains" (line 267).

      Lines 314-317: Again, the sample size is still very small and likely not representative. It suggests UNSAMPLED substructure even for the UK populations.

      Line 164 (and whole section): I would invite the authors to cautiously revisit the use of the terms "core", "soft core". The sample size is very low, as they themselves acknowledge, and probably not representative of the diversity of Gaeumannomyces.

      We intend to edit the text to address this, including removal of both text and figure references to ‘soft-core’ genes, as we agree the term is likely not meaningful in this case, and removing it has no bearing on the results or interpretation.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • We have amended the text in a number of places for clarity/fluency as suggested by Reviewer 3:

      ii) There need to be an explicit conclusion about the differences between pathogenic Gt and non-pathogenic Gh. Somehow, this is not entirely clear and is probably only a matter of rephrasing.

      Please see new lines 477-478: ‘Regarding differences between pathogenic Gt and non-pathogenic Gh, we found that Gh has a larger overall genome size and greater number of genes.’

      Lines 309-314: The message seems a bit out of context in the paragraph.

      This is a valid point; these lines have now been removed.

      Lines 392-395: The idea that crop pathogenic fungi are under pressure that favours heterothallism does not take into account the multiple cases of successful pathogenic clonal lineages in which sexual reproduction is absent. This paragraph seems very speculative to me. Please rephrase it.

      Our intention here was the exact reverse, that crop pathogens are under pressure to favour homothallism (as Reviewer 3 points out, anecdotally this often seems to play out in nature). We have rephrased lines 386-390 to hopefully make our stance more explicit: 'Together, this could suggest a selective pressure towards homothallism for crop fungal pathogens, and a switch from heterothallism in Gh to homothallism in Gt and Ga may, therefore, have been a key innovation underlying lifestyle divergence between non-pathogenic Gh and pathogenic Gt and Ga.'

      Lines 463-464: Please refer to the analyses when discussing the genetic divergence.

      We have rephrased this sentence to make our intended point clearer, please see new lines 459-461: ‘If we compare Ga and Gt in terms of synteny, genome size and gene content, the magnitude of differences does not appear to be more pronounced than those between GtA and GtB.’

      • We have also fixed the following typographic errors highlighted by Reviewer 3:

      Line 399: You mean, Fig 4C?

      Line 722: You missed "trimAI"

      Lines 723-727: Missing citations for "AMAS" and RAxML-NG, "AHDR" and "OrthoFinder"

      • We have added genome-wide RIP estimates to Supplementary Table S1 as requested by Reviewer 3:

      Lines 416-422: Please provide the data related to the genome-wide estimates of RIP.

      • We have added a note clarifying that differences in overall genome size between lineages are not fully explained by differences in gene copy-number (lines 406-408: 'We should note that the total length of HCN genes was not sufficiently large to account for the overall greater genome size of GtB compared to GtA (Supplemental Table S1).') in response to a comment from Reviewer 3:

      Line 396: The difference in duplicated genes raises the question of whether there are differences in overall genome size between lineages and, if so, whether they can be explained by the presence of genes.

      • We have made an alteration to the author order and added equal second-author contributions.

      Description of analyses that authors prefer not to carry out

      • In response to our analysis regarding the absence of TE-effector compartmentalisation in this system, Reviewer 1 requested additional analyses:

      While TE enrichment is typically associated with accessory compartments, it is not a defining feature. To bolster the authors' claim, it is essential to demonstrate that there is no bias in the ratio of conserved and non-conserved genes across the genomes.

      We believe that there are two slightly different compartmentalisation concepts being somewhat conflated here – (1) the idea of compartments where TEs and virulence proteins such as effectors are significantly colocalised in comparison with the rest of the genome, and (2) the idea of compartments containing gene content that is not shared in all strains (i.e. accessory). The two may overlap – as Reviewer 2 states, accessory compartments may also be enriched with TEs – but not necessarily. We specifically address the first concept in our text, and we appreciate Reviewer 3’s response on this subject:

      There is a clear answer for the compartmentalisation question. The authors favour the idea of "one-compartment" with compelling analyses.

      We believe that our GENESPACE results (see Fig. 2) show the second concept of accessory compartments to be irrelevant in this case: they demonstrate that gene content is conserved, and even broadly syntenic, across strains, with no clear evidence of accessory compartments or chromosomes. We have already acknowledged that mechanisms of compartmentalisation other than TE-effector colocalisation may be at play (as seen from our exploration of effector distributions biased towards telomeres; see the section from line 156: ‘Although CSEPs were not broadly colocalised with TEs, we did observe that they appeared to be non-randomly distributed in some pseudochromosomes (Fig. 3a)…’).

      • Reviewer 1 questioned the statement that a higher level of genome-wide RIP is consistent with lower levels of gene duplication:

      L422: Is the highest RIP rate in GtA consistent with its low levels of gene duplication? Does this suggest that duplicated sequences in GtA are no longer recognizable due to RIP mutations? This seems counterintuitive, as RIP is primarily triggered by gene duplication.

      Our understanding is that, while RIP can directly mutate coding regions, it predominantly acts on duplicated sequences within repetitive regions such as TEs (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02060-w), which has a knock-on effect of reducing TE-mediated gene duplication. In Neurospora crassa, where RIP was first discovered and thus the model species for much of our understanding of the process, a low number of gene duplicates has been linked to the activity of RIP (https://www.nature.com/articles/nature01554). We therefore believe the current text is reasonable.

      • Reviewer 2 stated that experimental validation of gene function is required to make clear links to lifestyle or pathogenicity:

      In my eyes, the study has two main limitations. First of all, the research only concerns genomics analyses, and therefore is rather descriptive and observational, and as such does not provide further mechanistic details into the pathogen biology and/or into pathogenesis. This is further enhanced by the lack of clear observations that discriminate particular species/lineages or life styles from others in the study. Some observations are made with respect to variations in candidate secreted effector proteins and biosynthetic gene clusters, but clear links to life style or pathogenicity are missing. To further substantiate such links, lab-based experimental work would be required.

      We agree that in an ideal world, experimental evidence of gene function from wet-lab work would be included. Unfortunately, transformation has not yet been successfully developed in this system (see lines 33-35: ‘There have also been considerable difficulties in producing a reliable transformation system for Gt, preventing gene disruption experiments to elucidate function (Freeman and Ward 2004).’), not for lack of trying: after 18 months of effort using all available transformation techniques and selectable markers, neither Gt nor Gh was transformable. Undertaking that challenge has proven to be far beyond the scope of this paper, the purpose of which was to generate and analyse high-quality genomic data, a major task in itself. We again appreciate Reviewer 3’s response to this point, agreeing that it is out of scope for this work:

      I just want to respectfully disagree with reviewer #2 about the need for more experimental laboratory work, as in my opinion it clearly goes beyond the intention and scope of the submitted work. This could be a limitation that would depend on the chosen journal and its specific format and requirements. Finally, I think it would suffice for the authors to discuss on the lack of in-depth experimental work as part of the limitations of their overall approach.

      As per the suggestion by Reviewer 3, we will add text to address the absence of in-depth experimental work within the scope of this study.

      • Reviewer 3 suggested we might 'consider including formal population differentiation estimators', however, as they previously highlighted above, our sample sizes are too small to produce reliable population-level statistics.

      • Reviewer 3 raised the disparity in the appearance of branches at the root of phylogenetic trees in various figures:

      Figure 4a (and Figs S5, S13): The depicted tree has a trichotomy at the basal node. Please correct it so Magnaporthiopsis poae is resolved as an outgroup, as in Fig. S17.

      All the trees were rooted with M. poae as the outgroup, and although it may seem counterintuitive, a trifurcation at the root is the correct outcome in the case of rerooting a bifurcating tree; please see this discussion including the developers of both leading phylogeny visualisation tools ggtree and phytools (https://www.biostars.org/p/332030/). Although it is possible to force a bifurcating tree after rooting by positioning the root along an edge, the resulting branch lengths in the tree can be misleading, and so in cases where we wanted to include meaningful branch lengths in the figure (i.e. estimated from DNA substitution rates, in Figures 4a, S5 and S13) we have not circumvented the trifurcation. In Fig S17 meaningful branch lengths have not been included and the tree only represents the topology, resulting in the appearance of bifurcation at the root.
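      The topological point can be illustrated with a toy example (hypothetical taxa A to D plus the outgroup; not the authors' trees or code). In an unrooted bifurcating tree every internal node has three neighbors, so placing the root on the node where the outgroup attaches yields a basal trichotomy, while obtaining a bifurcation requires inserting a new root node part-way along the outgroup's edge.

```python
# Toy unrooted bifurcating tree (hypothetical taxa); internal nodes n1 to n3 have degree 3
UNROOTED = {
    "M_poae": {"n1"}, "A": {"n1"}, "B": {"n2"}, "C": {"n3"}, "D": {"n3"},
    "n1": {"M_poae", "A", "n2"},
    "n2": {"n1", "B", "n3"},
    "n3": {"n2", "C", "D"},
}


def orient(adj, root):
    """Direct every edge away from `root`; return {node: sorted list of children}."""
    children = {}
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        children[node] = sorted(n for n in adj[node] if n != parent)
        stack.extend((child, node) for child in children[node])
    return children


def split_edge(adj, u, v, new_root="root"):
    """Insert a new root node on edge u-v (no branch lengths are stored in this sketch)."""
    tree = {node: set(nbrs) for node, nbrs in adj.items()}
    tree[u].discard(v)
    tree[v].discard(u)
    tree[new_root] = {u, v}
    tree[u].add(new_root)
    tree[v].add(new_root)
    return tree
```

      Here `orient(UNROOTED, "n1")["n1"]` lists three children (`A`, `M_poae`, `n2`), the basal trichotomy, whereas orienting `split_edge(UNROOTED, "M_poae", "n1")` from `"root"` gives two. With branch lengths, the original edge between M_poae and n1 would have to be divided between the two new edges in some arbitrary proportion, which is the source of the misleading lengths described above.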

      • Reviewer 3 suggested that the discussion on giant Starship TEs resembled more of a review:

      Lines 434-451: This section resembles more a review than a discussion of the results of the present work. This also highlights the lack of analysis on the genetic composition and putative function of the identified starship-like elements.

      The reviewer has a valid point. However, Starships are a recently discovered and thus underexplored genetic feature that readers from the wider mycology/plant pathology community may not yet be aware of. We believe it is warranted to include some additional exposition to give context for why their discovery here is novel, interesting and unexpected. We are naturally keen to investigate the make-up of the elements we have found in this lineage; however, that will require a substantial amount of further work. Analysis of Starships is not trivial; for example, the starfish tool is still under development and a limited number of species have been used to train it. How best to compare elements is also an active area of investigation – they are dynamic in their structure and may include genes originating from the host genome or a previous host – and for this reason we believe it is out of scope to interrogate alongside the other foundational genomic data presented in this paper.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The manuscript "Evolutionary genomics reveals variation in structure and genetic content implicated in virulence and lifestyle in the genus Gaeumannomyces" by Rowena Hill and collaborators is a thorough, well-planned and well-designed work. They have described 9 new, almost complete assemblies, from their most general characteristics to their genetic content and implications. I am very pleased with the quality and completeness of this work and agree that it provides a very useful resource and framework for further research on this important organism.

      The three main motivations of the present study were:

      1) Are there genomic signatures distinguishing the Gt A/B virulence lineages?

      2) How do gene repertoires differ between pathogenic Gt and non-pathogenic Gh?

      3) Is there evidence of genome compartmentalisation in Gaeumannomyces?

      a) The authors themselves recognise the low number of samples in their work (Lines 453-454), and this limitation hampers the establishment of a clear association between lineage-specific virulence and genomic signatures. I would argue that the present work needs to be reframed to take this main limitation into account.

      b) There needs to be an explicit conclusion about the differences between pathogenic Gt and non-pathogenic Gh. Somehow, this is not entirely clear and is probably only a matter of rephrasing.

      c) There is a clear answer for the compartmentalisation question. The authors favour the idea of "one-compartment" with compelling analyses.

      Major comments:

      The authors have not published the genomic data. This makes it impossible to audit the quality of the assemblies and impedes reproducibility. It is also bad practice by current scientific standards.

      I strongly believe that the manuscript could greatly benefit from merging the result and discussion sections. It would be easier for the reader to follow their entire logic. This is of course something optional and contingent on the journal format.

      Minor and specific comments:

      RESULTS

      • Lines 140-144: LOESS smoothing functions are based on local regressions and usually find correlations when there are very weak associations. The authors have to justify the use of this model versus a simpler and more straightforward linear regression. My suspicion is that the latter would fail to find an association. Also, there is no significance of Kendall's Tau estimate (p-value).

      • Lines 157-163: Was there any other feature associated with the CSEP enrichment? GC content? Repetitive content? Centromere likely localisation?

      • Line 164 (and whole section): I would invite the authors to cautiously revisit the use of the terms "core", "soft core". The sample size is very low, as they themselves acknowledge, and probably not representative of the diversity of Gaeumannomyces.

      • Figure 4a (and Figs S5, S13): The depicted tree has a trichotomy at the basal node. Please correct it so Magnaporthiopsis poae is resolved as an outgroup, as in Fig. S17.

      • Line 186: Why not to test the variation content of TEs as a factor for the PERMANOVA?

      • Figure S9: Please use a more appropriate colour palette. It is difficult to know the copy number based on the colour gradient.

      • Figure 5: Consider changing panel B for a similar version of Fig S12. I think it gives a cleaner and more general perspective of the presence of starship elements.

      DISCUSSION

      • Line 267: "Multiple strains" can be misleading about the magnitude.

      • Lines 305-307: The fact that there is significant copy number variation between the two GtA strains suggests that the variation in the GtA lineage has not been fully captured and that there may be an unsampled substructure. Although the authors acknowledge the need for pangenomic references, they should recognize this limitation in the sample size of their own study, especially when expressing its size as "multiple strains" (line 267).

      • Lines 309-314: The message seems a bit out of context in the paragraph.

      • Lines 314-317: Again, the sample size is still very small and likely not representative. It suggests UNSAMPLED substructure even for the UK populations.

      • Lines 392-395: The idea that crop pathogenic fungi are under pressure that favours heterothallism does not take into account the multiple cases of successful pathogenic clonal lineages in which sexual reproduction is absent. This paragraph seems very speculative to me. Please rephrase it.

      • Line 396: The difference in duplicated genes raises the question of whether there are differences in overall genome size between lineages and, if so, whether they can be explained by the presence of genes.

      • Line 399: You mean, Fig 4C?

      • Lines 416-422: Please provide the data related to the genome-wide estimates of RIP.

      • Lines 434-451: This section resembles more a review than a discussion of the results of the present work. This also highlights the lack of analysis on the genetic composition and putative function of the identified starship-like elements.

      • Lines 463-464: Please refer to the analyses when discussing the genetic divergence. Consider including formal population differentiation estimators.

      METHODS

      • Line 722: You missed "trimAI"

      • Lines 723-727: Missing citations for "AMAS" and RAxML-NG, "AHDR" and "OrthoFinder"

      • Line 743: Q-Q plots are not a formal statistical test for normality.

      Referees cross-commenting

      I agree with my peer reviewers and appreciate that we have shared common concerns and suggestions. I also agree with their comments.

      I just want to respectfully disagree with reviewer #2 about the need for more experimental laboratory work, as in my opinion it clearly goes beyond the intention and scope of the submitted work. This could be a limitation that would depend on the chosen journal and its specific format and requirements. Finally, I think it would suffice for the authors to discuss the lack of in-depth experimental work as part of the limitations of their overall approach.

      Significance

      The work presented by Hill and co-workers contributes to the understanding of the genetic basis of host-pathogen interactions and evolutionary dynamics in the important fungus responsible for wheat "take-all" disease, Gaeumannomyces tritici. With 9 new near-complete assemblies, this work provides a valuable resource for research on this agronomically important organism. This work sets the stage for developing a global pangenome of G. tritici that can expand analyses of its population structure and of specific genetic elements associated with its virulence.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the contributions of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to Reviewer #1 Public Review #3 below.

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below. 

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below). 

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.” 
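      The commonality partition used above can be illustrated with a small, self-contained Python sketch. It assumes the R² of every subset model (each an ordinary least-squares regression of fluid cognition on those regressors) has already been computed, and applies the inclusion-exclusion rule of commonality analysis (Nimon et al., 2008) to split the full-model R² into unique and common components. The predictor names and R² values are hypothetical placeholders loosely echoing the magnitudes quoted above; they are not results from this study.

```python
from itertools import combinations


def commonality(r2, predictors):
    """Partition the full-model R^2 into commonality components.

    r2 maps each non-empty subset of predictors (a frozenset) to the R^2 of a
    model containing only those predictors. For a subset S of all predictors P:
        C(S) = sum over T subset of S of (-1)**(|T| + 1) * R^2((P - S) | T),
    where the R^2 of the empty model is 0.
    """
    full = frozenset(predictors)

    def r2_of(subset):
        return r2[subset] if subset else 0.0

    components = {}
    for k in range(1, len(predictors) + 1):
        for s in map(frozenset, combinations(predictors, k)):
            rest = full - s
            components[s] = sum(
                (-1) ** (len(t) + 1) * r2_of(rest | frozenset(t))
                for j in range(len(s) + 1)
                for t in combinations(sorted(s), j)
            )
    return components


# Hypothetical subset-model R^2 values (not results from this study)
P = ["BrainAge", "Age", "BrainCognition"]
f = frozenset
r2 = {
    f({"BrainAge"}): 0.20,
    f({"Age"}): 0.30,
    f({"BrainCognition"}): 0.35,
    f({"BrainAge", "Age"}): 0.32,
    f({"BrainAge", "BrainCognition"}): 0.40,
    f({"Age", "BrainCognition"}): 0.42,
    f(P): 0.43,
}
components = commonality(r2, P)
unique_brain_cognition = components[f({"BrainCognition"})]
```

      In this toy example, the unique effect of Brain Cognition is the full-model R² minus the R² of the model with only Brain Age and chronological age (.43 − .32 = .11), and the seven components sum back to the full-model R² of .43.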

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally. 

      Thank you for your comments on this issue. 

      We now discuss these broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders, 

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021)

      and 

      (3) solutions that we and others have proposed to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is not likely to explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to overlook the possibility that the models are underfitted when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). By contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being underfitted. This unfortunately might limit the generalisability of our study to normative studies only. Future work is still needed to test the utility of Brain Age in case-control designs.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that best explain phenotypes of interest. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. 

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest.

      Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p. 7): “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit. 

      Thank you for allowing us an opportunity to clarify our stacked model. We have added further clarification (see below). We confirm that we did not use the test sets to build the stacked models at either the lower or the higher level of the Elastic Net models. The test sets were used only to test the performance of the models.  

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This fold size was chosen to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for each of Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we then chose the prediction models that led to the highest performance, reflected by the coefficient of determination (R²), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R² on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models, one from each set of features, built in the first step, and the eight tuned stacked models, built in the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values. 

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R²) and mean absolute error (MAE). Note that for R², we used the sum of squares definition (i.e., R² = 1 − (residual sum of squares / total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
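      As a sketch of the three performance metrics, the plain-Python function below computes Pearson’s r, the sum-of-squares R² and the MAE on one set of out-of-sample predictions. The toy data are hypothetical; they show why the sum-of-squares R² is not simply the square of r: predictions that are perfectly correlated with the observations but systematically offset still lose R².

```python
import math


def evaluate(observed, predicted):
    """Pearson's r, sum-of-squares R^2 and MAE for out-of-sample predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_tot * ss_pred)
    # R^2 = 1 - (residual sum of squares / total sum of squares); can be negative
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r, r2, mae


# Hypothetical predictions: perfectly correlated with the observations but offset by +1
r, r2, mae = evaluate([1, 2, 3, 4], [2, 3, 4, 5])
```

      Here r = 1 while R² = .2 and MAE = 1, because the sum-of-squares definition penalises the systematic offset that a correlation ignores.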

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models. 
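      The splitting scheme in the diagram above can be sketched in plain Python, assuming simple random (unstratified) folds and a hypothetical sample of 500 participants (roughly 100 per outer fold, as described). The point of the sketch is that every inner training/validation split is drawn only from the outer-fold training set, so the outer-fold test set is never seen during tuning.

```python
import random


def kfold(indices, k, seed):
    """Shuffle indices with a fixed seed and deal them into k near-equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_splits(n, k_outer=5, k_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_validation) pairs that are drawn
    only from outer_train, so tuning never touches the outer-fold test set.
    """
    outer = kfold(range(n), k_outer, seed)
    for i, outer_test in enumerate(outer):
        outer_train = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = kfold(outer_train, k_inner, seed + i + 1)
        inner_splits = [
            ([j for g, fold in enumerate(inner) if g != v for j in fold], inner[v])
            for v in range(k_inner)
        ]
        yield outer_train, outer_test, inner_splits


# Hypothetical sample of 500 participants, giving about 100 per outer fold
splits = list(nested_splits(500))
```

      Within each outer fold, the inner splits would drive the grid search, and the tuned models would then be scored once on the corresponding outer-fold test set.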

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both the first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice, with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
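      The following NumPy sketch illustrates the design choice described above for a single outer fold: the first-level models and the stacked model are both fitted on the same outer-fold training set, and the outer-fold test set is only used for prediction. For brevity it substitutes closed-form ridge regression with a fixed penalty for the tuned Elastic Net models and omits the inner-fold grid search; the helper names and the simulated data are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept.

    Stand-in for a tuned Elastic Net; returns a predict function.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return lambda Z: Z @ w + b


def stack_one_outer_fold(feature_sets, y, train, test, lam=1.0):
    """Two-level stacking within one outer fold.

    Step 1: fit one first-level model per set of features on the outer-fold training set.
    Step 2: fit the stacked model on first-level predictions for that SAME training set.
    Step 3: apply both levels, unchanged, to the outer-fold test set.
    """
    level1 = [fit_ridge(X[train], y[train], lam) for X in feature_sets]
    Z_train = np.column_stack([m(X[train]) for m, X in zip(level1, feature_sets)])
    stacked = fit_ridge(Z_train, y[train], lam)
    Z_test = np.column_stack([m(X[test]) for m, X in zip(level1, feature_sets)])
    return stacked(Z_test)


# Hypothetical data: three sets of features, each a noisy copy of the target
rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)
feature_sets = [y[:, None] + rng.normal(scale=s, size=(n, 5)) for s in (1.0, 2.0, 3.0)]
train, test = np.arange(160), np.arange(160, n)
pred = stack_one_outer_fold(feature_sets, y, train, test)
```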

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits? 

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate this, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ values for each prediction model of the same features. We found that Spearman’s ρ varied dramatically in both age-prediction (range = .31–.94) and fluid cognition-prediction (range = .16–.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients with either ‘Ridge’ or ‘Lasso’ methods, which shrink correlated features as a group or select one feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.   

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρ values for each prediction model. The numbers to the right of the plots indicate the mean Spearman’s ρ for each prediction model.  
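      The rank-stability check can be sketched in plain Python: Spearman’s ρ is Pearson’s r computed on ranks (tied values receive their average rank), evaluated for every pair of the five outer folds, which gives the 10 values per prediction model described above. The coefficient vectors below are hypothetical. Note that a fold whose coefficients are all identical (for example, all shrunk to zero) has no rank variance, so ρ is undefined and this sketch would raise a division error.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return result


def spearman(a, b):
    """Spearman's rho: Pearson's r between the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def rank_stability(coefs_per_fold):
    """Spearman's rho for every pair of folds (5 folds give 10 pairs)."""
    return [spearman(a, b) for a, b in combinations(coefs_per_fold, 2)]


# Hypothetical Elastic Net coefficients for one model across five outer folds
folds = [
    [0.50, -0.20, 0.00, 0.10],
    [0.40, -0.30, 0.00, 0.20],
    [0.60, -0.10, 0.05, 0.10],
    [0.50, -0.20, 0.00, 0.10],
    [0.10, 0.30, -0.20, 0.40],
]
rhos = rank_stability(folds)
```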

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.  

      Thank you for the opportunity for us to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). 

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button in response to all [Go] shapes but not to the two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go]. 

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.” 

      Second, for MRI processing procedures, we included the following statements.

      From Methods:

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.”
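      The sets of features detailed in the Methods passages that follow, together with the eight stacked models listed earlier, can be tallied with a short Python sketch. The set names are hypothetical labels; the feature counts are transcribed from the text (379 parcels per Task Contrast, 75 principal components per FC set, and 148, 148, 19 and 5 sMRI features). Reading “Task Contrast and FC” as Task Contrast plus Task FC, and “Resting and Task FC” as Rest FC plus Task FC, is our assumption.

```python
# Number of features per set, transcribed from the Methods (set names are hypothetical labels)
TASK_CONTRASTS = [
    "FaceName Encode", "FaceName Recall", "FaceName Distractor",
    "FaceName Encode vs Distractor", "FaceName Recall vs Distractor",
    "FaceName Encode vs Recall",
    "CARIT NoGo", "CARIT Go", "CARIT NoGo vs Go",
    "VISMOTOR Vismotor",
]
FEATURE_SETS = {f"Task Contrast: {c}": 379 for c in TASK_CONTRASTS}  # sets 1-10
FEATURE_SETS.update({f"Task FC: {t}": 75 for t in ("FaceName", "CARIT", "VISMOTOR")})  # sets 11-13
FEATURE_SETS["Rest FC"] = 75  # set 14
FEATURE_SETS.update({  # sets 15-18
    "sMRI: cortical thickness": 148,
    "sMRI: cortical surface area": 148,
    "sMRI: subcortical volume": 19,
    "sMRI: total brain volume": 5,
})


def group(prefix):
    """Names of all sets whose label starts with the given prefix."""
    return [name for name in FEATURE_SETS if name.startswith(prefix)]


TASK_CONTRAST, TASK_FC, REST_FC, SMRI = (
    group(p) for p in ("Task Contrast", "Task FC", "Rest FC", "sMRI")
)

# Each stacked model takes one predicted value per member set as its input
STACKED_MODELS = {
    "All": TASK_CONTRAST + TASK_FC + REST_FC + SMRI,
    "All excluding Task FC": TASK_CONTRAST + REST_FC + SMRI,
    "All excluding Task Contrast": TASK_FC + REST_FC + SMRI,
    "Non-Task": REST_FC + SMRI,
    "Resting and Task FC": REST_FC + TASK_FC,
    "Task Contrast and FC": TASK_CONTRAST + TASK_FC,
    "Task Contrast": TASK_CONTRAST,
    "Task FC": TASK_FC,
}
n_models = len(FEATURE_SETS) + len(STACKED_MODELS)  # prediction models per target
```

      This reproduces the 18 single-set models plus 8 stacked models, that is, 26 prediction models for each target (age and fluid cognition).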

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with the suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”. These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned of potential artifacts using ICA-FIX (Glasser et al., 2016). 

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff of 200 s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”
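      To illustrate the principle behind this step, the plain-Python sketch below convolves a block design with a double-gamma HRF and regresses a time series on the resulting regressor, returning both the beta (the basis of a task contrast) and the residuals (what remains once task-evoked signal is regressed out, as in the Task FC analysis). The HRF parameters (SPM-style gamma shapes 6 and 16), the 0.8-s TR and the noiseless simulated signal are assumptions for illustration; FEAT’s own HRF, high-pass filtering and noise modelling are not reproduced.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds.

    Assumed SPM-style parameters: a peak gamma (shape 6) minus an undershoot
    gamma (shape 16) scaled by 1/6. FSL FEAT's defaults differ slightly.
    """
    return [gamma_pdf(i * tr, 6) - gamma_pdf(i * tr, 16) / 6
            for i in range(int(duration / tr))]


def convolve(events, kernel):
    """Causal convolution of an event time course with the HRF, truncated to len(events)."""
    return [sum(kernel[k] * events[i - k] for k in range(min(len(kernel), i + 1)))
            for i in range(len(events))]


def fit_regressor(y, x):
    """OLS of y on one convolved regressor plus an intercept.

    Returns the regressor's beta and the residual time series.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    intercept = my - beta * mx
    return beta, [b - intercept - beta * a for a, b in zip(x, y)]


# Hypothetical example: 0.8-s TR, alternating 20-s rest/task blocks, noiseless signal
tr = 0.8
hrf = double_gamma_hrf(tr)
events = [float(int(i * tr // 20) % 2) for i in range(300)]
regressor = convolve(events, hrf)
bold = [100.0 + 3.0 * r for r in regressor]
beta, residuals = fit_regressor(bold, regressor)
```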

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii” as for the Task Contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations for each pair of the 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied an r-to-z transformation and principal component analysis (PCA) with 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied the fitted transformation to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task. 

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26 min of data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was preprocessed and concatenated across the four runs. We applied the same computations (i.e., high-pass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC. 

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established preprocessing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”
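      The FC feature pipeline described above (Pearson correlations, upper triangle, r-to-z transformation, then PCA fitted on training data only) can be sketched with NumPy. The PCA here is a minimal SVD-based stand-in for ‘sklearn.decomposition.PCA’ (it centres the data but does not standardise it or fix component signs), and the random time series, participant count and train/test split are hypothetical. With 379 regions, the upper triangle contains 379 × 378 / 2 = 71,631 region pairs, matching the count in the text.

```python
import numpy as np


def fc_features(timeseries):
    """Vectorise one participant's parcel time series (time x regions) into FC features.

    Pearson correlations between all region pairs, keeping only the upper
    triangle (R * (R - 1) / 2 unique pairs), followed by the r-to-z transform.
    """
    r = np.corrcoef(timeseries, rowvar=False)
    upper = np.triu_indices(r.shape[0], k=1)
    return np.arctanh(r[upper])


class TrainOnlyPCA:
    """Minimal PCA: mean and components are estimated from training rows only."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal axes, ordered by explained variance
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        return (X - self.mean_) @ self.components_.T


# Hypothetical data: 100 participants, 150 time points, 379 regions
rng = np.random.default_rng(0)
X = np.stack([fc_features(rng.normal(size=(150, 379))) for _ in range(100)])
train, test = slice(0, 80), slice(80, 100)
pca = TrainOnlyPCA(n_components=75).fit(X[train])  # fitted on training rows only
Z_train, Z_test = pca.transform(X[train]), pca.transform(X[test])
```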

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below). 

      Elastic Net simultaneously minimises the prediction error and a weighted penalty on the features’ coefficients. The degree of penalty on the coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘ℓ1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; ℓ1 ratio = 0) or absolute (known as ‘Lasso’; ℓ1 ratio = 1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \alpha \cdot \ell_1\,\text{ratio} \cdot \lVert \beta \rVert_1 + \frac{\alpha\,(1 - \ell_1\,\text{ratio})}{2}\lVert \beta \rVert_2^2$$

      where X is the features, y is the target, β is the vector of coefficients and n is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and ℓ1 ratio using 25 numbers in linear space, ranging from 0 to 1.

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more heavily in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross-validation, different outer folds may have different values of ‘α’ and ‘ℓ1 ratio’, so the final coefficients may differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the BrainSpace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA loadings (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) by the Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
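      The Elastic Net objective, the hyperparameter grid and the PCA back-projection described above can be sketched in plain Python. The objective follows scikit-learn’s documented ‘ElasticNet’ formulation (intercept omitted for brevity); the log-spaced α values are equivalent to ‘numpy.logspace(-1, 2, 70)’; and the back-projection follows the description of multiplying absolute component loadings by the coefficients and summing over components. The toy matrices in the usage lines are hypothetical.

```python
def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """scikit-learn's ElasticNet objective (intercept omitted for brevity):

    1 / (2n) * ||y - Xw||_2^2 + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    n = len(y)
    residuals = [yi - sum(xij * wj for xij, wj in zip(xi, w)) for xi, yi in zip(X, y)]
    loss = sum(e * e for e in residuals) / (2 * n)
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj * wj for wj in w)
    return loss + alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2


# Grid: 70 log-spaced alphas from 0.1 to 100 and 25 linearly spaced l1 ratios from 0 to 1
alphas = [10 ** (-1 + 3 * i / 69) for i in range(70)]
l1_ratios = [i / 24 for i in range(25)]
grid = [(alpha, l1_ratio) for alpha in alphas for l1_ratio in l1_ratios]


def project_pca_coefficients(components, coefs):
    """Map coefficients on PCA components back to the original features.

    importance[j] = sum over components k of |components[k][j]| * coefs[k]
    """
    n_features = len(components[0])
    return [sum(abs(comp[j]) * c for comp, c in zip(components, coefs))
            for j in range(n_features)]


# Hypothetical toy values
objective_lasso = elastic_net_objective([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, 2.0], 1.0, 1.0)
objective_ridge = elastic_net_objective([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, 2.0], 1.0, 0.0)
importance = project_pca_coefficients([[0.6, -0.8], [0.8, 0.6]], [2.0, -1.0])
```

      With these toy inputs the residuals are zero, so the objective reduces to the penalty alone: 3.0 for the pure ℓ1 (Lasso) case and 2.5 for the pure squared (Ridge) case. The grid contains 70 × 25 = 1,750 hyperparameter combinations.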

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gradon, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. hdps://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fi{ng’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. hdps://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapredo, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. hdps://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. hdps://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. PaUerns, 4(4), 100712. hdps://doi.org/10.1016/j.pader.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. hdps://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. hdps://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. hdps://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. hdps://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. hdps://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. hdps://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Predenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. hdps://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. hdps://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Huder, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. hdps://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. hdps://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapredo, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. hdps://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. hdps://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. hdps://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. hdps://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain-cognition relationship: Integrating task-based fMRI across tasks markedly boosts prediction and test-retest reliability. NeuroImage, 263, 119588. hdps://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. hdps://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. hdps://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. hdps://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. hdps://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the referees for their careful reading of the manuscript and their valuable suggestions for improvements.

      General Statements:

      Existing SMC-based loop extrusion models successfully predict and characterize mesoscale genome spatial organization in vertebrate organisms, providing a valuable computational tool for the genome organization and chromatin biology fields. To date, however, this approach has been highly limited in its application beyond vertebrates. This limitation arises because existing models require knowledge of CTCF binding sites, which act as effective boundary elements, blocking loop-extruding SMC complexes and thus defining TAD boundaries. CTCF, however, is the predominant boundary element only in vertebrates, and vertebrates make up only a small fraction of the species in the tree of life, whereas TADs are nearly universal and SMC complexes are largely conserved. Thus, there is a pressing need for loop extrusion models capable of predicting Hi-C maps in organisms beyond vertebrates.

      The conserved-current loop extrusion (CCLE) model, introduced in this manuscript, extends the quantitative application of loop extrusion models, in principle, to any organism by freeing the model from the need to know the identities and functions of specific boundary elements. By converting the genomic distribution of loop-extruding cohesin into an ensemble of dynamic loop configurations via a physics-based approach, CCLE outputs three-dimensional (3D) chromatin configurations that can be rendered as simulated Hi-C maps. We demonstrate that CCLE-generated maps describe experimental Hi-C data well at the TAD scale. Importantly, CCLE achieves high accuracy by considering cohesin-dependent loop extrusion alone, thereby both validating the loop extrusion model in general (as opposed to diffusion-capture-like models proposed as alternatives to loop extrusion) and providing evidence that cohesin-dependent loop extrusion plays a dominant role in shaping chromatin organization beyond vertebrates.

      The success of CCLE unambiguously demonstrates that knowledge of the cohesin distribution is sufficient to reconstruct TAD-scale 3D chromatin organization. Further, CCLE signals a paradigm shift from the concept of localized, well-defined boundary elements, embodied in the existing CTCF-based loop extrusion models, to a concept that also encompasses a continuous distribution of position-dependent loop extrusion rates. This new paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers.
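      The following sketch illustrates the idea behind the model's name; it is not the manuscript's actual equations. It assumes the simplest steady state, in which cohesin loading and unloading are neglected, so the current of extruding anchors, J = v(x) P(x), is the same at every site. Under that assumption, the local extrusion rate is v(x) = J / P(x). A site with a high cohesin ChIP-seq signal is therefore a site where extrusion is slow, not necessarily a hard barrier. The helper name `rates_from_occupancy`, the pseudocount, and fixing J through the mean rate are hypothetical choices made for this example.

```python
import numpy as np


def rates_from_occupancy(chip, mean_rate=1.0, pseudocount=1e-6):
    """Hypothetical helper: turn a cohesin occupancy profile P(x) into local
    extrusion rates v(x), assuming the anchor current J = v(x) * P(x) is the
    same at every site (loading and unloading neglected).

    chip        -- non-negative ChIP-seq signal, one value per lattice site
    mean_rate   -- fixes the overall scale J so that mean(v) == mean_rate
    pseudocount -- avoids division by zero at sites with no signal
    """
    p = np.asarray(chip, dtype=float) + pseudocount
    p = p / p.sum()                      # occupancy as a probability profile
    inv = 1.0 / p                        # uniform current => v(x) ~ 1 / P(x)
    return mean_rate * inv / inv.mean()  # choose J so the mean rate is mean_rate


# Toy profile with a single broad, low-contrast cohesin peak.
chip = np.array([1.0, 1.0, 2.0, 4.0, 2.0, 1.0, 1.0])
v = rates_from_occupancy(chip)
print(np.round(v, 3))                        # slowest extrusion at the peak (index 3)
print(np.allclose(v * chip, (v * chip)[0]))  # True: v(x) * P(x) is uniform
```

      Here, a fourfold enrichment in the toy ChIP-seq signal translates into a fourfold slower local extrusion rate, rather than into an all-or-nothing barrier.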

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript presents a mathematical model for loop extrusion called the conserved-current loop extrusion model (CCLE). The model uses cohesin ChIP-seq data to predict the Hi-C map and shows broad agreement between experimental and simulated Hi-C maps. They test the model on Hi-C data from interphase fission yeast and meiotic budding yeast. The conclusion drawn by the authors is that peaks of cohesin represent loop boundaries in these situations, which they also propose extends to other organisms/situations where CTCF is absent.

      Response:

      We would like to point out that the referee's interpretation of our results, namely that "The conclusion drawn by the authors is that peaks of cohesin represent loop boundaries in these situations, ...", is an oversimplification to which we do not subscribe. The referee's interpretation of our model is correct when there are strong, localized barriers to loop extrusion; however, the CCLE model allows for loop extrusion rates that are position-dependent and take on a range of values. The CCLE model also allows the loop extrusion model to be applied to organisms without known boundary elements. Thus, strictly interpreting the positions of cohesin peaks as loop boundaries overlooks a key idea to emerge from the CCLE model.

      Major comments:

      1. More recent micro-C/Hi-C maps, particularly for budding yeast mitotic cells and meiotic cells, show clear puncta, representative of anchored loops, which are not well recapitulated in the simulated data from this study. However, such puncta are cohesin-dependent, as they disappear in the absence of cohesin and are enhanced in the absence of the cohesin release factor, Wapl. For example, see the two studies below. The model is therefore missing some key elements of the loop organisation. How do the authors explain this discrepancy? It would also be very useful to test whether the model can predict the increased strength of loop anchors when Wapl1 is removed and cohesin levels increase.

      Costantino L, Hsieh TS, Lamothe R, Darzacq X, Koshland D. Cohesin residency determines chromatin loop patterns. Elife. 2020 Nov 10;9:e59889. doi: 10.7554/eLife.59889. PMID: 33170773; PMCID: PMC7655110.

      Barton RE, Massari LF, Robertson D, Marston AL. Eco1-dependent cohesin acetylation anchors chromatin loops and cohesion to define functional meiotic chromosome domains. Elife. 2022 Feb 1;11:e74447. doi: 10.7554/eLife.74447. Epub ahead of print. PMID: 35103590; PMCID: PMC8856730.

      Response:

      We are perplexed by this referee comment. While we agree that puncta representing loop anchors are a feature of Hi-C maps, as noted by the referee, we would emphasize that our CCLE simulations of meiotic budding yeast (Figs. 5A and 5B of the original manuscript) provide an excellent overall description of the experimental meiotic budding yeast Hi-C map, including the puncta arising from loop anchors. This agreement between the CCLE model and experiment for meiotic budding yeast is described and discussed in detail in both the original and the revised manuscript (lines 336-401).

      To further emphasize and extend this point, we now also address the Hi-C of mitotic budding yeast, which was not included in the original manuscript. We have added an entire new section to the revised manuscript, entitled "CCLE Describes TADs and Loop Configurations in Mitotic S. cerevisiae", including the new Figure 6, which presents a comparison between a portion of the mitotic budding yeast Hi-C map from Costantino et al. and the corresponding CCLE simulation at 500-bp resolution. In this case too, the CCLE model describes the data well, including the puncta, further addressing the referee's concern that the CCLE model is missing some key elements of loop organization.

      Concerning the referee's specific comment about the role of Wapl, we note that applying CCLE in the absence of Wapl would require the corresponding cohesin ChIP-seq in the absence of Wapl. To our knowledge, such data are not currently available, and therefore we have not pursued this explicitly. However, we would emphasize that, because Wapl is a factor that promotes cohesin unloading, its role is already effectively represented in the optimized value of the LEF processivity, which encompasses the LEF lifetime. In other words, if Wapl has a substantial effect, it is already captured by this model parameter.

      2. Related to the point above, the simulated data have much higher resolution than the experimental data (1 kb vs 10 kb in the fission yeast dataset). Given that loop size is in the 20-30 kb range, good resolution is important to see the structural features of the chromosomes. Can the model observe these details that are averaged out when the resolution is increased?

      Response:

      We agree with the referee that higher resolution is preferable to low resolution. In practice, however, there is a trade-off between resolution and noise. The first experimental interphase fission yeast Hi-C data, those of Mizuguchi et al. 2014, have 10-kb resolution. To compare our CCLE simulations with these published experimental data, as described in the original manuscript, we bin our 1-kb-resolution simulations to match the 10-kb experimental measurements. Nevertheless, CCLE can readily predict the interphase fission yeast Hi-C map at higher resolution by reducing the bin size (or, if necessary, reducing the lattice site size of the simulations themselves). In the revised manuscript, we have added comparisons between CCLE's predicted Hi-C maps and newer Micro-C data for S. pombe from Hsieh et al. (Ref. [50]) in the new Supplementary Figures 5-9. We have chosen to present these comparisons at 2-kb resolution, which is the same resolution used for our meiotic budding yeast comparisons. Also included in Supplementary Figures 5-9 are comparisons between the original Hi-C maps of Mizuguchi et al. and the newer maps of Hsieh et al., binned to 10-kb resolution. Inspection of these figures shows that CCLE provides a good description of Hsieh et al.'s experimental maps, and that these newer maps do not reveal any major new features of interphase fission yeast chromatin on the 10-100 kb scale that were not already apparent from the Hi-C maps of Mizuguchi et al. 2014. Thus, the CCLE model performs well across this range of effective resolutions.
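      The binning step mentioned above can be sketched as summing contacts over non-overlapping blocks of bins. Here, a 1-kb map is coarse-grained to 10-kb bins (factor 10). The function name, the choice to sum rather than average, and dropping trailing bins that do not fill a complete block are assumptions for illustration, not a description of the authors' pipeline.

```python
import numpy as np


def coarsen_contact_map(m, factor):
    """Sum a square contact matrix over non-overlapping factor x factor blocks.

    Rows/columns beyond the last complete block are discarded.
    """
    m = np.asarray(m, dtype=float)
    n = (m.shape[0] // factor) * factor   # largest size divisible by factor
    m = m[:n, :n]
    k = n // factor
    # View as (k, factor, k, factor) and sum the two within-block axes.
    return m.reshape(k, factor, k, factor).sum(axis=(1, 3))


# Example: a toy 1-kb map of 100 bins becomes a 10-kb map of 10 bins.
rng = np.random.default_rng(0)
fine = rng.random((100, 100))
fine = (fine + fine.T) / 2                # contact maps are symmetric
coarse = coarsen_contact_map(fine, 10)
print(coarse.shape)                       # (10, 10)
```

      Summing conserves the total number of contacts, so the coarse map can be normalized in the same way as the experimental one afterwards.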

      3. Transcription, particularly convergent transcription, has been proposed to confer boundaries to loop extrusion. Can the authors recapitulate this in their model?

      Response:

      In response to the suggestion of the reviewer we have now calculated the correlation between cohesin ChIP-seq and the locations of convergent gene pairs, which is now presented in Supplementary Figures 17 and 18. Accordingly, in the revised manuscript, we have added the following text to the Discussion (lines 482-498):

      "In vertebrates, CTCF defines the locations of most TAD boundaries. It is interesting to ask what might play that role in interphase S. pombe as well as in meiotic and mitotic S. cerevisiae. A number of papers have suggested that convergent gene pairs are correlated with cohesin ChIP-seq in both S. pombe [65, 66] and S. cerevisiae [66-71]. Because CCLE ties TADs to cohesin ChIP-seq, a strong correlation between cohesin ChIP-seq and convergent gene pairs would be an important clue to the mechanism of TAD formation in yeasts. To investigate this correlation, we introduce a convergent-gene variable that has a nonzero value between convergent genes and an integrated weight of unity for each convergent gene pair. Supplementary Figure 17A shows the convergent gene variable, so-defined, alongside the corresponding cohesin ChIP-seq for meiotic and mitotic S. cerevisiae. It is apparent from this figure that a peak in the ChIP-seq data is accompanied by a non-zero value of the convergent-gene variable in about 80% of cases, suggesting that chromatin looping in meiotic and mitotic S. cerevisiae may indeed be tied to convergent genes. Conversely, about 50% of convergent genes match peaks in cohesin ChIP-seq. The cross-correlation between the convergent-gene variable and the ChIP-seq of meiotic and mitotic S. cerevisiae is quantified in Supplementary Figures 17B and C. By contrast, in interphase S. pombe, cross-correlation between convergent genes and cohesin ChIP-seq in each of five considered regions is unobservably small (Supplementary Figure 18A), suggesting that convergent genes per se do not have a role in defining TAD boundaries in interphase S. pombe."

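      Below is a minimal sketch of a convergent-gene variable of this kind, together with a lagged cross-correlation against a ChIP-seq track. It rests on three hypothetical assumptions:

      - each convergent pair is given as the start and end sites of the intergenic region between the two facing genes;
      - the unit weight is spread uniformly over that region;
      - "cross-correlation" means the Pearson correlation between the two tracks at each lag.

      The manuscript's exact conventions may differ, and the toy ChIP-seq track is synthetic.

```python
import numpy as np


def convergent_gene_variable(n_sites, pairs):
    """Track that is nonzero between each convergent gene pair and integrates
    to 1 per pair. `pairs` holds (start, end) site indices (end exclusive) of
    the intergenic region between two facing genes (hypothetical encoding)."""
    track = np.zeros(n_sites)
    for start, end in pairs:
        width = end - start
        if width <= 0:
            raise ValueError("each pair needs end > start")
        track[start:end] += 1.0 / width   # uniform weight summing to 1
    return track


def cross_correlation(x, y, max_lag):
    """Pearson correlation of x[i] with y[i + lag] for lag in [-max_lag, max_lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out


# Synthetic example: two convergent regions and a ChIP-seq-like track with
# Gaussian peaks near them on a flat background.
n = 200
conv = convergent_gene_variable(n, [(40, 50), (120, 126)])
sites = np.arange(n)
chip = (1.0
        + 5.0 * np.exp(-0.5 * ((sites - 45) / 3.0) ** 2)
        + 5.0 * np.exp(-0.5 * ((sites - 123) / 3.0) ** 2))
cc = cross_correlation(conv, chip, max_lag=10)
print(round(conv.sum(), 6))   # 2.0: one unit of weight per convergent pair
print(round(cc[0], 3))        # zero-lag correlation for the toy data
```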
      Minor comments:

      1. In the discussion, the authors cite the fact that Mis4 binding sites do not give good prediction of the HI-C maps as evidence that Mis4 is not important for loop extrusion. This can only be true if the position of Mis4 measured by ChIP is a true reflection of Mis4 position. However, Mis4 binding to cohesin/chromatin is very dynamic and it is likely that this is too short a time scale to be efficiently cross-linked for ChIP. Conversely, extensive experimental data in vivo and in vitro suggest that stimulation of cohesin's ATPase by Mis4-Ssl3 is important for loop extrusion activity.

      Response:

      We apologize for the confusion on this point. We actually intended to convey that the absence of Mis4-Psc3 correlations in S. pombe suggests, from the point of view of CCLE, that Mis4 is not an integral component of loop-extruding cohesin during the loop extrusion process itself. We agree completely that Mis4/Ssl3 is surely important for cohesin loading and, given that cohesin is required for loop extrusion, that Mis4/Ssl3 is therefore important for loop extrusion. Evidently, this part of our Discussion lacked sufficient clarity. In response to both referees' comments, we have rewritten the discussion of Mis4 and Pds5 to explain our reasoning more carefully and to be more circumspect in our inferences. The rewritten discussion is described below in response to Referee #2's comments.

      Nevertheless, on the topic of whether Nipbl-cohesin binding is too transient to be detected in ChIP-seq, the FRAP analysis presented by Rhodes et al. eLife 6:e30000 "Scc2/Nipbl hops between chromosomal cohesin rings after loading" indicates that, in HeLa cells, Nipbl has a residence time bound to cohesin of about 50 seconds. As shown in the bottom panel of Supplementary Fig. 7 in the original manuscript (and the bottom panel of Supplementary Fig. 20 in the revised manuscript), there is a significant cross-correlation (~0.2) between the Nipbl ChIP-seq and Smc1 ChIP-seq in humans, indicating that a transient association between Nipbl and cohesin can be (and in fact is) detected by ChIP-seq.

      2. Inclusion of a comparison of this model with previous models (for example, bottom-up models) would be extremely useful. What is the improvement of this model over existing models?

      Response:

      As stated in the original manuscript, as far as we are aware, "bottom-up" models that quantitatively describe the Hi-C maps of interphase fission yeast or meiotic budding yeast, or indeed of eukaryotes other than vertebrates, do not exist. Bottom-up models would require knowledge of the relevant boundary elements (e.g. CTCF sites), which, as stated in the submitted manuscript, are generally unknown for fission yeast, budding yeast, and other non-vertebrate eukaryotes. The absence of such models is the reason that CCLE fills an important need. Since bottom-up models for cohesin loop extrusion in yeast do not exist, we cannot compare CCLE with the results of such models.

      In the revised manuscript we now explicitly compare the CCLE model to the only bottom-up type of model describing the Hi-C maps of non-vertebrate eukaryotes by Schalbetter et al. Nat. Commun. 10:4795 2019, which we did cite extensively in our original manuscript. Schalbetter et al. use cohesin ChIP-seq peaks to define the positions of loop extrusion barriers in meiotic S. cerevisiae, for which the relevant boundary elements are unknown. In their model, specifically, when a loop-extruding cohesin anchor encounters such a boundary element, it either passes through with a certain probability, as if no boundary element is present, or stops extruding completely until the cohesin unbinds and rebinds.
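      For readers less familiar with the explicit-barrier rule, the following toy, discrete-time sketch implements it on a 1D lattice. A leg that steps onto a barrier site passes with probability p_pass; otherwise it stalls until the LEF unbinds and rebinds. The default p_pass = 0.05 mirrors the best-fit value that the authors quote from Ref. [49]. Everything else is a hypothetical choice for illustration: the lattice size, the number of LEFs, the unbinding probability, loading only onto free non-barrier sites, mutual blocking of legs, and the function name. This is not the simulation code of Ref. [49] or of the manuscript.

```python
import numpy as np


def simulate_explicit_barriers(n_sites, barriers, n_lefs=5, p_pass=0.05,
                               p_off=0.002, n_steps=10000, sample_every=100,
                               seed=0):
    """Toy sketch of an 'explicit barrier' loop extrusion model.

    Each time step, every LEF unbinds with probability p_off (and rebinds at a
    random free, non-barrier pair of adjacent sites); otherwise each of its two
    legs tries to step one site outward. A leg stepping onto a barrier passes
    with probability p_pass, else it stalls until the LEF unbinds. Legs cannot
    step off the lattice or onto a site held by another leg.
    Returns sampled (left, right) loop anchors.
    """
    rng = np.random.default_rng(seed)
    barrier = np.zeros(n_sites, dtype=bool)
    barrier[list(barriers)] = True
    occupied = np.zeros(n_sites, dtype=bool)

    def load():
        # Rejection-sample an adjacent pair of free, non-barrier sites.
        while True:
            x = int(rng.integers(0, n_sites - 1))
            pair = [x, x + 1]
            if not occupied[pair].any() and not barrier[pair].any():
                occupied[pair] = True
                return [x, x + 1, False, False]  # left, right, stalled flags

    lefs = [load() for _ in range(n_lefs)]
    samples = []
    for step in range(n_steps):
        for lef in lefs:
            if rng.random() < p_off:             # unbind, then rebind elsewhere
                occupied[lef[0]] = occupied[lef[1]] = False
                lef[:] = load()
                continue
            for side, direction in ((0, -1), (1, +1)):
                if lef[2 + side]:
                    continue                     # leg stalled at a barrier
                target = lef[side] + direction
                if target < 0 or target >= n_sites or occupied[target]:
                    continue                     # blocked for this step only
                if barrier[target] and rng.random() >= p_pass:
                    lef[2 + side] = True         # stall until unbinding
                    continue
                occupied[lef[side]] = False
                occupied[target] = True
                lef[side] = target
        if step % sample_every == 0:
            samples.extend((lef[0], lef[1]) for lef in lefs)
    return samples


loops = simulate_explicit_barriers(300, barriers=[100, 200])
sizes = np.array([right - left for left, right in loops])
print(len(loops), sizes.mean())
```

      Setting p_pass to 0 turns every barrier into an impermeable wall. Raising it toward 1 recovers free extrusion, which is the all-or-nothing behaviour that the position-dependent rates of CCLE relax.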

      In the revised manuscript we refer to this model as the "explicit barrier" model and have applied it to interphase S. pombe, using cohesin ChIP-seq peaks to define the positions of loop extrusion barriers. The corresponding simulated Hi-C map is presented in Supplementary Fig. 19 in comparison with the experimental Hi-C. It is evident that the explicit barrier model provides a poorer description of the Hi-C data of interphase S. pombe compared to the CCLE model, as indicated by the MPR and Pearson correlation scores. While the explicit barrier model appears capable of accurately reproducing Hi-C data with punctate patterns, typically accompanied by strong peaks in the corresponding cohesin ChIP-seq, it seems less effective in several conditions including interphase S. pombe, where the Hi-C data lacks punctate patterns and sharp TAD boundaries, and the corresponding cohesin ChIP-seq shows low-contrast peaks. The success of the CCLE model in describing the Hi-C data of both S. pombe and S. cerevisiae, which exhibit very different features, suggests that the current paradigm of localized, well-defined boundary elements may not be the only approach to understanding loop extrusion. By contrast, CCLE allows for a concept of continuous distribution of position-dependent loop extrusion rates, arising from the aggregate effect of multiple interactions between loop extrusion complexes and chromatin. This paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers.

      We have also added the following paragraph to the Discussion section of the manuscript to elaborate on this point (lines 499-521):

      "Although 'bottom-up' models which incorporate explicit boundary elements do not exist for non-vertebrate eukaryotes, one may wonder how well such LEF models, if properly modified and applied, would perform in describing Hi-C maps with diverse features. To this end, we examined the performance of the model described in Ref. [49] in describing the Hi-C map of interphase S. pombe. Reference [49] uses cohesin ChIP-seq peaks in meiotic S. cerevisiae to define the positions of loop extrusion barriers, which either let an encountering LEF anchor pass with a certain probability or otherwise stall it completely. We apply this 'explicit barrier' model to interphase S. pombe, using its cohesin ChIP-seq peaks to define the positions of loop extrusion barriers, and using Ref. [49]'s best-fit value of 0.05 for the pass-through probability. Supplementary Figure 19A presents the corresponding simulated Hi-C map of the 0.3-1.3 Mb region of Chr 2 of interphase S. pombe in comparison with the corresponding Hi-C data. It is evident that the explicit barrier model provides a poorer description of the Hi-C data of interphase S. pombe than the CCLE model, as indicated by the MPR and Pearson correlation scores of 1.6489 and 0.2267, respectively. While the explicit barrier model appears capable of accurately reproducing Hi-C data with punctate patterns, typically accompanied by strong peaks in the corresponding cohesin ChIP-seq, it seems less effective in cases such as interphase S. pombe, where the Hi-C data lack punctate patterns and sharp TAD boundaries, and the corresponding cohesin ChIP-seq shows low-contrast peaks. The success of the CCLE model in describing the Hi-C data of both S. pombe and S. cerevisiae, which exhibit very different features, suggests that the current paradigm of localized, well-defined boundary elements may not be the only approach to understanding loop extrusion.
By contrast, CCLE allows for a concept of continuous distribution of position-dependent loop extrusion rates, arising from the aggregate effect of multiple interactions between loop extrusion complexes and chromatin. This paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers."
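      A Pearson correlation between simulated and experimental maps, of the kind quoted above, can be computed along the lines of the sketch below. Several choices here are assumptions for illustration and may not match the manuscript's scoring: using the upper triangle only, skipping the main diagonal, log10-transforming contacts, and dropping empty bins. The MPR score is not defined in this excerpt and is not reproduced.

```python
import numpy as np


def map_pearson(sim, exp, min_offset=1, use_log=True):
    """Pearson correlation between two equally binned contact maps.

    Only the upper triangle at least `min_offset` diagonals away from the
    main diagonal is used. With use_log=True, bins where either map is zero
    are dropped and the remaining values are log10-transformed.
    """
    sim = np.asarray(sim, dtype=float)
    exp = np.asarray(exp, dtype=float)
    if sim.shape != exp.shape or sim.shape[0] != sim.shape[1]:
        raise ValueError("maps must be square and of equal shape")
    iu = np.triu_indices(sim.shape[0], k=min_offset)
    a, b = sim[iu], exp[iu]
    if use_log:
        keep = (a > 0) & (b > 0)
        a, b = np.log10(a[keep]), np.log10(b[keep])
    return float(np.corrcoef(a, b)[0, 1])


# Toy check: a power-law decay map vs. a noisy, rescaled copy of itself.
n = 60
s = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) + 1.0
model = s ** -1.5
rng = np.random.default_rng(0)
data = 3.0 * model * rng.lognormal(sigma=0.2, size=model.shape)
print(round(map_pearson(model, data), 3))
```

      On log-transformed maps, a pure rescaling leaves the correlation at 1. The score is therefore insensitive to sequencing depth but sensitive to the shape of the map.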

      Reviewer #1 (Significance (Required)):

      This simple model is useful to confirm that cohesin positions dictate the position of loops, which was predicted already and proposed in many studies. However, it should be considered a starting point as it does not faithfully predict all the features of chromatin organisation, particularly at better resolution.

      Response:

      As described in more detail above, we do not agree with the assertion of the referee that the CCLE model "does not faithfully predict all the features of chromatin organization, particularly at better resolution" and provide additional new data to support the conclusion that the CCLE model provides a much needed approach to model non-vertebrate contact maps and outperforms the single prior attempt to predict budding yeast Hi-C data using information from cohesin ChIP-seq.

      It will mostly be of interest to those in the chromosome organisation field working in organisms or systems that do not have CTCF.

      Response:

      We agree that this work will be of special interest to researchers working on the chromatin organization of non-vertebrate organisms. We would emphasize that yeasts are frequently used as models for the study of cohesin, condensin, and chromatin folding more generally. Indeed, in the last two months alone, two Molecular Cell papers, one Nature Genetics paper, and one Cell Reports paper have appeared in which loop extrusion in yeast models is directly relevant. We also believe, however, that the model will be of interest to the field in general, as it simultaneously encompasses various scenarios that may lead to the slowing or stalling of LEFs.

      This reviewer is a cell biologist working in the chromosome organisation field, but does not have modelling experience and therefore does not have the expertise to determine if the modelling part is mathematically sound and has assumed that it is.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: Yuan et al. report on their development of an analytical model ("CCLE") for loop extrusion with genomic-position-dependent speed, with the idea of accounting for barriers to loop extrusion. They write down master equations for the probabilities of cohesin occupancy at each genomic site and obtain approximate steady-state solutions. Probabilities are governed by cohesin translocation, loading, and unloading. Using ChIP-seq data as an experimental measurement of these probabilities, they numerically fit the model parameters, among which are extruder density and processivity. Gillespie simulations with these parameters combined with a 3D Gaussian polymer model were integrated to generate simulated Hi-C maps and cohesin ChIP-seq tracks, which show generally good agreement with the experimental data. The authors argue that their modeling provides evidence that loop extrusion is the primary mechanism of chromatin organization on ~10-100 kb scales in S. pombe and S. cerevisiae.
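      The sketch below makes the "3D Gaussian polymer model" step concrete. It shows one standard way to turn a single loop configuration into relative contact probabilities. It relies on a property of harmonic bead-spring networks: if every spring is treated as a unit resistor, the mean squared distance between beads i and j equals b² times the effective resistance R_ij between them. The contact probability then scales as R_ij^(-3/2). Treating each loop as one extra bond of adjustable conductance, and the function name, are assumptions for illustration. Averaging such maps over many snapshots from a Gillespie trajectory would give an ensemble map, but this is not necessarily how the manuscript combines the two steps.

```python
import numpy as np


def gaussian_contact_map(n_sites, loops, loop_conductance=1.0):
    """Relative contact probabilities for an ideal (Gaussian) chain with loops.

    Backbone bonds between neighbouring beads act as unit resistors; each
    loop (i, j) adds an extra bond between its anchors with conductance
    `loop_conductance`. For a harmonic network <R_ij^2> = b^2 * R_ij, with
    R_ij the effective resistance, and P_contact(i, j) ~ R_ij ** -1.5.
    """
    lap = np.zeros((n_sites, n_sites))       # graph Laplacian

    def add_bond(i, j, c):
        lap[i, i] += c
        lap[j, j] += c
        lap[i, j] -= c
        lap[j, i] -= c

    for i in range(n_sites - 1):
        add_bond(i, i + 1, 1.0)
    for i, j in loops:
        add_bond(i, j, loop_conductance)

    # For a connected graph, inv(L + J/n) = pinv(L) + J/n, and the J/n part
    # cancels in R_ij = G_ii + G_jj - 2 G_ij.
    g = np.linalg.inv(lap + 1.0 / n_sites)
    d = np.diag(g)
    resistance = d[:, None] + d[None, :] - 2.0 * g
    contact = np.maximum(resistance, 1e-12) ** -1.5
    np.fill_diagonal(contact, np.nan)        # self-contacts are not defined
    return resistance, contact


res_free, _ = gaussian_contact_map(50, loops=[])
res_loop, contact = gaussian_contact_map(50, loops=[(10, 40)])
print(round(res_free[10, 40], 6))   # 30.0: a bare chain gives R_ij = |i - j|
print(round(res_loop[10, 40], 6))   # 0.967742: 30 and 1 in parallel, i.e. 30/31
```

      A single loop bond makes its two anchors effectively adjacent. This produces the enriched anchor-anchor contacts that appear as puncta once many configurations are averaged.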

      Major comments:

      1. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling?

      __Response: __

      We agree with the referee's statement that "extrusion is widely accepted, even if not universally so". We disagree, however, that this state of affairs means that "the need to demonstrate this (i.e. loop extrusion) is questionable". On the contrary, studies such as ours that provide further compelling evidence that cohesin-based loop extrusion is the primary organizer of chromatin must surely be welcomed: first, to persuade those who remain unconvinced by the loop extrusion mechanism in general, and second, because until the present work, quantitative models of loop extrusion capable of reproducing Hi-C maps in yeasts and other non-vertebrate eukaryotes have been lacking, leaving open the question of whether loop extrusion can describe Hi-C maps beyond vertebrates. CCLE has now answered that question in the affirmative. Moreover, a robust model that predicts contact maps in non-vertebrate models, which are extensively used in chromatin biology research, will be broadly enabling to the field.

      It is a basic principle that if a simple, physically plausible model or hypothesis describes experimental data quantitatively, it is appropriate to ascribe considerable weight to that model or hypothesis (until additional data become available to refute it).

      How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling?

      Response:

      As noted above and in the original manuscript, we are unaware of previous quantitative modeling of cohesin-based loop extrusion and the resultant Hi-C maps in organisms that lack CTCF, namely non-vertebrate eukaryotic models such as fission yeast or budding yeast, as we apply here. As noted in the original manuscript, previous quantitative modeling of Hi-C maps based on cohesin loop extrusion and CTCF boundary elements has been convincing that loop extrusion is indeed relevant in vertebrates, but the restriction to vertebrates excludes most of the tree of life.

      Below, the referee cites two examples of loop extrusion modeling outside of vertebrates. The one suggested to correspond to yeast cells (Dequeker et al. Nature 606:197 2022) actually corresponds to mouse cells, which are vertebrate cells. The other models the Hi-C map of the prokaryote Bacillus subtilis, based on loop extrusion by the bacterial SMC complex thought to most resemble condensin (not cohesin), subject to barriers to loop extrusion that are related to genes or involve prokaryote-specific Par proteins (Brandao et al. PNAS 116:20489 2019). We have referenced this work in the revised manuscript but would emphasize that it lacks utility in predicting the contact maps of non-vertebrate eukaryotes.

      Relatedly, similar best fit values for S. pombe and S. cerevisiae might not point to a mechanistic conclusion (same "underlying mechanism" of loop extrusion), but rather to similar properties for loop-extruding cohesins in the two species.

      Response:

      In the revised manuscript, we have replaced "suggesting that the underlying mechanism that governs loop extrusion by cohesin is identical in both species" with "suggesting loop-extruding cohesins possess similar properties in both species" (lines 367-368).

      As an alternative, could a model with variable binding probability given by ChIP-seq and an exponential loop-size distribution work equally well? The stated lack of a dependence on extrusion timescale suggests that a static looping model might succeed. If not, why not?

      Response:

      A hypothetical mechanism that generates the same instantaneous loop distributions and correlations as loop extrusion would lead to the same Hi-C map as does loop extrusion. This circumstance is not confined to CCLE, but is equally applicable to previous CTCF-based loop extrusion models. It holds because Hi-C and ChIP-seq, and therefore models that seek to describe these measurements, provide a snapshot of the chromatin configuration at one instant of time.

      We would emphasize that there is no physical basis for a diffusion capture model with an approximately exponential loop size distribution. Nevertheless, one can reasonably ask whether a physically sensible diffusion capture model can simultaneously match cohesin ChIP-seq and Hi-C. Motivated by the referee's comment, we have addressed this question and, accordingly, in the revised manuscript we have added (1) an entire subsection entitled "Diffusion capture does not reproduce experimental interphase S. pombe Hi-C maps" (lines 303-335) and (2) Supplementary Figure 15. As we now demonstrate, the CCLE model vastly outperforms an equilibrium binding model in reproducing the experimental Hi-C maps and the measured P(s).

      *2. I do not understand how the loop extrusion residence time drops out. As I understand it, Eq 9 converts ChIP-seq to lattice site probability (involving N_{LEF}, which is related to \rho, and \rho_c). Then, Eqs. 3-4 derive site velocities V_n and U_n if we choose \rho, L, and \tau, with the latter being the residence time. This parameter is not specified anywhere and is claimed to be unimportant. It may be true that the choice of timescale is arbitrary in this procedure, but can the authors please clarify?*

      __Response:__

      As noted above, Hi-C and ChIP-seq both capture chromatin configuration at one instant in time. Therefore, such measurements cannot and do not provide any time-scale information, such as the loop extrusion residence time (LEF lifetime) or the mean loop extrusion rate. For this reason, neither our CCLE simulations, nor other researchers' previous simulations of loop extrusion in vertebrates with CTCF boundary elements, provide any time-scale information, because the experiments they seek to describe do not contain time-scale information. The Hi-C map simulations can and do provide information concerning the loop size, which is the product of the loop lifetime and the loop extrusion rate. Lines 304-305 of the revised manuscript include the text: "Because Hi-C and ChIP-seq both characterize chromatin configuration at a single instant of time, and do not provide any direct time-scale information, ..."

      In practice, we set the LEF lifetime to an explicit value in arbitrary time units. We have added a sentence in the Methods that reads, "In practice, however, we set the LEF dissociation rate to 5 × 10^-4 per time unit (equivalent to a lifetime of 2000 time-units), and the nominal LEF extrusion rate (aka \rho*L/\tau, see Supplementary Methods) can be determined from the given processivity" (lines 599-602), to clarify this point. We have also changed the terminology from "timesteps" to "LEF events" in the manuscript, as the latter is more accurate for our purpose.
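      To make the time-scale argument concrete, here is a minimal sketch (not the CCLE code) assuming, as stated above, that the loop size (processivity) is the product of LEF lifetime and nominal extrusion rate. Only the dissociation rate of 5 × 10^-4 per time unit comes from the text; the processivity value and the function names are hypothetical.

```python
# Minimal sketch (assumption): processivity = LEF lifetime x nominal extrusion rate,
# as stated in the response. The processivity value below is hypothetical.

def lifetime_from_dissociation_rate(k_off):
    """Mean LEF lifetime, in arbitrary time units, for a constant dissociation rate."""
    return 1.0 / k_off

def nominal_extrusion_rate(processivity, lifetime):
    """Extrusion rate (lattice sites per time unit) that yields the given processivity."""
    return processivity / lifetime

k_off = 5e-4                                   # dissociation rate quoted above
tau = lifetime_from_dissociation_rate(k_off)   # 2000 time units
processivity = 100.0                           # hypothetical, in lattice sites
v = nominal_extrusion_rate(processivity, tau)  # 0.05 sites per time unit

# Rescaling the time unit by any factor c changes tau and v but not their product,
# so a snapshot observable that depends only on loop size cannot fix the time scale.
for c in (0.1, 1.0, 10.0):
    tau_c = c * tau
    v_c = nominal_extrusion_rate(processivity, tau_c)
    assert abs(tau_c * v_c - processivity) < 1e-9
```

      Any choice of time unit yields the same loop-size statistics, which is why the lifetime can be fixed arbitrarily once the processivity is set.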

      3. The assumptions in the solution and application of the CCLE model may restrict it to a limited number of scenarios. In particular, the authors specify that the current due to binding/unbinding, A_n - D_n, is small. This assumption could be problematic near loading sites (centromeres, enhancers in higher eukaryotes, etc.) (where current might be dominated by A_n and V_n), unloading sites (D_n and V_{n-1}), or strong boundaries (D_n and V_{n-1}). The latter scenario is particularly concerning because the manuscript seems to be concerned with the presence of unidentified boundaries. This is partially mitigated by the fact that the model seems to work well in the chosen examples, but the authors should discuss the limitations due to their assumptions and/or possible methods to get around these limitations.

      4. Related to the above concern, low cohesin occupancy is interpreted as a fast extrusion region and high cohesin occupancy is interpreted as a slow region. But this might not be true near cohesin loading and unloading sites.

      __Response:__

      Our response to Referee 2's Comments 3 and 4 is that, both in the original manuscript and in the revised manuscript, we clearly delineate the assumptions underlying CCLE and carefully assess the extent to which these assumptions are violated (lines 123-126 and 263-279 in the revised manuscript). For example, Supplementary Figure 12 shows that across the S. pombe genome as a whole, violations of the CCLE assumptions are small. Supplementary Figure 13 shows that violations are similarly small for meiotic S. cerevisiae. However, to explicitly address the concern of the referee, we have added the following sentences to the revised manuscript:

      Lines 277-279:

      "While loop extrusion in interphase S. pombe seems to well satisfy the assumptions underlying CCLE, this may not always be the case in other organisms."

      Lines 359-361:

      "In addition, the three quantities, given by Eqs. 6, 7, and 8, are distributed around zero with relatively small fluctuations (Supplementary Fig. 13), indicating that CCLE model is self-consistent in this case also."

      In the case of mitotic S. cerevisiae, Supplementary Figure 14 shows that these quantities are small for most of genomic locations, except near the cohesin ChIP-seq peaks. We ascribe these greater violations of CCLE's assumptions at the locations of cohesin peaks in part to the low processivity of mitotic cohesin in S. cerevisiae, compared to that of meiotic S. cerevisiae and interphase S. pombe, and in part to the low CCLE loop extrusion rate at the cohesin peaks. We have added a paragraph at the end of the Section "CCLE Describes TADs and Loop Configurations in Mitotic S. cerevisiae" to reflect these observations (lines 447-461).

      5. *The mechanistic insight attempted in the discussion, specifically with regard to Mis4/Scc2/NIPBL and Pds5, is problematic. First, it is not clear how the discussion of Nipbl and Pds5 is connected to the CCLE method; the justification is that CCLE shows cohesin distribution is linked to cohesin looping, which is already a questionable statement (point 1) and doesn't really explain how the model offers new insight into existing Nipbl and Pds5 data.*

      Furthermore, I believe that the conclusions drawn on this point are flawed, or at least stated with too much confidence. The authors raise the curious point that Nipbl ChIP-seq does not correlate well with cohesin ChIP-seq, and use this as evidence that Nipbl is not a part of the loop-extruding complex in S. pombe, and it is not essential in humans. Aside from the molecular evidence in human Nipbl/cohesin (acknowledged by authors), there are other reasons to doubt this conclusion. First, depletion of Nipbl (rather than binding partner Mau2 as in ref 55) in mouse cells strongly inhibits TAD formation (Schwarzer et al. Nature 551:51 2017). Second, at least two studies have raised concerns about Nipbl ChIP-seq results: 1) Hu et al. Nucleic Acids Res 43:e132 2015, which shows that uncalibrated ChIP-seq can obscure the signal of protein localization throughout the genome due to the inability to distinguish it from background, and 2) Rhodes et al. eLife 6:e30000, which uses FRAP to show that Nipbl binds and unbinds cohesin rapidly in human cells, which could go undetected in ChIP-seq, especially when uncalibrated. It has not been shown that these dynamics are present in yeast, but there is no reason to rule it out yet.

      Similar types of critiques could be applied to the discussion of Pds5. There is cross-correlation between Psc3 and Pds5 in S. pombe, but the authors are unable to account for whether Pds5 binding is transient and/or necessary to loop extrusion itself or, more importantly, whether Pds5 ChIP is associated with extrusive or cohesive cohesins; cross-correlation peaks at about 0.6, but note that by the authors' own estimates, cohesive cohesins are approximately half of all cohesins in S. pombe (Table 3).

      *Due to the above issues, I suggest that the authors heavily revise this discussion to better reflect the current experimental understanding and the limited ability to draw such conclusions based on the current CCLE model.*

      __Response:__

      As stated above, our study demonstrates that the CCLE approach is able to take as input cohesin (Psc3) ChIP-seq data and produce as output simulated Hi-C maps that reproduce well the experimental Hi-C maps of interphase S. pombe and meiotic S. cerevisiae. This result is evident from the multiple Hi-C comparison figures in both the original and the revised manuscripts. In light of this, the referee's statement that it is "questionable" that CCLE shows the cohesin distribution (as quantified by cohesin ChIP-seq) to be linked to cohesin looping (as quantified by Hi-C) is demonstrably incorrect.

      However, we did not intend to suggest, as the reviewer's comment implies, that Nipbl and Pds5 are not crucial for cohesin loading. Rather, our inquiries relate to the more nuanced question of whether these factors reside only at loading sites or, instead, remain as longer-lived constituents of the loop extrusion complex. We regret any confusion and have endeavored to clarify this point in the revised manuscript in response to Referee 2's Comment 5 as well as Referee 1's Minor Comment 1. We have now better explained how the CCLE model may offer new insight from existing ChIP-seq data in general, and from Mis4/Nipbl and Pds5 ChIP-seq in particular. Accordingly, we have followed Referee 2's advice to heavily revise the relevant section of the Discussion.

      To this end, we have removed the following text from the original manuscript:

      "The fact that the cohesin distribution along the chromatin is strongly linked to chromatin looping, as evident by the success of the CCLE model, allows for new insights into in vivo LEF composition and function. For example, recently, two single-molecule studies [37, 38] independently found that Nipbl, which is the mammalian analogue of Mis4, is an obligate component of the loop-extruding human cohesin complex. Ref. [37] also found that cohesin complexes containing Pds5, instead of Nipbl, are unable to extrude loops. On this basis, Ref. [32] proposed that, while Nipbl-containing cohesin is responsible for loop extrusion, Pds5-containing cohesin is responsible for sister chromatid cohesion, neatly separating cohesin's two functions according to composition. However, the success of CCLE in interphase S. pombe, together with the observation that the Mis4 ChIP-seq signal is uncorrelated with the Psc3 ChIP-seq signal (Supplementary Fig. 7) allows us to infer that Mis4 cannot be a component of loop-extruding cohesin in S. pombe. On the other hand, Pds5 is correlated with Psc3 in S. pombe (Supplementary Fig. 7) suggesting that both proteins are involved in loop-extruding cohesin, contradicting a hypothesis that Pds5 is a marker for cohesive cohesin in S. pombe. In contrast to the absence of Mis4-Psc3 correlation in S. pombe, in humans, Nipbl ChIP-seq and Smc1 ChIP-seq are correlated (Supplementary Fig. 7), consistent with Ref. [32]'s hypothesis that Nipbl can be involved in loop-extruding cohesin in humans. However, Ref. [55] showed that human Hi-C contact maps in the absence of Nipbl's binding partner, Mau2 (Ssl3 in S. pombe [56]) show clear TADs, consistent with loop extrusion, albeit with reduced long-range contacts in comparison to wild-type maps, indicating that significant loop extrusion continues in live human cells in the absence of Nipbl-Mau2 complexes. 
These collected observations suggest the existence of two populations of loop-extruding cohesin complexes in vivo, one that involves Nipbl-Mau2 and one that does not. Both types are present in mammals, but only Mis4-Ssl3-independent loop-extruding cohesin is present in S. pombe."

      And we have replaced it by the following text in the revised manuscript (lines 533-568):

      "As noted above, the input for our CCLE simulations of chromatin organization in S. pombe, was the ChIP-seq of Psc3, which is a component of the cohesin core complex [75]. Accordingly, Psc3 ChIP-seq represents how the cohesin core complex is distributed along the genome. In S. pombe, the other components of the cohesin core complex are Psm1, Psm3, and Rad21. Because these proteins are components of the cohesin core complex, we expect that the ChIP-seq of any of these proteins would closely match the ChIP-seq of Psc3, and would equally well serve as input for CCLE simulations of S. pombe genome organization. Supplementary Figure 20C confirms significant correlations between Psc3 and Rad21. In light of this observation, we then reason that the CCLE approach offers the opportunity to investigate whether other proteins beyond the cohesin core are constitutive components of the loop extrusion complex during the extrusion process (as opposed to cohesin loading or unloading). To elaborate, if the ChIP-seq of a non-cohesin-core protein is highly correlated with the ChIP-seq of a cohesin core protein, we can infer that the protein in question is associated with the cohesin core and therefore is a likely participant in loop-extruding cohesin, alongside the cohesin core. Conversely, if the ChIP-seq of a putative component of the loop-extruding cohesin complex is uncorrelated with the ChIP-seq of a cohesin core protein, then we can infer that the protein in question is unlikely to be a component of loop-extruding cohesin, or at most is transiently associated with it.

      For example, in S. pombe, the ChIP-seq of the cohesin regulatory protein, Pds5 [74], is correlated with the ChIP-seq of Psc3 (Supplementary Fig. 20B) and with that of Rad21 (Supplementary Fig. 20D), suggesting that Pds5 can be involved in loop-extruding cohesin in S. pombe, alongside the cohesin core proteins. Interestingly, this inference concerning fission yeast cohesin subunit, Pds5, stands in contrast to the conclusion from a recent single-molecule study [38] concerning cohesin in vertebrates. Specifically, Reference [38] found that cohesin complexes containing Pds5, instead of Nipbl, are unable to extrude loops.

      Additionally, as noted above, in S. pombe the ChIP-seq signal of the cohesin loader, Mis4, is uncorrelated with the Psc3 ChIP-seq signal (Supplementary Fig. 20A), suggesting that Mis4 is, at most, a very transient component of loop-extruding cohesin in S. pombe, consistent with its designation as a "cohesin loader". However, both References [38] and [39] found that Nipbl (counterpart of S. pombe's Mis4) is an obligate component of the loop-extruding human cohesin complex, more than just a mere cohesin loader. Although CCLE has not yet been applied to vertebrates, from a CCLE perspective, the possibility that Nipbl may be required for the loop extrusion process in humans is bolstered by the observation that in humans Nipbl ChIP-seq and Smc1 ChIP-seq show significant correlations (Supplementary Fig. 20G), consistent with Ref. [32]'s hypothesis that Nipbl is involved in loop-extruding cohesin in vertebrates. A recent theoretical model of the molecular mechanism of loop extrusion by cohesin hypothesizes that transient binding by Mis4/Nipbl is essential for permitting directional reversals and therefore for two-sided loop extrusion [41]. Surprisingly, there are significant correlations between Mis4 and Pds5 in S. pombe (Supplementary Fig. 20E), indicating Pds5-Mis4 association, outside of the cohesin core complex."

      In response to Referee 2's specific comment that "at least two studies have raised concerns about Nipbl ChIP-seq results", we note (1) that, while Hu et al. Nucleic Acids Res 43:e132 2015 present a general method for calibrating ChIP-seq results, they do not measure Mis4/Nipbl ChIP-seq, nor do they raise any specific concerns about Mis4/Nipbl ChIP-seq, and (2) that (as noted above, in response to Referee 1's comment), while the FRAP analysis presented by Rhodes et al. eLife 6:e30000 indicates that, in HeLa cells, Nipbl has a residence time bound to cohesin of about 50 seconds, there is nevertheless a significant cross-correlation between Nipbl ChIP-seq and Smc1 ChIP-seq in humans (Supplementary Fig. 20G in the revised manuscript), indicating that the transient association between Nipbl and cohesin is detected by ChIP-seq, the referee's concerns notwithstanding.

      We thank the referee for pointing out Schwarzer et al. Nature 551:51 2017. However, our interpretation of these data differs from the referee's. As noted in our original manuscript, Nipbl has traditionally been considered a cohesin loading factor. If the role of Nipbl were solely to load cohesin, then we would expect depletion of Nipbl to have a major effect on the Hi-C map, because fewer cohesins would be loaded onto the chromatin. Figure 2 of Schwarzer et al. Nature 551:51 2017 shows the effect of depleting Nipbl on a vertebrate Hi-C map. Even when Nipbl is absent, this figure shows that TADs persist, albeit considerably attenuated. According to the authors' own analysis associated with their Fig. 2, these attenuated TADs correspond to a smaller number of loop-extruding cohesin complexes than in the presence of Nipbl. Since Nipbl is depleted, these loop-extruding cohesins necessarily cannot contain Nipbl. Thus, the data and analysis of Schwarzer et al. Nature 551:51 2017 actually seem consistent with the existence of a population of loop-extruding cohesin complexes that do not contain Nipbl.

      Concerning the referee's comment that we cannot be sure whether Pds5 ChIP is associated with extrusive or cohesive cohesin, we note that, as explained in the manuscript, we assume that the cohesive cohesins are uniformly distributed across the genome, and therefore that peaks in the cohesin ChIP-seq are associated with loop-extruding cohesins. The success of CCLE in describing Hi-C maps justifies this assumption a posteriori. Supplementary Figure 20B shows that the ChIP-seq of Pds5 is correlated with the ChIP-seq of Psc3 in S. pombe, that is, that peaks in the ChIP-seq of Psc3, assumed to derive from loop-extruding cohesin, are accompanied by peaks in the ChIP-seq of Pds5. This is the reasoning allowing us to associate Pds5 with loop-extruding cohesin in S. pombe.
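      The correlation argument above can be illustrated with a minimal sketch of a lagged Pearson cross-correlation between two binned ChIP-seq tracks. This is an assumption about how such a comparison might be computed, not the procedure behind Supplementary Figure 20; the tracks are toy data on a shared set of genomic bins, and a real analysis would treat each chromosome separately.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(x, y, max_lag):
    """Pearson correlation of x[i] with y[i + lag], for lag in [-max_lag, max_lag] bins."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result

# Toy tracks (hypothetical): the second track is the first shifted right by one bin
core = [0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 0]
other = [0] + core[:-1]
cc = cross_correlation(core, other, max_lag=3)
best_lag = max(cc, key=cc.get)  # 1
```

      A high peak at or near zero lag indicates that the two proteins co-localize on the scale of a bin; a flat, low profile indicates no detectable association.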

      6. I suggest that the authors recalculate correlations for Hi-C maps using maps that are rescaled by the P(s) curves. As currently computed, most of the correlation between maps could arise from the characteristic decay of P(s) rather than smaller-scale features of the contact maps. This could reduce the surprising observed correlation between distinct genomic regions in pombe (which, problematically, is higher than the observed correlation between simulation and experiment in cerevisiae).

      Response:

      We thank the referee for this advice. Following this advice, throughout the revised manuscript, we have replaced our original calculation of the Pearson correlation coefficient of unscaled Hi-C maps with a calculation of the Pearson correlation coefficient of rescaled Hi-C maps. Since the MPR is formed from ratios of simulated to experimental Hi-C maps, this metric is unchanged by the proposed rescaling.
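      One common way to rescale a Hi-C map by P(s) is to divide each entry by the mean contact frequency on its diagonal (an observed-over-expected map). The sketch below assumes that normalization, which may differ in detail from the one used in the revised manuscript, and uses hypothetical toy maps that share small-scale structure but have different P(s) decays; after rescaling, they become perfectly correlated.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def expected_by_distance(m):
    """Mean contact frequency on each diagonal of a symmetric map: P(s) for s = |i - j| bins."""
    n = len(m)
    return [sum(m[i][i + s] for i in range(n - s)) / (n - s) for s in range(n)]

def observed_over_expected(m):
    """Divide every entry by the mean of its diagonal, removing the P(s) decay."""
    exp = expected_by_distance(m)
    n = len(m)
    return [[m[i][j] / exp[abs(i - j)] if exp[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def scaled_map_correlation(a, b, min_sep=1):
    """Pearson r between P(s)-rescaled maps over upper-triangle bins with j - i >= min_sep."""
    oa, ob = observed_over_expected(a), observed_over_expected(b)
    n = len(a)
    xs = [oa[i][j] for i in range(n) for j in range(i + min_sep, n)]
    ys = [ob[i][j] for i in range(n) for j in range(i + min_sep, n)]
    return pearson(xs, ys)

# Toy maps: same small-scale pattern, different power-law decays with distance.
n = 6
pattern = [[1.5 if (i + j) % 3 == 0 else 1.0 for j in range(n)] for i in range(n)]
a = [[(1 + abs(i - j)) ** -1.5 * pattern[i][j] for j in range(n)] for i in range(n)]
b = [[a[i][j] * (1 + abs(i - j)) ** -1.0 for j in range(n)] for i in range(n)]
r = scaled_map_correlation(a, b)  # ~1.0: rescaling removes the differing decays
```

      Because each map is divided by its own P(s), the resulting correlation reflects only structure beyond the distance decay, which is the concern the referee raised.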

      As explained in the original manuscript, we attribute the lower experiment-simulation correlation in the meiotic budding yeast Hi-C maps to the larger statistical errors of the meiotic budding yeast dataset, which arise from its higher genomic resolution: all else being equal, we can expect 25 times as many counts in a 10 kb × 10 kb bin as in a 2 kb × 2 kb bin. For the same reason, we expect larger statistical errors in the mitotic budding yeast dataset as well. Lower correlations for noisier data are to be expected in general.
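      The resolution argument follows from counting statistics. Assuming Poisson-distributed counts and equal sequencing depth (assumptions, not measured values), the relative error of a bin with mean count N scales as 1/sqrt(N), so 25-fold fewer counts per bin means roughly 5-fold larger relative errors:

```python
import math

def relative_poisson_error(mean_counts):
    """Relative standard error of a Poisson count with the given mean: 1/sqrt(N)."""
    return 1.0 / math.sqrt(mean_counts)

area_ratio = (10 / 2) ** 2       # a 10 kb x 10 kb bin spans 25 bins of 2 kb x 2 kb
counts_2kb = 40.0                # hypothetical mean count per 2 kb x 2 kb bin
counts_10kb = area_ratio * counts_2kb

error_ratio = relative_poisson_error(counts_2kb) / relative_poisson_error(counts_10kb)
# error_ratio is sqrt(25) = 5, independent of the hypothetical count chosen
```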

      *7. Please explain why the difference between right and left currents at any particular site, (R_n - L_n)/(R_n + L_n), should be small. It seems easy to imagine scenarios where this might not be true, such as directional barriers like CTCF or transcribed genes.*

      __Response:__

      For simplicity, the present version of CCLE sets the site-dependent loop extrusion rates by assuming that the cohesin ChIP-seq signal has equal contributions from left and right anchors. Then, we carry out our simulations which subsequently allow us to examine the simulated left and right currents and their difference at every site. The distributions of normalized left-right difference currents are shown in Supplementary Figures 12B, 13B, and 14D, for interphase S. pombe, meiotic S. cerevisiae, and mitotic S. cerevisiae, respectively. They are all centered at zero with standard deviations of 0.12, 0.16, and 0.33. Thus, it emerges from our simulations that the difference current is indeed generally small.
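      The left-right diagnostic discussed above can be computed per site as (R_n - L_n)/(R_n + L_n). In this sketch, R and L are hypothetical per-site tallies of right- and left-moving LEF-anchor current from a simulation; how CCLE actually accumulates these currents is described in the Supplementary Methods, not here.

```python
import statistics

def normalized_lr_difference(right, left):
    """(R_n - L_n) / (R_n + L_n) per site; sites with zero total current are skipped."""
    out = []
    for r, l in zip(right, left):
        total = r + l
        if total > 0:
            out.append((r - l) / total)
    return out

# Hypothetical per-site current tallies
R = [10, 12, 9, 11, 0]
L = [10, 8, 11, 11, 0]
d = normalized_lr_difference(R, L)   # [0.0, 0.2, -0.1, 0.0]; the empty last site is skipped
center = statistics.mean(d)          # should be near zero if CCLE's assumption holds
spread = statistics.pstdev(d)        # compare with the reported standard deviations
```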

      8. Optional, but I think it would greatly improve the manuscript. Can the authors: a) analyze regions of high cohesin occupancy (assumed to be slow extrusion regions) to determine if there's anything special in these regions, such as more transcriptional activity

      __Response:__

      In response to Referee 1's similar comment, we have calculated the correlation between the locations of convergent genes and cohesin ChIP-seq. Supplementary Figure 18A in the revised manuscript shows that for interphase S. pombe no correlations are evident, whereas for both meiotic and mitotic S. cerevisiae there are significant correlations between these two quantities (Supplementary Fig. 17).

      *b) apply this methodology to vertebrate cell data*

      __Response:__

      The application of CCLE to vertebrate data is outside the scope of this paper which, as we have emphasized, has the goal of developing a model that can be robustly applied to non-vertebrate eukaryotic genomes. Nevertheless, CCLE is, in principle, applicable to all organisms in which loop extrusion by SMC complexes is the primary mechanism for chromatin spatial organization.

      9. *A GitHub link is provided but the code is not currently available.*

      __Response:__

      The code is now available.

      Minor Comments:

      1. Please state the simulated LEF lifetime, since the statement in the methods that 15000 timesteps are needed for equilibration of the LEF model is otherwise not meaningful. Additionally, please note that backbone length is not necessarily a good measure of steady state, since the backbone can be compacted to its steady-state value while the loop distribution continues to evolve toward its steady state.

      __Response:__

      The terminology "timesteps" used in the original manuscript in fact should mean "the number of LEF events performed" in the simulation. Therefore, we have changed the terminology from "timesteps" to "LEF events".

      The choice of 15000 LEF events was determined empirically to ensure that the loop extrusion steady state is reached for the range of parameters considered. To address the referee's concern about whether steady state is reached after 15000 LEF events, we compared two loop size distributions, each comprising 1000 data points equally spaced in time: one between LEF events 15000 and 35000, and the other between LEF events 80000 and 100000. The two distributions are identical within errors, indicating that the loop extrusion steady state is reached within 15000 LEF events.
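      The response does not say which test was used to compare the two loop size distributions. As one possible check, sketched here as an assumption rather than the authors' method, the two-sample Kolmogorov-Smirnov statistic compares the empirical distribution functions directly. The loop sizes below are hypothetical, and because successive snapshots from a single simulation can be correlated, the large-sample critical value should be read only as a rough guide.

```python
import bisect
import math

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over pooled values."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    na, nb = len(a_sorted), len(b_sorted)
    d = 0.0
    for x in a_sorted + b_sorted:
        # Empirical CDFs evaluated at x (fraction of samples <= x)
        fa = bisect.bisect_right(a_sorted, x) / na
        fb = bisect.bisect_right(b_sorted, x) / nb
        d = max(d, abs(fa - fb))
    return d

def ks_critical_value(na, nb, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((na + nb) / (na * nb))

# Hypothetical loop sizes (in lattice sites) from an early and a late time window
early = [12, 30, 45, 8, 22, 51, 17, 33, 27, 40]
late = [14, 29, 47, 9, 20, 50, 18, 35, 25, 41]
d = ks_statistic(early, late)
same_distribution = d < ks_critical_value(len(early), len(late))
```

      With 1000 points per window, as in the response, the critical value at alpha = 0.05 is about 0.061.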

      2. How important is the cohesive cohesin parameter in the model, e.g., how good are fits with \rho_c = 0?

      __Response:__

      As stated in the original manuscript, the errors on \rho_c are on the order of 10%-20% (for S. pombe), so the best-fit value of \rho_c is well separated from zero. Thus, fits with \rho_c = 0 are significantly poorer than fits with the best-fit values of \rho_c.

      *3. A nice (but non-essential) supplemental visualization might be to show a scatter of sim cohesin occupancy vs. experiment ChIP.*

      __Response:__

      We have chosen not to do this, because we judge that the manuscript is already long enough. Figures 3A, 5D, and 6C already compare the experimental and simulated ChIP-seq, and these figures already contain more information than the figures proposed by the referee.

      4. *A similar calculation of Hi-C contacts based on simulated loop extruder positions using the Gaussian chain model was previously presented in Banigan et al. eLife 9:e53558 2020, which should be cited.*

      __Response:__

      We thank the referee for pointing out this citation. We have added it to the revised manuscript.

      5. It is stated that simulation agreement with experiments for cerevisiae is worse in part due to variability in the experiments, with MPR and Pearson numbers for cerevisiae replicates computed for reference. But these numbers are difficult to interpret without, for example, similar numbers for duplicate pombe experiments. Again, these numbers should be generated using Hi-C maps scaled by P(s), especially in case there are systematic errors in one replicate vs. another.

      __Response:__

      As noted above, throughout the revised manuscript, we now give the Pearson correlation coefficients of scaled-by-P(s) Hi-C maps.

      6. *In the model section, it is stated that LEF binding probabilities are uniformly distributed. Did the authors mean the probability is uniform across the genome or that the probability at each site is a uniformly distributed random number? Please clarify, and if the latter, explain why this unconventional assumption was made.*

      __Response:__

      It is the former. We have modified the manuscript to clarify that LEFs "initially bind to empty, adjacent chromatin lattice sites with a binding probability, that is uniformly distributed across the genome." (lines 587-588).

      *7. Supplement p4 line 86: what is meant by "processivity of loops extruded by isolated LEFs"? "size of loops extruded by..." or "processivity of isolated LEFs"?*

      __Response:__

      Here "processivity of isolated LEFs" is defined as the processivity of one LEF without the interference (blocking) from other LEFs. We have changed "processivity of loops extruded by isolated LEFs" to "processivity of isolated LEFs" for clarity.

      8. The use of parentheticals in the caption to Table 2 is a little confusing; adding a few extra words would help.

      __Response:__

      In the revised manuscript, we have added an additional sentence, and have removed the offending parentheses.

      9. *The sentence on page 12, lines 315-318, is difficult to understand. The barrier parameter is apparently something from ref 47 not previously described in the manuscript.*

      __Response:__

      In the revised manuscript, we have removed mention of the "barrier parameter" from the discussion.

      10. *Statement on p14 line 393-4 is false: prior LEF models have not been limited to vertebrates, and the authors have cited some of them here. There are also non-vertebrate examples with extrusion barriers: genes as boundaries to condensin in bacteria (Brandao et al. PNAS 116:20489 2019) and MCM complexes as boundaries to cohesin in yeast (Dequeker et al. Nature 606:197 2022).*

      __Response:__

      In fact, Dequeker et al. Nature 606:197 2022 concerns the role of MCM complexes in blocking cohesin loop extrusion in mouse zygotes. Mouse is a vertebrate. The sole aspect of this paper, that is associated with yeast, is the observation of cohesin blocking by the yeast MCM bound to the ARS1 replication origin site, which is inserted on a piece of lambda phage DNA. No yeast genome is used in the experiment. Therefore, the referee is mistaken to suggest that this paper models yeast genome organization.

      We thank the referee for pointing out Brandao et al. PNAS 116:20489 2019, which includes the development of a tour-de-force model of condensin-based loop extrusion in the prokaryote, Bacillus subtilis, in the presence of gene barriers to loop extrusion. To acknowledge this paper, we have changed the objectionable sentence to now read (lines 571-575):

      "... prior LEF models have been overwhelmingly limited to vertebrates, which express CTCF and where CTCF is the principal boundary element. Two exceptions, in which the LEF model was applied to non-vertebrates, are Ref. [49], discussed above, and Ref. [76] (Brandao et al.), which models the Hi-C map of the prokaryote, Bacillus subtilis, on the basis of condensin loop extrusion with gene-dependent barriers."

      *Referees cross-commenting*

      I agree with the comments of Reviewer 1, which are interesting and important points that should be addressed.

      *Reviewer #2 (Significance (Required)):

      Analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. It appears to work well as a descriptive model. But I think there are major questions concerning the mechanistic value of this model, possible applications of the model, the provided interpretations of the model and experiments, and the limitations of the model under the current assumptions. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. It is also unclear that the minimal approach of the CCLE necessarily offers an improved physical basis for modeling extrusion, as compared to previous efforts such as ref 47, as claimed by the authors. There are also questions about significance due to possible limitations of the model (detailed above). Applying the CCLE model to identify barriers would be interesting, but is not attempted. Overall, the work presents a reasonable analytical model and numerical method, but until the major comments above are addressed and some reasonable application or mechanistic value or interpretation is presented, the overall significance is somewhat limited.*

      __Response:__

      We agree with the referee that analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. We also agree with the referee that it works well as a descriptive model (of Hi-C maps in S. pombe and S. cerevisiae). We disagree, however, with the referee's other comments. For us, being able to describe the different-appearing Hi-C maps of interphase S. pombe (Fig. 1 and Supplementary Figures 1-9), meiotic S. cerevisiae (Fig. 5) and mitotic S. cerevisiae (Fig. 6), all with a common model with just a few fitting parameters that differ between these examples, is significant and novel. The reviewer overlooks the fact that there is still debate about whether a "diffusion-capture"-like model is the dominant mechanism that shapes chromatin spatial organization at the TAD scale. Many works have argued that such models could describe TAD-scale chromatin organization, as cited in the revised manuscript (Refs. [11, 14, 15, 17, 20, 22-24, 55]). However, in contrast to the poor description of the Hi-C map by the diffusion-capture model (as demonstrated in the revised manuscript and Supplementary Fig. 15), the excellent experiment-simulation agreement achieved by CCLE provides compelling evidence that cohesin-based loop extrusion is indeed the primary organizer of TAD-scale chromatin.

      Importantly, CCLE provides a theoretical basis for how loop extrusion models can be generalized and applied to organisms without known loop extrusion barriers. Our model further highlights that distributed barriers, which impede but do not strictly block LEFs, can also affect chromatin configurations, and it provides a means to account for them. This case might be important for organisms whose CTCF motifs infrequently coincide with TAD boundaries, for instance Drosophila melanogaster. Moreover, CCLE promises theoretical descriptions of the Hi-C maps of other non-vertebrates in the future, extending the quantitative application of the LEF model across the tree of life. This too would be highly significant if successful.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Yuan et al. report on their development of an analytical model ("CCLE") for loop extrusion with genomic-position-dependent speed, with the idea of accounting for barriers to loop extrusion. They write down master equations for the probabilities of cohesin occupancy at each genomic site and obtain approximate steady-state solutions. Probabilities are governed by cohesin translocation, loading, and unloading. Using ChIP-seq data as an experimental measurement of these probabilities, they numerically fit the model parameters, among which are extruder density and processivity. Gillespie simulations with these parameters, combined with a 3D Gaussian polymer model, were used to generate simulated Hi-C maps and cohesin ChIP-seq tracks, which show generally good agreement with the experimental data. The authors argue that their modeling provides evidence that loop extrusion is the primary mechanism of chromatin organization on ~10-100 kb scales in S. pombe and S. cerevisiae.

      Major comments:

      1. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling? Relatedly, similar best fit values for S. pombe and S. cerevisiae might not point to a mechanistic conclusion (same "underlying mechanism" of loop extrusion), but rather to similar properties for loop-extruding cohesins in the two species. As an alternative, could a model with variable binding probability given by ChIP-seq and an exponential loop-size distribution work equally well? The stated lack of a dependence on extrusion timescale suggests that a static looping model might succeed. If not, why not?
      2. I do not understand how the loop extrusion residence time drops out. As I understand it, Eq 9 converts ChIP-seq to lattice site probability (involving N_{LEF}, which is related to \rho, and \rho_c). Then, Eqs. 3-4 derive site velocities V_n and U_n if we choose rho, L, and \tau, with the latter being the residence time. This parameter is not specified anywhere and is claimed to be unimportant. It may be true that the choice of timescale is arbitrary in this procedure, but can the authors please clarify?
      3. The assumptions in the solution and application of the CCLE model potentially restrict it to a limited number of scenarios. In particular, the authors specify that the current due to binding/unbinding, A_n - D_n, is small. This assumption could be problematic near loading sites (centromeres, enhancers in higher eukaryotes, etc.) (where current might be dominated by A_n and V_n), unloading sites (D_n and V_{n-1}), or strong boundaries (D_n and V_{n-1}). The latter scenario is particularly concerning because the manuscript seems to be concerned with the presence of unidentified boundaries. This is partially mitigated by the fact that the model seems to work well in the chosen examples, but the authors should discuss the limitations due to their assumptions and/or possible methods to get around these limitations.
      4. Related to the above concern, low cohesin occupancy is interpreted as a fast extrusion region and high cohesin occupancy is interpreted as a slow region. But this might not be true near cohesin loading and unloading sites.
      5. The mechanistic insight attempted in the discussion, specifically with regard to Mis4/Scc2/NIPBL and Pds5, is problematic. First, it is not clear how the discussion of Nipbl and Pds5 is connected to the CCLE method; the justification is that CCLE shows cohesin distribution is linked to cohesin looping, which is already a questionable statement (point 1) and doesn't really explain how the model offers new insight into existing Nipbl and Pds5 data.

      Furthermore, I believe that the conclusions drawn on this point are flawed, or at least stated with too much confidence. The authors raise the curious point that Nipbl ChIP-seq does not correlate well with cohesin ChIP-seq, and use this as evidence that Nipbl is not a part of the loop-extruding complex in S. pombe and that it is not essential in humans. Aside from the molecular evidence in human Nipbl/cohesin (acknowledged by the authors), there are other reasons to doubt this conclusion. First, depletion of Nipbl (rather than binding partner Mau2 as in ref 55) in mouse cells strongly inhibits TAD formation (Schwarzer et al. Nature 551:51 2017). Second, at least two studies have raised concerns about Nipbl ChIP-seq results: 1) Hu et al. Nucleic Acids Res 43:e132 2015, which shows that uncalibrated ChIP-seq can obscure the signal of protein localization throughout the genome due to the inability to distinguish it from background, and 2) Rhodes et al. eLife 6:e30000, which uses FRAP to show that Nipbl binds and unbinds to cohesin rapidly in human cells, which could go undetected in ChIP-seq, especially when uncalibrated. It has not been shown that these dynamics are present in yeast, but there is no reason to rule them out yet.

      Similar types of critiques could be applied to the discussion of Pds5. There is cross-correlation between Psc3 and Pds5 in S. pombe, but the authors are unable to account for whether Pds5 binding is transient and/or necessary for loop extrusion itself or, more importantly, whether Pds5 ChIP is associated with extrusive or cohesive cohesins; the cross-correlation peaks at about 0.6, but note that by the authors' own estimates, cohesive cohesins are approximately half of all cohesins in S. pombe (Table 3).

      Due to the above issues, I suggest that the authors heavily revise this discussion to better reflect the current experimental understanding and the limited ability to draw such conclusions based on the current CCLE model.

      6. I suggest that the authors recalculate correlations for Hi-C maps using maps that are rescaled by the P(s) curves. As currently computed, most of the correlation between maps could arise from the characteristic decay of P(s) rather than smaller-scale features of the contact maps. This could reduce the surprising observed correlation between distinct genomic regions in pombe (which, problematically, is higher than the observed correlation between simulation and experiment in cerevisiae).
      7. Please explain why the relative difference between right and left currents at any particular site, (R_n - L_n)/(R_n + L_n), should be small. It seems easy to imagine scenarios where this might not be true, such as directional barriers like CTCF or transcribed genes.
      8. Optional, but I think it would greatly improve the manuscript: can the authors
         a) analyze regions of high cohesin occupancy (assumed to be slow extrusion regions) to determine if there's anything special in these regions, such as more transcriptional activity, and
         b) apply this methodology to vertebrate cell data?
      9. A GitHub link is provided, but the code is not currently available.
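      Major comment 6 asks for correlations computed on Hi-C maps rescaled by P(s). One common way to do this, sketched below under stated assumptions, is observed/expected normalization: divide every diagonal of a symmetric contact map by its mean (the empirical P(s) at that separation), then take the Pearson correlation over the upper triangle. The function names, the `min_sep` cutoff, and the synthetic maps are illustrative assumptions, not the procedure used in the manuscript or in any specific Hi-C toolkit.

```python
import numpy as np


def observed_over_expected(contacts: np.ndarray) -> np.ndarray:
    """Rescale a symmetric contact map by P(s): divide every diagonal by its mean."""
    n = contacts.shape[0]
    oe = np.full((n, n), np.nan)  # stays NaN where P(s) is zero
    for s in range(n):
        diag = np.diagonal(contacts, offset=s)
        expected = diag.mean()  # empirical P(s) at separation s
        if expected > 0:
            i = np.arange(n - s)
            oe[i, i + s] = diag / expected
            oe[i + s, i] = diag / expected  # mirror to keep the map symmetric
    return oe


def rescaled_pearson(map_a, map_b, min_sep: int = 1) -> float:
    """Pearson correlation of two P(s)-rescaled maps over the upper triangle,
    ignoring separations below min_sep bins (an illustrative cutoff)."""
    oe_a = observed_over_expected(np.asarray(map_a, dtype=float))
    oe_b = observed_over_expected(np.asarray(map_b, dtype=float))
    rows, cols = np.triu_indices(oe_a.shape[0], k=min_sep)
    a, b = oe_a[rows, cols], oe_b[rows, cols]
    keep = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[keep], b[keep])[0, 1])


# Synthetic example: two maps that share the same P(s) decay but have
# independent small-scale structure (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 60
sep = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
decay = (sep + 1.0) ** -1.0  # shared power-law-like P(s)


def symmetric_noise(size: int) -> np.ndarray:
    x = rng.lognormal(mean=0.0, sigma=0.3, size=(size, size))
    return (x + x.T) / 2


map_a = decay * symmetric_noise(n)
map_b = decay * symmetric_noise(n)

rows, cols = np.triu_indices(n, k=1)
raw = float(np.corrcoef(map_a[rows, cols], map_b[rows, cols])[0, 1])
rescaled = rescaled_pearson(map_a, map_b)
print(f"raw Pearson: {raw:.2f}, P(s)-rescaled Pearson: {rescaled:.2f}")
```

      In this toy case the raw correlation is high only because both maps share the same decay, while the rescaled correlation is close to zero, which is the kind of inflation the reviewer is concerned about.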

      Minor Comments:

      1. Please state the simulated LEF lifetime, since the statement in the methods that 15000 timesteps are needed for equilibration of the LEF model is otherwise not meaningful. Additionally, please note that backbone length is not necessarily a good measure of steady state, since the backbone can be compacted to its steady-state value while the loop distribution continues to evolve toward its steady state.
      2. How important is the cohesive cohesin parameter in the model, e.g., how good are fits with \rho_c = 0?
      3. A nice (but non-essential) supplemental visualization might be to show a scatter of sim cohesin occupancy vs. experiment ChIP.
      4. A similar calculation of Hi-C contacts based on simulated loop extruder positions using the Gaussian chain model was previously presented in Banigan et al. eLife 9:e53558 2020, which should be cited.
      5. It is stated that simulation agreement with experiments for cerevisiae is worse in part due to variability in the experiments, with MPR and Pearson numbers for cerevisiae replicates computed for reference. But these numbers are difficult to interpret without, for example, similar numbers for duplicate pombe experiments. Again, these numbers should be generated using Hi-C maps scaled by P(s), especially in case there are systematic errors in one replicate vs. another.
      6. In the model section, it is stated that LEF binding probabilities are uniformly distributed. Did the authors mean the probability is uniform across the genome or that the probability at each site is a uniformly distributed random number? Please clarify, and if the latter, explain why this unconventional assumption was made.
      7. Supplement p4 line 86 - what is meant by "processivity of loops extruded by isolated LEFs"? "size of loops extruded by..." or "processivity of isolated LEFs"?
      8. The use of parentheticals in the caption to Table 2 is a little confusing; adding a few extra words would help.
      9. Page 12 sentence line 315-318 is difficult to understand. The barrier parameter is apparently something from ref 47 not previously described in the manuscript.
      10. Statement on p14 line 393-4 is false: prior LEF models have not been limited to vertebrates, and the authors have cited some of them here. There are also non-vertebrate examples with extrusion barriers: genes as boundaries to condensin in bacteria (Brandao et al. PNAS 116:20489 2019) and MCM complexes as boundaries to cohesin in yeast (Dequeker et al. Nature 606:197 2022).

      Referees cross-commenting

      I agree with the comments of Reviewer 1, which are interesting and important points that should be addressed.

      Significance

      Analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. It appears to work well as a descriptive model. But I think there are major questions concerning the mechanistic value of this model, possible applications of the model, the provided interpretations of the model and experiments, and the limitations of the model under the current assumptions. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. It is also unclear that the minimal approach of the CCLE necessarily offers an improved physical basis for modeling extrusion, as compared to previous efforts such as ref 47, as claimed by the authors. There are also questions about significance due to possible limitations of the model (detailed above). Applying the CCLE model to identify barriers would be interesting, but is not attempted. Overall, the work presents a reasonable analytical model and numerical method, but until the major comments above are addressed and some reasonable application or mechanistic value or interpretation is presented, the overall significance is somewhat limited.

    1. Author response:

      [The following is the authors’ response to the current reviews.]

      In response to Reviewer #2, we agree that it should be noted that not all forms of recognition are the same, and we have added the following: "However, we note that not all forms of recognition are the same; researchers may prefer to have their work featured instead of personal stories or critiques of the scientific environment."


      [The following is the authors’ response to the previous reviews.]

      We thank both reviewers for their detailed comments and insightful suggestions. Below we summarize our responses to each concern in addition to the edits within the manuscript.

      We would also like to clarify a point in the eLife assessment, which states “This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work.” We show that individuals whose names are predicted to belong to women or to be of East Asian origin are less likely to be quoted or mentioned in Nature’s scientific news stories than expected from publication demographics. In this study, we did not compare the level of news coverage of scientific articles according to the demographics of their authors.

      Reviewer #1

      The article is not so clearly structured, which makes it hard to follow. A better framing, contextualization, and conceptualization of their analysis would help the readers to better understand the results. There are some unclear definitions and wrong wording of key concepts.

      We have adapted our wording in the text and added a more detailed discussion, which we hope makes the paper easier to follow. These changes are described below in the context of each of the reviewer's suggestions.

      Language use: Male/Female refers to sex, not to gender.

      We have now updated the language throughout the text. Thank you for pointing this out.

      Regional disparities are not the same as names' origin. While the first might relate to the academic origin of authors, inferred from their institutional belonging, the latter reflects the authors' inferred identity. Ethnic identities and the construction of prejudice against specific populations need proper contextualization.

      We have added better contextualization in the manuscript and reworded the section in our results and discussion to clarify that we are analyzing disparities related to perceived ethnicity and not regions. We also added the following text to the results section “In our analysis, we use name origin as an estimate for the perceived ethnicity of a primary source by a journalist. Our prediction is not intended to assign ethnicity to an individual, but to be used broadly as a tool to quantify representational differences in a journalist's sociologically constructed perception of a primary source's ethnicity.” We also added the following text to our Discussion: “Our use of name origins is a proxy for a journalist's or referring scholarly peer’s potential perceptions of the ethnicity of a primary source as signaled by an individual's name. We do not intend to assign an identity to an individual, but to generate a broad metric to measure possible bias for particular ethnicities during journalists' primary source gathering.”

      It would be helpful to have a clear definition of what are quotes, mentions, and citations. For me, it was not so clear and made understanding the results more difficult.

      We added the following text to the results section Extracted Data Used for Analysis: “Quoted names are any names that were attached to a quote within the article. Mentioned names are any names that were stated within the article. Cited names are all author names of a scientific paper that was cited in the news article.”

      The comparison against Nature-published research articles is not perfect because journalists will also cover articles not published in Nature. If, for example, the gender representation in the quoted articles is not the same between Nature journals and other journals, then this source of inequality would be missing (e.g. if the journalists are biased against women, but not as much when they published in Nature, because they are also biased towards Nature articles). Also, the gender representation among Nature authors may differ from that of authors in general. Nevertheless, this seems to be a fair benchmark, especially if the authors did not have access to other more comprehensive databases. But a statement of limitations including these potential issues would be good to have.

      To add better context to the generalizability of our work, we added the following text to our discussion: “Furthermore, the news articles present on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      "we select the highest probability origin for each name as the resultant assignment". Threshold-based approaches for race/ethnicity name-based inference have been criticized in the literature as they might reproduce biases (see Kozlowski, D., Murray, D. S., Bell, A., Hulsey, W., Larivière, V., Monroe-White, T., & Sugimoto, C. R. (2022). Avoiding bias when inferring race using name-based approaches. Plos one, 17(3), e0264270.). The authors could use the full distribution of probabilities over names instead of selecting one. The formulae proposed (3-5) could be easily adapted to this change.

      We thank the reviewer for pointing this out. We have updated our analysis to use the probabilities instead of hard assignments. Figure 3 and formulae 3-5 have been updated. While we observe a slight shift in the calculated values, the overall trends are unchanged.
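      To illustrate the difference between hard assignment and the probability-based approach discussed above, here is a minimal sketch assuming each name comes with a probability vector over origin categories. Hard assignment counts each name once under its most likely origin; the soft version sums the probability vectors, so uncertain names still contribute partially. The category labels, the example probabilities, and the `representation_ratio` metric are hypothetical; this is not the authors' updated formulae 3-5, which are not reproduced here.

```python
import numpy as np

# Hypothetical origin categories; the actual name-origin classifier and its
# labels are not specified here.
ORIGINS = ["CelticEnglish", "EastAsian", "European", "Other"]


def hard_counts(probs) -> np.ndarray:
    """Threshold approach: each name is counted once under its most likely origin."""
    p = np.asarray(probs, dtype=float)
    return np.bincount(p.argmax(axis=1), minlength=p.shape[1]).astype(float)


def soft_counts(probs) -> np.ndarray:
    """Probability-weighted approach: each name contributes its whole
    probability vector, so uncertain assignments are not discarded."""
    return np.asarray(probs, dtype=float).sum(axis=0)


def representation_ratio(quoted_probs, author_probs, counter=soft_counts) -> np.ndarray:
    """Share of each origin among quoted names divided by its share among
    authors; values below 1 suggest under-representation (illustrative metric)."""
    q = counter(quoted_probs)
    a = counter(author_probs)
    return (q / q.sum()) / (a / a.sum())


# Three ambiguous names (hypothetical): 40% EastAsian, 60% European each.
ambiguous = [[0.0, 0.4, 0.6, 0.0]] * 3
print(dict(zip(ORIGINS, hard_counts(ambiguous))))  # EastAsian mass is dropped
print(dict(zip(ORIGINS, soft_counts(ambiguous))))  # EastAsian keeps about 1.2

quoted = [[0.9, 0.05, 0.05, 0.0], [0.8, 0.1, 0.1, 0.0]]
authors = [[0.25, 0.25, 0.25, 0.25]] * 4
print(dict(zip(ORIGINS, representation_ratio(quoted, authors).round(2))))
```

      With hard assignment, a group whose names are often ambiguous can disappear from the counts entirely, which is one way threshold-based inference can reproduce bias.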

      Is it possible to make an analysis that intersects both name origin and gender? I am not sure if the sample size would allow for this, but if some other dimensions were collapsed, it would be very important to show what happens at the intersection of these two dimensions of discrimination.

      We agree that identifying any differences in quotation patterns at the intersection of gender and name origin would be very useful to identify. To address this, we added supplemental table 5. This table identifies the number of quotes per predicted name origin and gender over all years and article types. In this table, we don’t see a significant difference in gender distribution across predicted name origins.

      Given a larger sample size, we would be able to identify more subtle differences, but at this sample size we cannot make more detailed inferences. This analysis also addresses a QC issue: predicted gender accuracy varies by name origin, particularly for East Asian names. In our data, we don’t see a large difference in proportions across any name origin. We added the following text to the results section to incorporate this analysis:

      “However, it should be noted that the error rate varies by name origin with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table 5).”

      The use of vocabulary should be more homogeneous. For example, on page 13 the authors start to use the concepts of over/under enrichment, which appeared before in a title but was not used.

      The text has been updated to replace all mentions of “over/under enrichment” with “over/under representation”.

      In the discussion section, it would be important to include, as a statement of limitations, the problems that automatic origin and gender inference have.

      We thank the reviewer for this suggestion. We have added the following paragraph to our discussion.

      Computational tools enabled us to automatically analyze thousands of articles to identify existing disparities by gender and name origin, but these tools are not without limitations. Our tools are unable to identify non-binary people and rely on gender predictors that are known to have region-specific biases, with the largest decrease in performance on names of an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. Furthermore, name origin is only a proxy for externally perceived racial or ethnic origins of a source or author and is not as accurate as self-identified race or ethnicity. Self-identification better captures an individual's lived experience, which computational estimates from a name cannot. This is highlighted in our inability to distinguish between Black and White people from the US by their names. As the collection of demographic data by publication outlets grows, we believe this will enable a more fine-grained and accurate analysis of disparities in scientific journalism.

      Figures 2a and 3a show that the affiliations of authors and their countries were going to be used in this analysis. Yet, this section is not present in the article. I would encourage the authors to add this to the analysis as it would show important patterns, and to intersect the dimensions of gender, name origin and country.

      We were interested in including this analysis in our work, but unfortunately the sample size of cited works in each country was too small to make inferences. If this work were extended to larger scientific news outlets such as The Guardian or The New York Times, we think more robust inferences could be made. Since our work only focuses on Nature, we decided not to include this analysis. However, we do include a section on this in our discussion as future work:

      “As a proxy for measuring possible geographical bias of a journalist, we attempted to identify if there was any geographical bias of cited authors. To do this, we identified the affiliation of each cited author and identified their affiliated country. Unfortunately, we could not robustly extract a large enough number of cited authors from different countries to make any conclusive statements. Expanding our work to other science journalism outlets could help identify possible ways in which geographic region, genders, and perceived ethnicity interact and affect scientific visibility of specific groups. While we are unable to identify that journalists have a specific geographical bias, having reporters explicitly focused on specific regional sources will broaden coverage of international opinions in science.”

      It is not clear at that point what column dependence means.

      The abstract has been updated to state, “Gender disparity in Nature quotes was dependent on the article type.”

      Reviewer #2

      We thank the reviewer for their very detailed and insightful suggestions regarding our analysis and the key caveats that needed better contextualization in our analysis. We went through each major point the reviewer brought up below and included any additional text that was needed.

      In some cases, the manuscript lacks consistency in terminology, and uses word choice that is strange (e.g., "enrichment" and "depletion" when discussion representation).

      We thank the reviewer for pointing this out; we have replaced all instances of depletion/enrichment with under-/over-representation.

      Caveats to Claim 1. So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is critical of the paper-the authors do discuss the trend in Claim 2-but interpretation of this claim should take care of these caveats, and readers should consider the important differences in first and last authorship.

      We thank the reviewer for their detailed feedback on this section. We have added the missing contextualization of our results. In the results section, we changed the figure caption to: “Speakers predicted to be men are sometimes overrepresented in quotes, but this depends on the year and article type.” We also added the following paragraph: “When considering the relative proportion of authors and speakers predicted to be men, we only find a slight over-representation of men. This overrepresentation is dependent on the authorship position and the year. Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison for first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, more possibly diverse academic environment.

      We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      Generalizability to other contexts of science journalism:

      We thank the reviewer for their feedback on the generalizability of our work. We have now added the following text to our discussion to provide the reader with better context for our results: “The articles presented on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found very similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      Shallow discussion:

      The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

      We thank the reviewer for encouraging us to better contextualize our findings in the broader discourse. We have now added several sections to our Discussion. To address gender parity, we have added the following text: “This finding, coupled with the near equal number of articles written by journalists predicted to be men or women, argues for more diversity in topical coverage. "Career Feature" articles highlight current topics relevant to working scientists and frequently highlight systemic issues with the scientific environment. This column allows space for marginalized people to critique the current state of affairs in science or share their personal stories. This type of content encourages the journalist to seek out a diverse set of primary sources. Including more content that is not primarily focused on recent publications, but all topics surrounding the practice of science, can serve as an additional tool to rapidly achieve gender parity in journalistic recognition.”

      Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about scientific quotation that first authors appear to be quoted more often? Does this mean that the division of labor is changing such that first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or are giving away their chance to comment on a study to the first author?

      We thank the reviewer for bringing up these important questions. We have added better context to the first-author analysis in our discussion, and we have included the following two sections to address this. We also note that we find last authors to be slightly more quoted than first authors, as depicted in Fig. 2d, with the first-author quotation percentage largely appearing below the red line. We included this text in a response above and include it again here for convenience.

      “Before 2010, quotes predicted to be from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted to be from men are overrepresented only in comparison to first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but they are still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authors may reveal more about how gender bias currently exists, whereas comparison with early career scientists may reveal bias in comparison to a future, possibly more diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, we see that the number of first authors with predicted East Asian name origins is increasing at a much faster rate than quotes are. If this mismatched rate of representation continues, it could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity in public visibility of scientists with East Asian name origins.”

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names? According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications; those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

      To address this point, we have added the following text to our discussion: “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, the number of first authors with predicted East Asian name origins is increasing at a much faster rate than quotes are. If this mismatched rate of representation continues, it could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity in public visibility of scientists with East Asian name origins.”

      I am very confused by Figure 1B. It mixes the counts of News-related items with (non-Springer) research articles in a single stacked bar plot, which makes determining the quantity of either difficult. I would advise splitting them out.

      Figure 1B has been updated, and the News and Research articles have been separated.

      When querying the first 2000 or so results from the SpringerNature API, are the authors certain that they are getting a random sample of papers?

      These papers were the first 200 English-language "Journal" papers returned by the Springer Nature API for each month, resulting in 2400 papers per year from 2005 through 2020. These are the first 200 papers published each month by a Springer Nature journal; this may not be a completely random sample, but we believe it to be reasonably representative. Furthermore, the Springer Nature set is used as an additional comparator to the complete set of all Nature research papers used in our analyses.

      In all figures: the authors use capital letters to indicate panels in the caption, but lowercase letters in the figure itself and in the main text. This should be made consistent.

      This has been updated.

      In all figures: the authors should make the caption letter bold in the figure captions, which makes it much easier to find descriptions of specific panels

      This has been updated.

      In the section "coreNLP": the authors mention "co-reference resolution" but without really remarking why it is being used. This is an issue throughout the methods-the authors describe what method they are using but either they don't mention why they are using that method until later, or else not at all.

      We have added better reasoning behind our choice of coreNLP methods: “We used the standard set of annotators: tokenize, ssplit, pos, lemma, ner, parse, coref, and additionally the quote annotator. These perform text tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, division of sentences into constituent phrases, co-reference resolution, and identification of quoted entities, respectively. We used the "statistical" algorithm to perform coreference resolution for speed. Each of these steps is required to identify the names of quoted or mentioned speakers and any of their associated pronouns. All results were output in JSON format for further downstream processing.”
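      For illustration, the annotator setup quoted above can be expressed as a Stanford CoreNLP properties file. This sketch assumes CoreNLP's standard `annotators`, `coref.algorithm`, and `outputFormat` properties; the helper name `build_corenlp_properties` and the file name `quotes.properties` are hypothetical and not taken from the authors' pipeline.

```python
def build_corenlp_properties(path=None):
    """Return (and optionally write) CoreNLP .properties text for quote extraction.

    The annotator list and statistical coreference follow the description above;
    the helper name and any output path are hypothetical illustrations.
    """
    props = {
        # tokenization, sentence split, POS, lemma, NER, constituency parse,
        # coreference, and quote identification, in pipeline order
        "annotators": ",".join(
            ["tokenize", "ssplit", "pos", "lemma", "ner", "parse", "coref", "quote"]
        ),
        # statistical coreference, chosen for speed
        "coref.algorithm": "statistical",
        # JSON output for downstream processing
        "outputFormat": "json",
    }
    text = "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"
    if path is not None:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return text


print(build_corenlp_properties())
```

      Such a file could be passed to the CoreNLP command-line pipeline through its `-props` option (e.g. `build_corenlp_properties("quotes.properties")`). Annotator dependencies differ between CoreNLP releases, so the list should be checked against the documentation of the installed version.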

      We included a better description of Scrapy: “Scrapy is a tool that applies user-defined rules to follow hyperlinks on webpages and return the information contained on each webpage. We used Scrapy to collect all web pages containing news articles and extract their text.”

      We also included our motivation for bootstrapping: “We used the bootstrap method to construct confidence intervals for each of our calculated statistics.”
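      As an illustration only, a percentile bootstrap for a proportion (for example, the share of quotes predicted to be from women) can be sketched as follows. The percentile variant, the 2000 resamples, the 95% level, the helper name `bootstrap_ci`, and the example data are assumptions, not the settings used in the analysis.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(values)
    # Resample with replacement at the original sample size and recompute the statistic.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stat(values), estimates[lo_idx], estimates[hi_idx]


# Hypothetical data: 1 = quote predicted to be from a woman, 0 = otherwise.
quotes = [1] * 30 + [0] * 70
point, low, high = bootstrap_ci(quotes)
print(f"{point:.2f} [{low:.2f}, {high:.2f}]")
```

      The percentile method is the simplest bootstrap interval; bias-corrected variants exist and may be preferable for strongly skewed statistics.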

      In the section "Name Formatting for Gender Prediction in Quotes or Mentions", genderizeR is mentioned before the tool is introduced.

      We added the following text to provide context: “Even though genderizeR, the computational method used to predict the name's gender, only uses the first name to make the gender prediction, identifying the full name gives us greater confidence that we correctly identified the first name.”

      In the section "Name Formatting for Gender Prediction of Authors", you state that you exclude papers with only one author. How many papers is this? I assume few, in Nature, but if not I can imagine gender differences based on who writes first-authored papers.

      We find that the number excluded is roughly 7% of all papers, which is consistent across Nature and Springer Nature (1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature authors). We have added the following text to the manuscript for better context: “Roughly 7% of all papers were estimated to be by a single author and were removed from this analysis: 1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, and 955/12459 for Nature research articles.”
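      The "roughly 7%" figure can be checked directly from the counts quoted above; this short calculation only restates those numbers and adds nothing new to the analysis.

```python
# Single-author papers removed, as (excluded, total), taken from the counts above.
counts = {
    "cited Springer articles": (1113, 15013),
    "random Springer articles": (2899, 42155),
    "Nature research articles": (955, 12459),
}

for name, (excluded, total) in counts.items():
    print(f"{name}: {excluded / total:.1%}")

# Pooled fraction across all three sets.
pooled_excluded = sum(e for e, _ in counts.values())
pooled_total = sum(t for _, t in counts.values())
print(f"pooled: {pooled_excluded / pooled_total:.1%}")
```

      The per-set fractions are about 7.4%, 6.9%, and 7.7%, and the pooled fraction (4967/69627) is about 7.1%, consistent with the "roughly 7%" statement.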

      In "Name Origin Analysis", for the in-text reference to Equation 3: include the prefix "Eq." or similar to mark this as referencing the equation and not something else

      This has been updated.

      The use of the word "enrichment" in reference to the representation of East Asian authors is strange and does not fit the colloquial definition of the term. I suggest just using a simpler term like "representation" instead.

      Similarly, the authors use the word "depletion" to reflect the lower rate of quotes to scientists with East Asian names, but I feel a simpler word would be more appropriate.

      We thank the reviewer for this suggestion; all instances of “enrichment/depletion” have been replaced with “over/under-representation”.

      The authors claim in Figure 2d that there is a steady increase in the rate of first author citations, however, this graph is not convincing. It appears to show much more noise than anything resembling a steady change.

      We have reworded our figure description to state that there is a consistent bias towards quoting last authors. Our figure description now states: “Panel d shows a consistent but slight bias towards quoting the last author of a cited article than the first author over time.”

      Supplemental Figures 1b and 1c do not seem to be mentioned in the main text, and I struggle to see their relevance.

      We thank the reviewer for identifying this error; these subpanels have been removed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      I have trialled the package on my lab's data and it works as advertised. It was straightforward to use and did not require any special training. I am confident this is a tool that will be approachable even to users with limited computational experience. The use of artificial data to validate the approach - and to provide clear limits on applicability - is particularly helpful.

      The main limitation of the tool is that it requires the user to manually select regions. This somewhat limits the generalisability and is also more subjective - users can easily choose "nice" regions that better match with their hypothesis, rather than quantifying the data in an unbiased manner. However, given the inherent challenges in quantifying biological data, such problems are not easily circumventable.

      I have some comments to clarify the manuscript:

      1. A "straightforward installation" is mentioned. Given that this is a Methods paper, the means of installation should be clearly laid out.

      This sentence is now modified. In the revised manuscript, we now describe how to install the toolset and give the link to the toolset website in case further information is needed. On this website, we provide a full video tutorial and a user manual. The user manual is provided as supplementary material to the manuscript.

      2. It would be helpful if there was an option to generate an output with the regions analysed (i.e., a JPG image with the data and the drawn line(s) on top). There are two reasons for this: i) A major problem with user-driven quantification is accidental double counting of regions (e.g., a user quantifies a part of an image and then later quantifies the same region). ii) It allows other users to independently verify measurements at a later time.

      We agree that it is helpful to save the analyzed regions. To answer this comment and the other two reviewers' comments pointing at a similar feature, we have now included an automatic saving of the regions of interest. The user will be able to reopen saved regions of interest using a new function we included in the new version of PatternJ.

      3. Related to the above point, it is highlighted that each time point would need to be analysed separately (lines 361-362). It seems like it should be relatively straightforward to add a function where the analysis line can be mapped onto the next time point. The user could then adjust slightly for changes in position, but still start from near the previous time point. Given how prevalent time-lapse imaging is, this (or something similar) seems like a clear benefit to add to the software.

      We agree that the analysis of time series images can be a useful addition. We have added the analysis of time-lapse series in the new version of PatternJ. The principles behind the analysis of time-lapse series and an example of such analysis are provided in Figure 1 - figure supplement 3 and Figure 5, with accompanying text lines 140-153 and 360-372. The analysis includes a semi-automated selection of regions of interest, which will make the analysis of such sequences more straightforward than having to draw a selection on each image of the series. The user is required to draw at least two regions of interest in two different frames, and the algorithm will automatically generate regions of interest in frames in which selections were not drawn. The algorithm generates the analysis immediately after selections are drawn by the user, which includes the tracking of the reference channel.
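      The keyframe idea described above (the user draws selections in at least two frames and selections are generated for the frames in between) can be illustrated with simple linear interpolation of straight-line endpoints. This is a hypothetical sketch: the function name `interpolate_line_rois` and the choice of linear interpolation are assumptions, and PatternJ's actual procedure, including its tracking of the reference channel, may differ.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]


def interpolate_line_rois(keyframes: Dict[int, Line]) -> Dict[int, Line]:
    """Generate a straight-line selection for every frame between user-drawn keyframes.

    Each endpoint is linearly interpolated between the two nearest keyframes.
    Hypothetical illustration; not PatternJ's documented algorithm.
    """
    if len(keyframes) < 2:
        raise ValueError("at least two keyframes are required")
    frames = sorted(keyframes)
    result: Dict[int, Line] = {}
    for start, end in zip(frames, frames[1:]):
        (a1, a2), (b1, b2) = keyframes[start], keyframes[end]
        for f in range(start, end + 1):
            t = (f - start) / (end - start)  # 0 at the start keyframe, 1 at the end keyframe
            result[f] = tuple(
                (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
                for p, q in ((a1, b1), (a2, b2))
            )
    return result


# Selections drawn in frames 0 and 10; frames 1-9 are filled in.
rois = interpolate_line_rois({0: ((0.0, 0.0), (10.0, 0.0)), 10: ((10.0, 0.0), (20.0, 0.0))})
print(rois[5])
```

      Linear interpolation only gives a starting point; in practice the generated selections would still need the per-frame feature extraction and, where motion is non-linear, additional keyframes.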

      4. Lines 134-135. The level of accuracy of the search should be clarified here. This is discussed later in the manuscript, but it would be helpful to give readers an idea at this point of the level of tolerance the software has to noise and aperiodicity.

      We agree with the reviewer that a clarification of this part of the algorithm will help the user better understand the manuscript. We have modified the sentence to clarify the search range used and the resulting limits on aperiodicity (now lines 176-181). Regarding the tolerance to noise, it is difficult to estimate it a priori from the choices made in the algorithm, so we prefer to leave it to the validation part of the manuscript. We hope this solution satisfies the reviewer and future users.

      **Referees cross-commenting**

      I think the other reviewer comments are very pertinent. The authors have a fair bit to do, but they are reasonable requests. So, they should be encouraged to do the revisions fully so that the final software tool is as useful as possible.

      Reviewer #1 (Significance (Required)):

      Developing software tools for quantifying biological data that are approachable for a wide range of users remains a longstanding challenge. This challenge is due to: (1) the inherent problem of variability in biological systems; (2) the complexity of defining clearly quantifiable measurables; and (3) the broad spread of computational skills amongst likely users of such software.

      In this work, Blin et al. develop a simple plugin for ImageJ designed to quickly and easily quantify regular repeating units within biological systems, e.g., muscle fibre structure. They clearly and fairly discuss existing tools, with their pros and cons. The motivation for PatternJ is properly justified (which is sadly not always the case with such software tools).

      Overall, the paper is well written and accessible. The tool has limitations but it is clearly useful and easy to use. Therefore, this work is publishable with only minor corrections.

      We thank the reviewer for the positive evaluation of PatternJ and for pointing out its accessibility to the users.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      # Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      # Major comments

      In our view, the article and the software could be improved in terms of defining the scope of their applicability and user base. In many parts the article and software suggest that general biological patterns can be analysed, but in other parts very specific muscle actin wording is used. We point this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would also recommend renaming the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend removing all actin-specific wording from the macro tool set, and the article should be partly rewritten.

      We agree with the reviewer that our initial manuscript used a mix of general and muscle-oriented vocabulary, which could make the use of PatternJ confusing especially outside of the muscle field. To make PatternJ useful for the largest community, we corrected the manuscript and the PatternJ toolset to provide the general vocabulary needed to make it understandable for every biologist. We modified the manuscript accordingly.

      # Minor/detailed comments

      # Software

      We recommend considering the following suggestions for improving the software.

      ## File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.

      We found that, with the current version of macOS, the file-browser dialog does not display any message; we suspect this is the issue raised by the reviewer. This is a known issue affecting Fiji, and all applications, on macOS since 2016. We provided guidelines in the user manual and in the tutorial video to correct this issue by changing a parameter in Fiji. Given the issues the reviewer had accessing the material on the PatternJ website, for which we apologize, we understand the issue raised. We added an extra warning on the PatternJ website to point out this problem and its solution. Additionally, we have limited the appearance of the file-browser dialog to what we thought was strictly necessary. Thus, the user will experience fewer prompts, speeding up the analysis.

      ## Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users who are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations.

      We agree that this muscle-oriented vocabulary can make the use of PatternJ confusing. We have now corrected the user interface to provide both general and muscle-specific vocabulary ("center-to-center or edge-to-edge (M-line-to-M-line or Z-disc-to-Z-disc)").

      ## Manual selection accuracy

      The first step of the analysis is always to start from a profile hand-drawn by the user across intensity patterns in the image. However, this step can introduce inaccuracy that varies with the shape and curvature of the drawn line profile. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with features in the image that are not straight and parallel. If the structure is curved, the line drawn over it also needs to reproduce this curve to capture the intensity pattern precisely. I found this limits the reproducibility and ease of use of the software.

      We understand the concern of the reviewer. On curved selections this will be an issue that is difficult to solve, especially on "S"-curved or more complex selections. The user will have to be very careful in these situations. On non-curved samples, the issue may be concerning at first sight, but the error scales as 1/cos of the misalignment angle and is therefore rather low. For example, if the user creates a selection off by 5 degrees, which is visually obvious, lengths will be affected by an increase of only 0.38%. The point raised by the reviewer is important to discuss, and we therefore added a paragraph to comment on the choice of selection (lines 94-98) and a supplementary figure to help make it clear (Figure 1 - figure supplement 1).
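      The 1/cos relation behind the 0.38% figure can be checked with a few lines of Python. A straight selection tilted by an angle θ away from the direction perpendicular to the bands crosses bands spaced d apart every d/cos θ, so the relative overestimate is 1/cos θ - 1. This is an illustrative calculation, not part of PatternJ; the helper name `length_overestimate` is hypothetical.

```python
import math


def length_overestimate(angle_deg: float) -> float:
    """Relative overestimate of a spacing measured along a selection tilted by angle_deg.

    Bands spaced d apart are crossed every d / cos(theta) along a tilted line,
    so the relative error is 1 / cos(theta) - 1.
    """
    return 1.0 / math.cos(math.radians(angle_deg)) - 1.0


for angle in (1, 5, 10, 20):
    print(f"{angle:>2} deg -> {length_overestimate(angle):.2%}")
```

      The error grows quadratically for small angles, which is why a visibly misaligned 5-degree selection still changes lengths by well under 1%, while a 20-degree misalignment gives roughly 6%.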

      ### Reproducibility

      Since the line profile drawn on the image is the first and most essential step of the entire process, saving it together with the analysis result should be considered, for example as ImageJ ROI or ROI set files that can be re-imported, correctly positioned, and visualized on the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are saved (the save button also just asks for a folder containing images, so I cannot verify its functionality).

      We agree that this is a very useful and important feature. We have added ROI automatic saving. Additionally, we now provide a simplified import function of all ROIs generated with PatternJ and the automated extraction and analysis of the list of ROIs. This can be done from ROIs generated previously in PatternJ or with ROIs generated from other ImageJ/Fiji algorithms. These new features are described in the manuscript in lines 120-121 and 130-132.

      ## ? button

      It would be great if that button would open up some usage instructions.

      We agree with the reviewer that the "?" button can be used in a better way. We have replaced this button with a Help menu, including a simple tutorial showing a series of images detailing the steps to follow by the user, a link to the user website, and a link to our video tutorial.

      ## Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow: fitting and displaying 2D lines on the band or line structures in the image that form the "patterns" the authors aim to address. This would extract geometric models from the image, and the inter-line distance, and even the curve formed by these sets of lines, could be further analyzed and studied. These fitted 2D lines could also be well integrated into ImageJ as Line ROIs, and thus be saved, imported back, checked, or further modified. I think this could greatly increase the usefulness and reproducibility of the software.

      We hope that we understood this comment correctly. We sent a clarification request to the editor but unfortunately did not receive an answer within the requested 4 weeks of this revision. We understood the following: instead of using our 1D approach, in which we extract positions from a profile, the reviewer suggests extracting the positions of features not as a single point but as a series of coordinates defining their shape. If this is the case, this is a major modification of the tool that is beyond the scope of PatternJ. We believe that keeping our tool simple makes it robust; this is the major strength of PatternJ. Local fitting would not, for instance, use averaging along the line, which would make the tool less reliable.

      # Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      We modified the abstract to make this point clearer.

      Line 58: The gray-level co-occurrence matrix (GLCM)-based feature extraction and analysis approach is neither mentioned nor compared. There is at least one relatively recent study of sarcomere structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: https://doi.org/10.1002/cpz1.462

      We thank the reviewer for making us aware of this publication. We cite it now and have added it to our comparison of available approaches.

      Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement; e.g., we can imagine that there should be many relevant two-dimensional patterns in biology?!

      We have modified this sentence to avoid potential confusion (lines 76-77).

      Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this point in the manuscript.

      This sentence is now modified. We now mention how to install the toolset and we provide the link to the toolset website, if further information is needed (lines 86-88). On the website, we provide a full video tutorial and a user manual.

      Line 87: "Multicolor images will give a graph with one profile per color." The term "multicolor images" here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images, which the ImageJ profile plot treats as 8-bit grayscale images (after type conversion).

      We agree with the reviewer that this could create some confusion. We modified "multicolor" to "multi-channel".

      Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"?

      We agree with the reviewer that "sarcomeric actin" alone will not be clear to all readers. We modified the text to "block with a central band, as often observed in the muscle field for sarcomeric actin" (lines 103-104). The toolset was modified accordingly.

      Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what exactly that means.

      We agree with the reviewer that this was not clear. We rewrote this paragraph (lines 101-114) and provided a supplementary figure to illustrate these definitions (Figure 1 - figure supplement 2).

      Lines 124-147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed to be known by the reader. For example, it is not clear how sub-pixel resolution is achieved. One can only assume that, by fitting a Gaussian to each band, the center position (peak) can be calculated from a continuous curve rather than from pixels.

      Note that the two sentences introducing this description are "Automated feature extraction is the core of the tool. The algorithm takes multiple steps to achieve this (Fig. S2):". We were hoping this statement was clear, but the reviewer may be referring to something else. We agree that the description of some of the steps was too brief. We have now expanded the description where needed.
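      The reviewer's assumption about sub-pixel localization can be illustrated with a standard technique: fitting a Gaussian through the brightest sample of a profile and its two neighbours, which amounts to fitting a parabola to the log-intensities. This is a generic sketch, not PatternJ's documented algorithm; it assumes strictly positive, background-subtracted intensities, and the helper name `subpixel_peak` is hypothetical.

```python
import math
from typing import Sequence


def subpixel_peak(profile: Sequence[float]) -> float:
    """Sub-pixel position of the brightest sample via a three-point Gaussian fit.

    A parabola is fitted to the log-intensities of the maximum and its two
    neighbours; for a sampled Gaussian peak this recovers the centre exactly.
    Intensities must be strictly positive. Illustrative technique only.
    """
    i = max(range(len(profile)), key=profile.__getitem__)
    if i == 0 or i == len(profile) - 1:
        return float(i)  # no neighbour on one side: fall back to the pixel position
    a, b, c = (math.log(profile[j]) for j in (i - 1, i, i + 1))
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)  # flat top: no curvature to fit
    return i + 0.5 * (a - c) / denom


# A Gaussian band centred at 10.3 px (sigma 1.5 px), sampled at integer pixels.
profile = [math.exp(-((x - 10.3) ** 2) / (2 * 1.5 ** 2)) for x in range(21)]
print(subpixel_peak(profile))
```

      With real, noisy data a least-squares fit over more pixels is usually more robust than a three-point fit, at the cost of more computation.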

      * Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.

      We are sorry for the issues encountered when downloading the tool and additional material. We thank the reviewer for pointing out these issues, which limited the accessibility of our tool. We simplified the downloading procedure on the website, which no longer goes through the Google Drive interface or requires a Google account. Additionally, for the coder community, the code, user manual, and examples are now available from GitHub at github.com/PierreMangeol/PatternJ, and are provided as supplementary material with the manuscript. To our knowledge, update sites work for plugins but not for macro toolsets. From our experience sharing code with non-specialists, a classical website with a tutorial video is more accessible than more coder-oriented websites, which deter many users.

      Reviewer #2 (Significance (Required)):

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view, there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble accessing. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting a "wizard" approach, where the user is guided through the steps.

      As answered above, the links on the PatternJ website are now corrected. Regarding the workflow, we now provide a Help menu with:

      1. a basic set of instructions for using the tool,
      2. a direct link to the tutorial video in the PatternJ toolset, and
      3. a direct link to the website on which both the tutorial video and a detailed user manual can be found.

      We hope this addresses the issues raised by this reviewer.

      Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review.

      We agree that saving ROIs is very useful. It is now implemented in PatternJ.

      We are not sure what this reviewer means by "enabling IJ Macro recording". The ImageJ Macro Recorder is indeed very useful, but to our knowledge, it is limited to built-in functions. Our code is open, and we hope this will be sufficient for advanced users to modify the code to fit their needs.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, the authors present a new toolset for the analysis of repetitive patterns in biological images named PatternJ. One of the main advantages of this new tool over existing ones is that it is simple to install and run and does not require any coding skills whatsoever, since it runs on the ImageJ GUI. Another advantage is that it provides not only the mean length of the pattern unit but also the subpixel localization of each unit and the distributions of lengths, and that it does not require GPU processing to run, unlike other existing tools. The major disadvantage of PatternJ is that it requires heavy, although very simple, user input in both the selection of the region to be analyzed and in the analysis steps. Another limitation is that, at least in its current version, PatternJ is not suitable for time-lapse imaging. The authors clearly explain the algorithm used by the tool to find the localization of pattern features and they thoroughly test the limits of their tool in conditions of varying SNR, periodicity and band intensity. Finally, they also show the performance of PatternJ across several biological models such as different kinds of muscle cells, neurons and fish embryonic somites, as well as different imaging modalities such as brightfield, fluorescence confocal microscopy, STORM and even electron microscopy.

      This manuscript is clearly written, and both the sections and the figures are well organized and tell a cohesive story. By testing PatternJ, I can attest to its ease of installation and use. Overall, I consider that PatternJ is a useful tool for the analysis of patterned microscopy images and this article is fit for publication. However, I do have some minor suggestions and questions that I would like the authors to address, as I consider they could improve this manuscript and the tool:

      We are grateful to this reviewer for this very positive assessment of PatternJ and of our manuscript.

      * Minor Suggestions: The methodology section lacks a more detailed description of how the plotted metrics were obtained, such as normalized intensity or precision in pixels. *

      We agree with the reviewer that a more detailed description of the plotted metrics was missing. We added this information to the Methods section and to the figure captions wherever more detail helps clarify the displayed values.

      * The validation is based mostly on the SNR and patterns. They should include a real dataset to validate the algorithm on the three standard patterns tested. *

      We validated our tool using computer-generated images, in which the localization of patterns is known with certainty. This allowed us to automatically analyze 30 000 images; with varying settings, we sometimes analyzed the same image 10 times, leading to about 150 000 analyzed selections. From these analyses, we can provide with confidence an unbiased assessment of the tool's precision and of its capacity to extract patterns. We already provided examples of various biological images in Figures 4-6, showing all the features that can be extracted with PatternJ. In these examples, we can see by eye that PatternJ extracts patterns efficiently, but we cannot know how precise these extractions are because of the nature of biological data: the "real" positions of features are unknown. Such a validation would be limited to assessing whether a pattern was found or not, which we believe the examples in Figures 4-6 already provide.
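      To illustrate why synthetic images allow an unbiased precision estimate, here is a minimal sketch of ground-truth validation on a single 1D profile. It assumes Gaussian-shaped bands, additive Gaussian noise (so an SNR can be taken as amplitude / noise_sd), and a three-point parabolic fit as a stand-in sub-pixel localizer; it does not reproduce PatternJ's own localization step, and all function names are hypothetical.

```python
import math
import random

def synthetic_profile(n_px, period, sigma, amplitude, noise_sd, offset=0.0, seed=0):
    """Hypothetical generator: Gaussian bands every `period` px plus Gaussian noise.
    Returns (profile, true_centers) so errors can be measured against ground truth."""
    rng = random.Random(seed)
    centers = []
    c = offset + period / 2
    while c < n_px - period / 2:  # keep bands away from the profile edges
        centers.append(c)
        c += period
    profile = []
    for x in range(n_px):
        signal = sum(amplitude * math.exp(-((x - c0) ** 2) / (2 * sigma ** 2)) for c0 in centers)
        profile.append(signal + rng.gauss(0.0, noise_sd))
    return profile, centers

def parabolic_peak(profile, i):
    """Sub-pixel peak position from a parabola through samples i-1, i, i+1 (stand-in localizer)."""
    y0, y1, y2 = profile[i - 1], profile[i], profile[i + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(i)
    return i + 0.5 * (y0 - y2) / denom

def localization_errors(profile, centers, window=2):
    """Signed error (px) for each known band: brightest pixel near the truth, then parabolic refinement."""
    errors = []
    for c in centers:
        lo = max(1, int(c) - window)
        hi = min(len(profile) - 2, int(c) + window)
        i = max(range(lo, hi + 1), key=lambda k: profile[k])
        errors.append(parabolic_peak(profile, i) - c)
    return errors

# Example: precision as RMS error (in pixels) at one noise level.
profile, truth = synthetic_profile(n_px=200, period=12, sigma=2.0, amplitude=1.0, noise_sd=0.1, offset=0.4)
errs = localization_errors(profile, truth)
rms = math.sqrt(sum(e * e for e in errs) / len(errs))
```

      Repeating this over many noise levels, periods and band widths gives precision curves of the kind described above; with real biological images, `truth` would be unavailable, which is exactly the limitation the response points out.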

      * The video tutorial available in the PatternJ website is very useful, maybe it would be worth it to include it as supplemental material for this manuscript, if the journal allows it. *

      As the video tutorial may have been missed by other reviewers, we agree it is important to make it more visible to users. We have now added a Help menu in the toolset that opens the tutorial video. Having the video as supplementary material could indeed be a useful addition if its file size is within the journal's limits.

      * An example image is provided to test the macro. However, it would be useful to provide further example images for each of the three possible standard patterns suggested: Block, actin sarcomere or individual band.*

      We agree this can help users. We now provide another multi-channel example image on the PatternJ website, including blocks and a pattern made of a linear intensity gradient that can be extracted with our simpler "single pattern" algorithm, both of which were missing from the first example. Additionally, we provide an example to be used with our new time-lapse analysis.

      * Access to both the manual and the sample images in the PatternJ website should be made publicly available. Right now they both sit in a private Drive account. *

      As mentioned above, we apologize for access issues that occurred during the review process. These files can now be downloaded directly on the website without any sort of authentication. Additionally, these files are now also available on GitHub.

      * Some common errors are not properly handled by the macro and could be confusing for the user: When there is no selection and one tries to run a Check or Extraction: "Selection required in line 307 (called from line 14). profile=getProfile( ;". A simple "a line selection is required" message would be useful there. When "band" or "block" is selected for a channel in the "Set parameters" window, yet a 0 value is entered into the corresponding "Number of bands or blocks" section, one gets this error when trying to Extract: "Empty array in line 842 (called from line 113). if ( ( subloc . length == 1 ) & ( subloc [ 0 == 0) ) {". This error is not too rare, since the "Number of bands or blocks" section is populated with a 0 after choosing "sarcomeric actin" (after accepting the settings) and stays that way when one changes back to "blocks" or "bands".*

      We thank the reviewer for pointing out these bugs. These bugs are now corrected in the revised version.

      * The fact that every time one clicks on the most used buttons, the getDirectory window appears is not only quite annoying but also, ultimately, a waste of time. Isn't it possible to choose the directory in which to store the files only once, from the "Set parameters" window?*

      We have now found a solution to avoid this step. The user is only prompted to provide the image folder when pressing the "Set parameters" button. We kept the directory prompt only when the user selects the time-lapse analysis or the analysis of multiple ROIs, mainly because otherwise the analysis could easily end up in the wrong folder.

      * The authors state that the outputs of the workflow are "user friendly text files". However, some of them lack descriptive headers (like the localisations and profiles) or even file names (like colors.txt). If there is something lacking in the manuscript, it is a brief description of all the output files generated during the workflow.*

      PatternJ generates multiple files, several of which are internal to the toolset. They are needed to keep track of which analyses were done and which colors were used in the images, amongst others. From the user's perspective, only the files generated by the analysis, All_localizations.channel_X.txt and sarcomere_lengths.txt, are useful. To improve the user experience, we have now moved all internal files to a folder named "internal", which we think clarifies which outputs are useful for further analysis and which are not. We thank the reviewer for raising this point, and we now mention it in our Tutorial.

      * I don't really see the point in saving the localizations from the "Extraction" step, they are even named "temp". *

      We thank the reviewer for this comment; these files were indeed not necessary. We modified PatternJ to delete them after they are used.

      * In the same line, I DO see the point of saving the profiles and localizations from the "Extract & Save" step, but I think they should be deleted during the "Analysis" step, since all their information is then grouped in a single file, with descriptive headers. This deleting could be optional and set in the "Set parameters" window.*

      We understand the point raised by the reviewer. However, the analysis depends on the reference channel, which is chosen when starting an analysis, and it can be augmented with additional selections. If a user chooses to modify the reference channel or to add a new profile to the analysis, deleting all these files would force the user to start over, which we believe would create frustration. An optional deletion at the analysis step is simple to implement, but it could create problems for users who do not understand its practical consequences.

      * Moreover, I think it would be useful to also save the linear roi used for the "Extract & Save" step, and eventually combine them during the "Analysis step" into a single roi set file so that future re-analysis could be made on the same regions. This could be an optional feature set from the "Set parameters" window. *

      We agree with the reviewer that saving ROIs is very useful. ROIs are now saved into a single file each time the user extracts and saves positions from a selection. Additionally, the user can re-use previous ROIs and analyze an image or image series in a single step.

      * In the "PatternJ workflow" section of the manuscript, the authors state that after the "Extract & Save" step "(...) steps 1, 2, 4, and 5 can be repeated on other selections (...)". However, technically, only steps 1 and 5 are really necessary (alternatively 1, 4 and 5 if the user is unsure of the quality of the patterning). If a user follows this to the letter, I think it can lead to wasted time. *

      We agree with the reviewer and have corrected the manuscript accordingly (lines 119-120).


      * I believe that the "Version Information" button, although important, has the potential to be more useful if used as a "Help" button for the toolset. There could be links to useful sources like the manuscript or the PatternJ website but also some tips like "whenever possible, use a higher linewidth for your line selection". *

      We agree with the reviewer, as pointed out in our previous answers to the other reviewers. This button is now replaced by a Help menu, including a simple tutorial in a series of images detailing the steps to follow, a link to the user website, and a link to our video tutorial.

      * It would be interesting to mention to what extent the orientation of the line selection in relation to the patterned structure (i.e. perfectly parallel vs more diagonal) affects pattern length variability.*

      As we answered to Reviewer 1, we understand this concern, which needs to be clarified for readers. The issue may seem concerning at first sight, but the error grows only with the inverse of the cosine of the angle and therefore remains small. For example, if the user draws a selection that is off by 3 degrees, which is visually obvious, lengths will increase by only 0.14%. The point raised by the reviewer is important to discuss, and we have therefore added a comment on the choice of selection (lines 94-98) as well as a supplementary figure (Figure 1 - figure supplement 1).
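      The 1/cos relationship can be checked numerically. The sketch below assumes bands perpendicular to the pattern axis and a straight line selection tilted by a given angle from that axis, so a true spacing d is measured as d / cos(angle); the function name is hypothetical.

```python
import math

def relative_length_error(angle_deg):
    """Fractional overestimate of a spacing measured along a line tilted by
    angle_deg from the pattern axis (measured = true / cos(angle))."""
    return 1.0 / math.cos(math.radians(angle_deg)) - 1.0

# Print the overestimate for a few selection tilts.
for angle in (1, 3, 5, 10):
    print(f"{angle:>2} deg -> +{100 * relative_length_error(angle):.2f} %")
```

      For a 3 degree tilt this gives about 0.14%, consistent with the value above; even a 10 degree tilt stays around 1.5%.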

      * When "the algorithm uses the peak of highest intensity as a starting point and then searches for peak intensity values one spatial period away on each side of this starting point" (lines 133-135), does that search have a range? If so, what is the range? *

      We agree that this information is useful to share with the reader. The range is one pattern size. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181).
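      A minimal sketch of the search described in the quoted sentence, on a 1D profile in whole pixels: start at the brightest pixel, step one period to each side, and take the brightest pixel within a window one period wide. We assume here that the window is centred on the expected position (plus or minus half a period); this is an illustrative reading of "the range is one pattern size", not PatternJ's actual code, and `propagate_peaks` is a hypothetical name.

```python
def propagate_peaks(profile, period):
    """Hypothetical sketch: from the global maximum, step one `period` left and right,
    keeping the brightest pixel in a window one period wide around each expected position."""
    n = len(profile)
    start = max(range(n), key=lambda i: profile[i])
    peaks = [start]
    half = period // 2  # assumed window: +/- half a period around the expected position
    for direction in (-1, 1):
        current = start
        while True:
            expected = current + direction * period
            if expected < 0 or expected >= n:
                break  # next unit would fall outside the profile
            lo, hi = max(0, expected - half), min(n - 1, expected + half)
            current = max(range(lo, hi + 1), key=lambda i: profile[i])
            peaks.append(current)
    return sorted(peaks)

# Example: bands roughly every 10 px, one of them shifted by 1 px (aperiodic).
profile = [0.0] * 50
for pos, height in [(3, 1.0), (13, 2.0), (23, 5.0), (34, 1.5), (43, 1.2)]:
    profile[pos] = height
found = propagate_peaks(profile, 10)
```

      Under this assumption, a band shifted by less than half a period from its expected position is still recovered, while larger shifts would be missed or assigned to the neighbouring unit, which illustrates how the search range limits the tolerated aperiodicity.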

      * Line 144 states that the parameters of the fit are saved and given to the user, yet I could not find such information in the outputs. *

      The parameters of the fits are saved for blocks. We have now clarified this point by modifying the manuscript (lines 186-198) and modifying Figure 1 - figure supplement 5. We realized we made an error in the description of how edges of "block with middle band" are extracted. This is now corrected.

      * In line 286, authors finish by saying "More complex patterns from electron microscopy images may also be used with PatternJ.". Since this statement is not backed by evidence in the manuscript, I suggest deleting it (or at the very least, providing some examples of what more complex patterns the authors refer to). *

      This sentence is now deleted.

      * In the TEM image of the fly wing muscle in fig. 4 there is a subtle but clearly visible white stripe pattern in the original image. Since that pattern consists of 'dips', rather than 'peaks' in the profile of the inverted image, they do not get analyzed. I think it is worth mentioning that if the image of interest contains both "bright" and "dark" patterns, then the analysis should be performed in both the original and the inverted images because the nature of the algorithm does not allow it to detect "dark" patterns. *

      We agree with the reviewer's comment. We now mention this point in lines 337-339.
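      The reason dark bands require a second pass can be shown on a toy 1D profile: a peak finder only reports maxima, so dips become detectable only after the intensities are mirrored. This sketch assumes a simple strict local-maximum detector and mirrors each value around the profile's midrange (max + min - value); both are illustrative stand-ins, not PatternJ's detection step.

```python
def local_maxima(profile, min_prominence=0.0):
    """Indices where the value exceeds both neighbours by more than min_prominence."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] - max(profile[i - 1], profile[i + 1]) > min_prominence]

def invert(profile):
    """Mirror intensities so dark dips become bright peaks (analogous to inverting the image)."""
    top, bottom = max(profile), min(profile)
    return [top + bottom - v for v in profile]

profile = [5, 9, 5, 5, 1, 5, 5, 9, 5, 5, 1, 5]
bright = local_maxima(profile)        # bright bands in the original
dark = local_maxima(invert(profile))  # dark bands, i.e. dips in the original
```

      Here `bright` finds only the bands at indices 1 and 7, and the dips at 4 and 10 appear only in `dark`, which is why an image containing both kinds of pattern needs to be analyzed in its original and inverted forms.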

      * In line 283, the authors mention using background correction. They should specify what method of background correction they used. If they used ImageJ's 'Subtract Background' tool, then specify the radius.*

      We now describe this step in the method section.


      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Being a software paper, the advance proposed by the authors is technical in nature. The novelty and significance of this tool is that it offers quick and simple pattern analysis at the single unit level to a broad audience, since it runs on the ImageJ GUI and does not require any programming knowledge. Moreover, all the modules and steps are well described in the paper, which makes the analysis easy to follow.
      • Place the work in the context of the existing literature (provide references, where appropriate). The authors themselves provide a good and thorough comparison of their tool with other existing ones, both in terms of ease of use and on the type of information extracted by each method. While PatternJ is not necessarily superior in all aspects, it succeeds at providing precise single pattern unit measurements in a user-friendly manner.
      • State what audience might be interested in and influenced by the reported findings. Most researchers working with microscopy images of muscle cells or fibers or any other patterned sample and interested in analyzing changes in that pattern in response to perturbations, time, development, etc. could use this tool to obtain useful, and otherwise laborious, information.

      We thank the reviewer for these enthusiastic comments on how straightforward PatternJ is for biologists to use and on its broad applicability in the bio community.